Good Enough Passwords

I deal with passwords a lot. On any given day I log into five or six servers, another dozen web sites, plus my personal systems and tools. Some passwords are stored in password managers, some are memorized, some are mine, and some are shared. To avoid losing access to things I need, I use several patterns, schemes, and password generation tools to keep up with it all and make sure I’m using good passwords most of the time.

I don’t have a deep background in cryptography, and I definitely don’t consider myself an expert in the related math. But I have spent a lot of time with users, observing their behavior, so I do consider myself reasonably knowledgeable about how people actually behave with their passwords.

A couple of years ago this XKCD comic came out, and an admin’s reaction to it has become one of my measures of how well they understand users and passwords:

XKCD Password Strength Comic

If you’re interested in what’s technically right and wrong with the math and assumptions in that comic, you might find some of the references in Explain XKCD’s discussion of it interesting.
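The core arithmetic in the comic is easy to check. This minimal sketch assumes, as the comic does, that every word or character is picked uniformly at random; the 2,048-word list is the comic’s assumption, and the 95-symbol pool (printable ASCII) is mine.

```python
import math


def entropy_bits(pool_size: int, picks: int) -> float:
    """Bits of entropy for `picks` independent, uniformly random choices
    from a pool of `pool_size` symbols (words or characters)."""
    return picks * math.log2(pool_size)


# Four words chosen at random from a 2,048-word list (the comic's assumption).
passphrase_bits = entropy_bits(2048, 4)

# Eight characters chosen at random from the 95 printable ASCII characters.
random_chars_bits = entropy_bits(95, 8)

print(f"4 random words from 2,048: {passphrase_bits:.1f} bits")
print(f"8 random printable chars:  {random_chars_bits:.1f} bits")
```

By this measure eight truly random characters beat the four-word phrase, which is what the comic’s critics point out. Both numbers only hold if a machine does the choosing; passwords people pick and “randomize” themselves score far lower than either.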

I don’t actually think that’s the interesting part. To me the interesting part is the number of system administrators I have talked to who are convinced the comic is wrong, because they are also convinced that anyone who doesn’t use truly random passwords and avoid password reuse is stupid, and therefore their behavior can be ignored. These people have access to your systems, and it’s your job to keep those systems secure. The problem isn’t your users; the problem is that we have made passwords unreasonably hard to do right.

At this point people generally know they are supposed to use good passwords, the way we all know we’re supposed to brush after every meal. Sure, there are people who do that, but not most of us. We all know that passwords should be long (even if we fight over the exact length to require) and use lots of different kinds of characters (although many password systems require them to be from the Latin character set). But it’s too hard to follow all the rules, and security experts are so concerned about being right that they don’t provide useful guidance about when to cheat.

Let me grant those who like long random passwords the following point: if you use a different password on every system that needs one, they are all truly random strings, and you memorize them all so they aren’t recorded someplace they could be stolen, you have the passwords that are hardest for an attacker to crack. Great for you. But I work with people who are not perfect, have limited memories, and need to be able to share access on a regular basis.

Knowing the perfect random password generation pattern is useful in some cases (or so I’ve heard), but you are rarely in a situation where you can use the perfect setup. I don’t care about perfect: I’m not perfect, and I don’t work in a perfect office or have perfect colleagues or perfect clients. So here are my good-enough rules for admins and developers.

0) Make it easier to do the right thing than the wrong thing. This is rule zero because everything else is meant to support it. You want the path of least resistance to be the one that produces results secure enough to protect your systems from the attackers they face. Make sure your users have good tools for storing passwords, settings on password fields that encourage good (not perfect) behavior, and a minimum of stupid rules you don’t really understand but someone told you are “best practice”.

1) If you make it hard for people, they will find a way around you and likely weaken security. It might be post-it notes on monitors, or a cycle of five passwords (because you forbid reusing the last four). If you make it hard to pick a password (because you require punctuation but ban ‘, \, “, &, and !), you will end up with lots of passwords that are curse words – and your attacker will thank you for shrinking the search space. If people type passwords on touch devices, they will do things like repeat as many characters as you allow to make them easier to type (and if you ban repeats, the attackers again thank you for shrinking the search space). These are all things you want them to stop doing.

2) Do not have a maximum password length. Any time I hit a system with an upper bound of 12, I want to scream (although jokes about chimps might be a better tactic). Even if you are using a secure hashing system that ignores all characters after some point: who cares? Why limit the attacker’s search space to strings between 8 and 12 characters?!? Sure, that’s a massive search space, but not nearly as big as it could be.
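To put rough numbers on rule 2, this sketch counts every candidate string in a length range. It assumes an attacker guessing over the 95 printable ASCII characters, and the 20-character upper bound for the “uncapped” case is purely illustrative.

```python
def search_space(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count every string of length min_len..max_len (inclusive)
    over an alphabet of alphabet_size symbols."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


PRINTABLE_ASCII = 95  # assumption: the attacker guesses over printable ASCII

capped = search_space(PRINTABLE_ASCII, 8, 12)    # a site that caps length at 12
uncapped = search_space(PRINTABLE_ASCII, 8, 20)  # illustrative: no cap, some users go to 20

print(f"8-12 characters: {capped:.3e} candidates")
print(f"8-20 characters: {uncapped:.3e} candidates")
print(f"ratio: {uncapped / capped:.1e}")
```

Even bcrypt, which only uses the first 72 bytes of its input, leaves room for passwords six times longer than a 12-character cap.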

3) Do have minimum lengths. Minimum lengths do two things. First, they stop your users from choosing passwords that could be broken in less time than it took you to read this article. Second, they give you leverage to push users toward either good phrases or generators. If you’re smart and don’t have legacy systems to support, go with something like 15 or 20 characters.
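Rules 2 and 3 together make for a very small password check. A minimal sketch, assuming a 15-character minimum (the low end of the suggestion above) and Unicode input; the function name is hypothetical.

```python
import unicodedata

MIN_LENGTH = 15  # assumption: the low end of the 15-20 suggestion above


def password_problems(password: str, min_length: int = MIN_LENGTH) -> list[str]:
    """Return human-readable problems with a proposed password; an empty
    list means accept it.

    Deliberately absent: a maximum length, required character classes,
    and banned characters."""
    # NFC-normalize so the same visible text always has the same length,
    # e.g. "é" typed as one code point or as "e" plus a combining accent.
    normalized = unicodedata.normalize("NFC", password)
    problems = []
    if len(normalized) < min_length:
        problems.append(
            f"Use at least {min_length} characters; "
            "a few ordinary words separated by spaces works well."
        )
    return problems
```

The error message doubles as the nudge toward phrases: it tells people what to do instead of listing what they can’t.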

4) Expect people to share passwords. Many times this is actually a basic job function. If I won the lottery tomorrow (unlikely, since I don’t play) and didn’t come to work (also unlikely, since I would wait until I had the money reinvested before making plans), the person taking my place would need to be able to access all the tools, servers, and accounts I’ve set up. If she can’t do those basic things, I haven’t done my job responsibly.

5) Provide secure means to share passwords. More than once I have been sent a password in chat (running through Google’s, Slack’s, or once upon a time AOL’s servers), email, Word documents, text files, and a variety of other terrible channels. This happens not because my colleagues didn’t know it was a bad thing to do, but because they didn’t have a good option. We spend so much time locking down passwords that we don’t create secure channels to hand them around responsibly, which defeats the purpose of secured storage.

6) Pay attention to how users will be using individual passwords. Not all passwords are created equal, which is why I encourage you to support throwaway passwords: something short, easy to remember, and used only in places where it doesn’t matter if it were stolen. But even when a password is important there are issues like how easy it is to enter: if I have to enter a password 4 times a day, it had better be easy to type or I had better be able to copy and paste it. If I need it once a month, it should be impossible to remember, and it’s okay if it takes me 5 minutes to get it right. Most of us can’t type complicated passwords quickly, and if we have to enter one a lot we want to be fast. This is even more true for people using touch interfaces, where shift is an extra keystroke, as is switching to a different part of the standard keyboard.
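For passwords typed several times a day, one way to get both strength and easy entry (especially on touch keyboards) is a passphrase of random lowercase words. This sketch uses Python’s `secrets` module for randomness; the 16-word list is a hypothetical placeholder and far too small for real use.

```python
import math
import secrets

# Hypothetical placeholder list: far too small for real use.
WORDS = [
    "apple", "river", "candle", "tiger", "maple", "orbit", "pepper", "violin",
    "harbor", "cactus", "meadow", "rocket", "walnut", "saddle", "copper", "lantern",
]


def typeable_passphrase(words: list[str], count: int = 4, sep: str = " ") -> str:
    """Join `count` words chosen with the OS's cryptographic RNG.

    Lowercase words and spaces avoid shift and symbol keys, which matters
    most on touch keyboards."""
    return sep.join(secrets.choice(words) for _ in range(count))


phrase = typeable_passphrase(WORDS, count=5)
bits = 5 * math.log2(len(WORDS))  # only valid because each word is chosen at random
print(f"{phrase}  ({bits:.0f} bits with this 16-word toy list)")
```

The strength comes from the list size and the randomness, not from the words themselves: with a 7,776-word list (the size of the EFF long wordlist for dice-generated passphrases), five words give about 64.6 bits.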

7) Stop telling people they have to use a different password every time. This is an extension of number six. People have too many passwords, and that’s not changing soon. Sure, we can encourage them to use LastPass or a tool like it, but most people aren’t going to (and if they did, that could be its own problem, since it creates a single point of failure). Tell them to use a different password when it’s important, and to use a throwaway password or scheme when it’s not.

Not everything needs to be Fort Knox, so stop pretending it is. Important things like your bank account(s), your email, and Facebook need their own passwords because they can be used to do real damage in the real world. Online communities, games, and other trivial places asking you to sign in do not.

8) Don’t lecture people about bad personal password habits. Honestly, this is probably the hardest one (here I am lecturing you about not lecturing them). Usually the first people to admit they are sloppy about passwords are developers and sysadmins. Sure, they will first tell you about the awesome password wallet they use and the two-factor authentication they set up for their blog, but then mention offhand that all their production servers have the same root password and it’s 8-10 nonrandom characters. Even if you are perfect (if you are still reading this, you probably have room for improvement), don’t lecture people who aren’t. It just makes them feel they can’t admit when something has gone wrong or when they don’t understand something. When you find people doing it wrong, show them how easy it is to do it right, and if it isn’t easy, apologize and fix it.

Responding to Drupal Break-ins

If you support any web site long enough, you will suffer a break-in. If you support lots of web sites, you will suffer them more often than you’ll want to admit in public. A few weeks ago my number came up again in the attack lottery when we discovered a client’s web site was being used as a proxy and redirect to a fake shoe site.

It wasn’t the first time I’d suffered a break-in, and unfortunately I don’t expect it to be the last. My last experience with a major break-in was shortly after Drupalgeddon (I patched all the clients I was supporting before they were breached, but had to clean up sites that weren’t patched by other vendors), and the attackers had learned a few new tricks in the meantime.

If you are responding to a break-in on a Drupal site, there are directions on drupal.org to guide you through an attack response, but I thought it might be helpful to talk through what a response can look like in practice. I think it’s also useful for us all to admit our weaknesses from time to time to help make sure we’re making new mistakes.

Overview

At the outset I’m going to admit we never found the initial source of the attack; what we did find were the tools the attackers placed after the break-in. The most likely cause was poor server patching practices by the client’s host, but some Drupal security patches had also been slow to get installed. During the attack I worked with members of the Drupal Security team (particularly Greg Knaddison, who generously provided feedback on this article as well – of course any remaining mistakes are mine), who were helpful in giving me suggestions and who were clearly interested in helping us make sure we resolved the problem.

The site was being used as part of a scam advertising network. The attacker was leveraging the reputation of the site to create records in search engine indexes that were redirects to a fake shoe sales site. There were also a number of tools placed on the server that gave them full access to the Drupal database and the ability to run arbitrary PHP scripts. And it was clear by the end they had placed additional backdoors we never found – they may have had full control over the OS as well.

How we found out

Google told us.

We got an alert from Google reporting SPAM content on the site. At first we couldn’t find the content they were talking about, because it was served directly only to search engines, and that unfortunately slowed our escalation. The junior developer who was initially assigned to review the message from Google eventually figured out how to find the listing on Google (a Google site search for Nikes and some hash codes the attacker was using), but couldn’t figure out how it got there, and escalated the task to me.

Once I saw what she’d figured out my stomach sank. At first I was still hoping there might be some other explanation, or some simple matter of a single user account getting exploited, but that seemed unlikely (since we couldn’t find the content on the server) and I quickly knew it was going to be a mess.

Initial Response

The first thing I did was make sure we had a copy of the exploited site: code, files, and database. I would have rolled the site back to a recent backup, but our five-day rolling database snapshots did not reach back to before the attack began. We spun up new virtual machines for me and another senior developer so we could review copies in environments isolated from other work.

Since the URLs we had for testing followed a fairly unique pattern, we started to Google them – and we got lots of hits. As soon as we knew the problem was larger than our site, we opened an issue with the Drupal security team and started to feed them all the information we had gathered. While their practice is not to get involved in resolving attacks directly (their role is to ensure the security of Drupal core and contributed modules), they were supportive and helpful in suggesting places to look for problems and resolution strategies.

Attacks we found

By the time I was alerted to the problem there were already several malicious tools installed, some of which I’d seen versions of before and some of which were new to me – all designed to be hidden from sight through some simple but effective obfuscation. Over the course of the next couple of days I found several backdoors manually, wrote tools to help me find more, and played entirely too much whack-a-mole (more on that in a bit).

There were two main categories of attack I was chasing: PHP scripts scattered around the public files directory, and records added to Drupal’s database tables.

Database table exploits

If you dealt with sites in the aftermath of Drupalgeddon, or other hacked Drupal sites, you have probably seen what happens when an attacker inserts PHP into carefully targeted parts of a Drupal database. In the cases I’d seen before, attackers replaced the callback functions in Drupal’s menu_router table with PHP of their own. In this case the attacker used the Block module’s ability to run PHP in a block to give themselves a way to execute arbitrary PHP by sending a POST request to the server. They leveraged the fact that the main system block is always available, which makes it a reliable place to insert a backdoor. By posting a form with a specific form element they were able to execute arbitrary PHP and use that to place additional malicious code.

The attacker also leveraged Drupal’s system table to get more complex attack code loaded. They created a record telling Drupal to load a file as a module, and then uploaded that file to the site’s files directory, where Drupal was guaranteed to have write access.

filename: sites/default/files/styles/medium/public/57h3d21.jpg
name: overly
type: module
owner:
status: 1
bootstrap: 0
schema_version: 0
weight: 0
info: a:11:{s:4:"name";s:6:"overly";s:11:"description";s:58:"Displays the Drupal administration interface in an overly.";s:7:"package";s:4:"Core";s:7:"version";s:4:"7.32";s:4:"core";s:3:"7.x";s:7:"project";s:6:"drupal";s:9:"datestamp";s:10:"1413387510";s:12:"dependencies";a:0:{}s:3:"php";s:5:"5.2.4";s:5:"files";a:0:{}s:9:"bootstrap";i:0;}

This was the script doing the redirects and filtering traffic so that pages only appeared to search engines. Usually these records have filenames ending in .module, .php, or .inc, but in this case it was a .jpg file named to look similar to actual files on the site to make it hard to spot.
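Records like the one above can be checked mechanically. A minimal sketch, assuming Drupal 7 conventions: legitimate module rows point at .module files (the install profile is registered as a module with a .profile file), and nothing in the system table should point into a files directory under sites/. If your public files path is configured elsewhere, adjust the pattern. The rows would come from something like `SELECT filename, name, type FROM system`; the function name is hypothetical.

```python
import re

# Assumptions (Drupal 7 conventions): real module rows point at .module
# files, the install profile is registered as a module with a .profile
# file, and nothing in {system} should live under sites/<name>/files/.
MODULE_SUFFIXES = (".module", ".profile")
FILES_DIR = re.compile(r"^sites/[^/]+/files/")


def suspicious_system_rows(rows):
    """Yield (filename, reason) for {system} rows that look planted.

    `rows` is an iterable of (filename, name, type) tuples, e.g. the
    result of `SELECT filename, name, type FROM system`."""
    for filename, name, row_type in rows:
        if FILES_DIR.match(filename):
            yield filename, "registered from inside a files directory"
        elif row_type == "module" and not filename.endswith(MODULE_SUFFIXES):
            yield filename, f"module '{name}' is not a .module or .profile file"
```

Run against the record above, this flags the .jpg on both counts but reports the files-directory reason first, since that alone is enough to warrant removal.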

The content of that file was a PHP script, not an image. The script did several things, and was the main tool the attacker was actively using while we were trying to stop them. It served as a simple proxy for content they presented to search engines, and redirected those same pages to the scam site for everyone else. It also made sure the contents of user login forms were sent to the attacker, and provided a backup backdoor in case some of their others were lost.
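One quick way to probe for this kind of cloaking is to request the same URL as a crawler and as a browser and compare where you end up. A minimal sketch, assuming the filtering keys on the User-Agent header; a script that checks IP addresses instead (which could explain why we couldn’t see the content ourselves) will not be detected this way. The function names and the browser User-Agent string are illustrative.

```python
import urllib.request

# Both User-Agent strings are illustrative; any current browser string works.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def final_url(url, user_agent, timeout=30):
    """Fetch `url` with the given User-Agent and return the URL we end up on.
    urllib follows HTTP redirects automatically; geturl() reports the last one."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def redirect_differs(url, fetch=final_url):
    """True if a crawler and a browser are sent to different final URLs."""
    return fetch(url, CRAWLER_UA) != fetch(url, BROWSER_UA)
```

It compares only final URLs, because page bodies on a Drupal site (form tokens, for instance) can differ between any two requests. JavaScript or meta-refresh redirects won’t show up either, so a negative result proves little.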

We actually had to remove this particular attack more than once (always using the misspelled “overly” module), and each time it came back in a new file, using a different but similar disguise to make the code blend in with legitimate files.

.htaccess files in public files

The other trick that was new to me (and a more aggressive stance by Drupal core on this approach is being discussed) was to take advantage of the .htaccess patterns in Apache to re-enable PHP execution within the public files directory. Drupal’s default .htaccess file disables PHP at the root of the public files directory and in theory all subdirectories, but that can simply be undone by a malicious .htaccess file (unless you block it in Apache’s main configuration – which in my opinion defeats the purpose of using .htaccess).
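Checking for a rogue .htaccess is mostly a matter of finding every copy that isn’t the one Drupal put at the root. A minimal sketch, assuming you pass in the public files directory (sites/default/files is only the common default) and that any nested .htaccess deserves a manual look; the function name is hypothetical.

```python
from pathlib import Path


def nested_htaccess(files_root):
    """Return every .htaccess *below* the root of a public files directory.

    Drupal's protective .htaccess sits at the root; any copy deeper in the
    tree can override it, so each one deserves a manual look."""
    root = Path(files_root)
    return sorted(
        path
        for path in root.rglob(".htaccess")
        if path.is_file() and path.parent != root
    )
```

For example, `nested_htaccess("sites/default/files")` lists every candidate. It doesn’t try to judge the contents, since there is more than one directive an attacker could use to turn PHP back on.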

The attacker had placed a number of basic PHP-based exploits on the server using this technique to allow them to run the scripts. The tools themselves were not Drupal-specific, and the .htaccess file would likely work just as well on a number of other PHP-based CMS platforms.

Since the files directory gets deep and complicated, there is no reasonable way to scan the whole thing by hand, particularly since several of the files used inaccurate file extensions (like .jpg or no extension at all) and file names meant to blend into the background. So in addition to checking for any .htaccess files below the files directory root, I wrote a simple Python script to scan a directory for any file that includes the string <?php.
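Here is a minimal sketch of that kind of scanner; it is a reconstruction under my own assumptions, not the original script. It reads every file as bytes, ignores extensions, and matches the opening tag case-insensitively because PHP accepts <?PHP as well. The function name is hypothetical.

```python
import os

PHP_OPEN_TAG = b"<?php"  # PHP tags are case-insensitive, so compare lowercased bytes


def find_php(root):
    """Return sorted paths under `root` whose contents contain "<?php",
    regardless of file extension (.jpg, .txt, or none at all)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    contents = handle.read()
            except OSError:
                continue  # unreadable file or broken symlink: skip it
            if PHP_OPEN_TAG in contents.lower():
                hits.append(path)
    return sorted(hits)
```

It won’t catch code relying on short open tags (<?) where those are enabled, and it reads each file whole, so very large files cost memory. Treat the output as a list to review, not a verdict.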

How we fixed it

We immediately made sure all code on the site was up-to-date, and I removed every exploit I could find. Then for a couple of days I played whack-a-mole with the attacker. Every day I would remove a series of exploits and disable their ability to redirect users to their scam, and every night they would break back in through a backdoor I’d failed to find.

The final solution was to replace the server, deploy a version of the code known to be good, and deploy copies of the database and files that had been scanned for any PHP in places it shouldn’t be – which involved a combination of the scanner above and hand-checking every place the database stores PHP, not a fast process.
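Part of the database check can be narrowed down against a mysqldump file before the hand-checking starts. A minimal sketch: the marker strings are assumptions (PHP open tags, Drupal 7’s php_eval function name, and base64_decode calls common in obfuscated payloads), and legitimate content, such as a custom block in {block_custom} you deliberately wrote in PHP, will show up too. The function name is hypothetical.

```python
import re
from collections import Counter

INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)`")
# Assumed indicators (lowercase); extend with anything else you have seen.
MARKERS = ("<?php", "php_eval", "base64_decode(")


def tables_with_php(dump_path, markers=MARKERS):
    """Count, per table, the INSERT statements in a mysqldump file that
    contain any marker (case-insensitive). Every hit still needs a human."""
    hits = Counter()
    with open(dump_path, encoding="utf-8", errors="replace") as dump:
        for line in dump:
            match = INSERT_RE.match(line)
            if match and any(marker in line.lower() for marker in markers):
                hits[match.group(1)] += 1
    return hits
```

With mysqldump’s default extended inserts one line holds many rows, so the counts are per INSERT statement rather than per row; dumping with --skip-extended-insert gives one row per line when you need finer detail.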

What we will do better next time

Part of any security event of this nature needs to be a full review of your internal processes and controls to make sure you reduce your risk and improve your response the next time something occurs (because unfortunately there will be a next time for all of us).

One of our first areas of improvement is a shared understanding that it’s more important to resolve the attack than to determine the cause. This goes against the question developers are constantly asked during and after an attack: “How did this happen?” While you need to know something about what happened, in the end it’s more important to make it stop. I still don’t know what started the attack; I know I stopped it by blocking every attack vector I could think of and replacing every part of the stack with a version known to be fully up-to-date. It would have been faster and cheaper if I’d just started there. Yes, there is a risk I would have missed some of the code in the database if I hadn’t taken the time to review what I was finding, but frankly I doubt that risk is as high as the risk that new exploits will appear while I’m working to understand the previous one.

Beyond that basic shift in approach we developed a three part list of improvements:

  • Things we needed to improve right away.
  • Things we needed to improve soon.
  • Things that should be part of ongoing improvement.

The highest priority items were coming up with better internal processes for initial response, and making sure we are deploying all security updates in a timely but still careful manner, including monitoring our hosting partners to ensure servers stay up-to-date as well. These are basics that are easy to let slip over time – particularly monitoring that your partners are doing their job correctly.

The second category of fixes is filled with workflow and procedure improvements. We were already in the process of improving our code handling (migrating from SVN to Git, better production monitoring, more internal code review, etc.), and we accelerated our plans to complete that work. This category also includes a complete review of our existing backup procedures to make sure they provide the level of coverage our clients need.

The final category of longer-term adjustments includes ensuring all developers are given (and expected to take) professional development opportunities around security best practices, doing more internal sharing about emerging ideas and trends, and encouraging more community engagement so we are better able to leverage community resources in a crisis.

This Week’s Drupal Fire Drill

This week many in the Drupal community lost a lot of sleep Tuesday night because the security team treated us to a warning about major security updates due out on Wednesday. Fortunately for many it wasn’t a crisis in the end, but it gave us all a chance to practice for the worst. Basically, it was like a fire drill in an elementary school: we got to prepare as if there were a disaster, but since there wasn’t one, we don’t really know how it would have gone if there had been an actual fire. We haven’t had a stop-drop-and-roll type of emergency in a while, so it was a good refresher on how to handle a crisis.

For those who don’t know what I’m talking about, here’s a quick review. At Cyberwoven, like many Drupal shops, we follow the Drupal Security Twitter feed in one of our Slack channels, so we saw this mid-afternoon Tuesday:

Slack posting of tweet from the security team.

I read the PSA with images of Drupalgeddon dancing in my head:

There will be multiple releases of Drupal contributed modules on Wednesday July 13th 2016 16:00 UTC that will fix highly critical remote code execution vulnerabilities (risk scores up to 22/25). These contributed modules are used on between 1,000 and 10,000 sites. The Drupal Security Team urges you to reserve time for module updates at that time because exploits are expected to be developed within hours/days. Release announcements will appear at the standard announcement locations.

Drupal core is not affected. Not all sites will be affected. You should review the published advisories on July 13th 2016 to see if any modules you use are affected.

Oh, the line about the modules being used on between 1,000 and 10,000 sites wasn’t part of the original announcement. On Tuesday we didn’t have a sense of scale: were we talking about modules that everyone uses on almost every site (ctools came up more than once)? It’s the one thing I wish the security team had done differently: given us that sense of scale.

I read all security postings and make sure we take prompt steps to address them for clients as needed, but the potential here was that we’d have to update all 70+ sites in a few hours or less, which is very different from run-of-the-mill security updates that often don’t match the use cases and threat profiles of most sites.

Here’s what we did next:

Tuesday

  1. Took a minute to panic, complain, and joke about pending illnesses. This is actually a useful step because it allowed me to burn off some nervous energy and then to focus on the real work.
  2. Pulled out the list of all active clients with Drupal sites, and double-checked it for accuracy.
  3. Made sure a developer had a working repo for all sites (70+). Since we had a couple people out of the office, and some projects had been reassigned recently to different developers, this was an important step to make sure no sites fell through the cracks during a rush to update them all.
  4. Made sure we knew which sites were Drupal 6 and which were Drupal 7, in case we determined that 6 was also affected. Since I knew the announcement would likely skip D6, we needed to accept that we might have to take those sites offline for a time.
  5. Made sure leadership knew that all developers might be busy starting at 16:00 UTC on Wednesday. We didn’t actually cancel anything right away, but I didn’t want anyone surprised if we were all too busy posting and testing updates to worry about things like meetings.
  6. Made sure complex projects were thought about ahead of time: sites with unusual setups or ongoing dev work that make sudden updates complex. For example, we have one client with 16 sites that all have an unusual setup, so we agreed who would handle those and made sure she was prepared.
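Sorting a pile of checkouts into Drupal 6 and Drupal 7, as in step 4, can be scripted. A minimal sketch, assuming Drupal 7 defines its VERSION constant in includes/bootstrap.inc and Drupal 6 in modules/system/system.module; the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: where each major version defines its VERSION constant.
VERSION_FILES = (
    "includes/bootstrap.inc",        # Drupal 7
    "modules/system/system.module",  # Drupal 6
)
VERSION_RE = re.compile(r"define\(\s*'VERSION'\s*,\s*'([^']+)'\s*\)")


def drupal_version(docroot):
    """Return the VERSION string (e.g. '7.50') of a Drupal 6/7 docroot,
    or None if neither assumed location defines it."""
    for relative in VERSION_FILES:
        path = Path(docroot) / relative
        if path.is_file():
            match = VERSION_RE.search(path.read_text(errors="replace"))
            if match:
                return match.group(1)
    return None


def group_by_major(docroots):
    """Map major version ('6', '7', or 'unknown') to the docroots using it."""
    groups = {}
    for docroot in docroots:
        version = drupal_version(docroot)
        major = version.split(".")[0] if version else "unknown"
        groups.setdefault(major, []).append(docroot)
    return groups
```

Anything that lands in “unknown” is exactly the kind of outlier worth a manual look before a real emergency.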

Wednesday

  1. First thing in the morning I saw the update to the notice that gave us a sense of scale and relaxed a little, but still made sure we were fully prepped.
  2. Noon: the announcement of which modules were affected was released, and a couple of other developers and I immediately reviewed the releases. We relaxed once we determined none of our clients were using any of the modules listed.
  3. I reviewed the code changes in each of the three modules to see what had been fixed, and looked for ways to improve my own code to avoid similar errors.
  4. Looked for ways to improve our response for the next time it’s not a drill.

Things would have been more exciting if we’d had to update our sites. Since we were prepared, it took only minutes to check that all our sites were secure. Each developer checked all the sites in their sandbox, and since I knew every site was in someone’s sandbox, that gave us 100% coverage without lots of double-checking.
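That coverage argument is simple enough to check mechanically before the next alert. A minimal sketch, assuming you can list the sites in each developer’s sandbox; the function name is hypothetical. Raising `min_devs` to 2 checks the stricter goal of two sandboxes per project.

```python
def coverage_gaps(sandboxes, all_sites, min_devs=1):
    """Return {site: number of sandboxes} for every site that fewer than
    `min_devs` developers have checked out.

    `sandboxes` maps a developer's name to the sites in their sandbox."""
    counts = {site: 0 for site in all_sites}
    for sites in sandboxes.values():
        for site in set(sites):  # set(): count each developer once per site
            if site in counts:
                counts[site] += 1
    return {site: n for site, n in counts.items() if n < min_devs}
```

An empty result means every site has the coverage you asked for; anything else is a list of sites that could fall through the cracks.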

I think it is too easy to skip code review of modules we weren’t using, but I find this kind of follow-up really useful. Looking back on Drupalgeddon it’s amazing how much pain was caused by such a small error (16 characters are all that were needed to fix it). And by seeking to understand what went wrong, you can look for places where you make similarly invalid assumptions.

If you read my post on making new mistakes, you also know I believe that looking for improvement is the most important detail (particularly when it turned out to be a drill, not a fire). Here’s my initial list of things to improve:

  • Have a system to automatically check every site for specific modules (this was already under development but will take a little while longer to complete).
  • Make sure at least two developers have a working sandbox for all projects at all times in case something comes out during a vacation.
  • Improve internal messaging about what to expect, with a template message and process. I tossed together some disjointed thoughts for account managers, but disjointed developer thinking does not make people feel like you’re on top of things.
  • Have better tracking of outliers: I completely missed that I had a demo site on Pantheon that did have the Coder module running. Since it was on Pantheon, they alerted me to the problem and had taken steps to protect it until I could do the update myself. But it would have been bad if that site had been someplace else and/or in production.
  • Make sure everyone knows when the actual release is coming out, and what the outcome was. Several developers were hoping to get lunch before the updates but hadn’t done so when the announcement came out (which could have been a problem if we’d ended up busy). And I spent the rest of the day answering one-off questions from the account team, who wanted to know whether the announcement had been bad news.
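The first item in that list, checking every site for specific modules, can start as a scan of the checkouts. A minimal sketch, assuming one checkout per site directly under a sandbox directory and that a module is identified by its <name>.info file (or <name>.info.yml on Drupal 8); the function name is hypothetical. It only proves the code is present, not that the module is enabled; a command such as `drush pm-list` run against each site answers that.

```python
from pathlib import Path


def sites_using(sandbox_root, module_names):
    """Map each module name to the sorted site checkouts that contain it.

    Assumes one checkout per site directly under `sandbox_root`, and that a
    module is identified by `<name>.info` (Drupal 6/7) or `<name>.info.yml`
    (Drupal 8)."""
    root = Path(sandbox_root)
    found = {name: [] for name in module_names}
    for site in sorted(p for p in root.iterdir() if p.is_dir()):
        present = set()
        for info in site.rglob("*.info*"):
            name = info.name.split(".")[0]
            if (
                name in found
                and info.is_file()
                and info.name in (f"{name}.info", f"{name}.info.yml")
            ):
                present.add(name)
        for name in present:
            found[name].append(site.name)
    return found
```

Matching whole .info file names keeps submodules (coder_review inside coder, for example) from being counted as the parent module.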

Ideally we’d come up with a method to automate security updates (maybe all updates), but that’s not totally straightforward. We have to worry about required patches, non-standard setups, automated testing, and other details. There has been discussion on the Pantheon power users’ mailing list, but every shop has a slightly different workflow (for example, we don’t use Pantheon much at all), so we’ll need to come up with a system that accommodates our workflow.