Storing passwords: are you doing it right?

Almost all web applications involve user authentication at some point, and many use the good old “password” as the primary approach to check if you really are who you claim to be.

This burdens developers with the important responsibility of having to store and process passwords in a secure way. Even if the app you’re protecting is not anything important (say, a silly online game), if your server is hacked and your users’ passwords are stolen, that could have dire consequences — imagine if someone uses the same password for their online banking! (Yes, many people are that stupid.) So let’s see the various approaches that can be taken here…

Plain, unencrypted passwords

No, no, no! Even if you’re not collecting any valuable private data. Re-read the last paragraph.

Passwords encrypted with a secret key

So you could encrypt the password database with DES, AES, or some other encryption algorithm. If a hacker gets hold of it, all they will have is a bunch of gibberish that’s worthless without the key. Right?

Well, this option is certainly a bit better than the previous one. It can limit the damage if the leak is limited to the password database (for example, through an SQL injection attack). But you still have to store the secret key somewhere, and your server has to be able to access it. So if an attacker gains privileged access to the server, they can get the secret key too… game over.

Passwords processed with a hash function

This is a bit better than the above, but still insecure. Even so, plain hashing is probably still the single most common way passwords are stored in non-enterprise applications. That’s likely because in many programming languages and databases, it’s the easiest way for a programmer who is lazy (or pressed by looming deadlines) to add at least some security. For example, MySQL lets you easily apply the popular MD5 and SHA1 hash functions to a string.

Hashing is different from encryption. It’s more like fingerprinting. A cryptographic hash function performs a one-way mathematical transformation of your password, producing a fixed-length piece of data that you store in your database. There is no key, and no practical way to run the transformation backwards to recover the original password. However, the same input always produces the same hash, so you can hash two text values and check whether the results are identical. In this way, you can check if the user has entered the correct password when you don’t even know it! You just hash what they typed and compare the two hashes.
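A minimal Python sketch of this fingerprint-and-compare idea, using the standard library’s hashlib with SHA-1 to match the examples in this article. The names sha1_hex and check_password are hypothetical, and this is the plain, unsalted scheme, shown only to make the mechanism concrete.

```python
import hashlib
import hmac

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a string as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# At sign-up: store only the fingerprint, never the password itself.
stored_hash = sha1_hex("pizza")

def check_password(attempt: str, stored: str) -> bool:
    """Hash the login attempt and compare it with the stored fingerprint."""
    # compare_digest compares in constant time, so timing leaks nothing.
    return hmac.compare_digest(sha1_hex(attempt), stored)

print(len(stored_hash))                      # 40
print(check_password("pizza", stored_hash))  # True
print(check_password("pasta", stored_hash))  # False
```

The server never needs the original password after sign-up; it only ever compares fingerprints.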

What’s the problem here? Well, if attackers get the hashes, they can simply try to recover the passwords by “brute force” (trying every possible combination of letters, numbers, etc. until they hit a match or give up). Modern hardware is fast enough to make this feasible, and attackers can also come “pre-armed” with a ready-made table of millions of common passwords and their corresponding hashes. Additionally, anyone who sees the hashes can tell whether two users have the same password, or whether one person used the same password on two different websites (if both sites use the same hash function). This gives attackers important clues.
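The “pre-armed” attack needs no cleverness at all. This toy Python sketch, with a made-up five-entry password list and made-up usernames, builds a hash-to-password table once and then cracks every matching unsalted hash with a single dictionary lookup. It also shows how duplicate hashes give away shared passwords without cracking anything.

```python
import hashlib
from collections import Counter

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Built once, offline, from a list of common passwords.
# (A real list would hold millions of entries; this one is a toy.)
common_passwords = ["123456", "password", "qwerty", "letmein", "pizza"]
lookup = {sha1_hex(pw): pw for pw in common_passwords}

# A stolen table of unsalted hashes (hypothetical users).
stolen = {
    "alice": sha1_hex("pizza"),
    "bob": sha1_hex("pizza"),
    "carol": sha1_hex("correct horse battery staple"),
}

# Cracking is now one dictionary lookup per user, not a search.
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)  # {'alice': 'pizza', 'bob': 'pizza'}

# Identical hashes reveal shared passwords even when nothing is cracked.
shared = [h for h, n in Counter(stolen.values()).items() if n > 1]
print(len(shared))  # 1
```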

Slow, salted hash

There are two tactics that, together, help prevent cracking of hash values. The first one is to repeat the hash function: you calculate the hash, then the hash of the hash, and so on, hundreds or thousands of times. That makes it slower to calculate the hashed value. When a user logs in, a slowdown of a tenth of a second is not going to be noticed. But when cracking a hash, you have to test billions and billions of possible values, so the same slowdown multiplies into an enormous cost and could be enough to prevent bad guys from getting the password.
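A rough Python sketch of this “hash of the hash” loop, assuming SHA-256 and an arbitrary 100,000 rounds (both are illustrative choices, not values from the original). It times one stretched hash, which is also the cost an attacker on similar hardware pays for every single guess.

```python
import hashlib
import time

def stretched_hash(password: str, rounds: int) -> str:
    """Hash the password, then hash the hash, for `rounds` rounds in total."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

ROUNDS = 100_000  # assumed work factor; tune so a login takes a fraction of a second

start = time.perf_counter()
stretched_hash("pizza", ROUNDS)
elapsed = time.perf_counter() - start

# Every guess an attacker makes on similar hardware costs the same time.
print(f"one stretched hash: {elapsed:.3f} s")
print(f"about {1 / elapsed:,.0f} guesses per second on this machine")
```

In practice you should rely on a standardized, well-reviewed construction such as PBKDF2, bcrypt or scrypt rather than a hand-rolled loop like this one.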

In addition, there is this thing called a salt — in a typical implementation, for each user, a random, non-secret string would be generated and appended to the password before hashing it. Then the non-encrypted salt is stored together with the hash.

What’s the point of this? Well, the same password with a different salt generates a different hash. You can no longer tell whether two users have the same password just by looking at the hashes, and an attacker can no longer come armed with a table of common passwords and their hashes prepared in advance.
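A minimal Python sketch of the salting scheme just described: a fresh random salt per user, appended to the password before hashing, and stored in the clear next to the hash. SHA-1 and an 8-character salt are kept only to mirror the sample tables that follow (a real system would also stretch the hash as described above), and salted_sha1, make_record and verify are hypothetical names.

```python
import hashlib
import hmac
import secrets

def salted_sha1(password: str, salt: str) -> str:
    """SHA-1 of the password with the salt appended, as hex."""
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

def make_record(password: str) -> tuple[str, str]:
    """Return the (hash, salt) pair to store for a new user."""
    salt = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe characters
    return salted_sha1(password, salt), salt

def verify(attempt: str, stored_hash: str, salt: str) -> bool:
    """Re-apply the user's stored salt to the login attempt and compare."""
    return hmac.compare_digest(salted_sha1(attempt, salt), stored_hash)

hash_a, salt_a = make_record("pizza")
hash_b, salt_b = make_record("pizza")
print(hash_a == hash_b)                 # different salts, so different hashes
print(verify("pizza", hash_a, salt_a))  # True
print(verify("pasta", hash_a, salt_a))  # False
```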

Here is a sample user table where the password is simply hashed with the SHA1 algorithm:

user	password_hash
leonardo	1f6ccd2be75f1cc94a22a773eea8f8aeb5c68217
donatello	1f6ccd2be75f1cc94a22a773eea8f8aeb5c68217
michelangelo	1f6ccd2be75f1cc94a22a773eea8f8aeb5c68217
raphael	1f6ccd2be75f1cc94a22a773eea8f8aeb5c68217

These four users obviously have the same password, and by looking up the hash in a “rainbow table” readily available online, you can find out in less than a minute that the password is “pizza”.

So let’s “salt” this pizza. See what happens? Same password, much more difficult to steal:

user	password_hash	salt
leonardo	465a6b2ffafed490b7393b5c8b6627b526e933e1	xAbqMfoB
donatello	61cca7b6f4fb480e594b5f5055a4b00d11d21d7c	NY6_aXkl
michelangelo	3dd135ea6500df2b314591372f78e0fa065dcf6b	poFRDtcl
raphael	083a5de9cea7c3617c5b30a02279b4af931e5887	v&gxONFz
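To see why the salted table resists a precomputed attack, here is a Python sketch that builds a table like the one above and runs a toy lookup table against it. The exact salting scheme behind the sample values isn’t stated, so SHA-1 of the password with an 8-character random salt appended is an assumption here.

```python
import hashlib
import secrets

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Same password for everyone, but a fresh random salt per user.
users = ["leonardo", "donatello", "michelangelo", "raphael"]
table = {}
for user in users:
    salt = secrets.token_urlsafe(6)  # 8 random URL-safe characters
    table[user] = (sha1_hex("pizza" + salt), salt)

# An attacker's precomputed table of unsalted hashes now matches nothing.
lookup = {sha1_hex(pw): pw for pw in ["123456", "password", "pizza"]}
hits = [user for user, (h, _salt) in table.items() if h in lookup]
print(hits)  # []

# And the rows no longer reveal that all four passwords are identical.
print(len({h for h, _salt in table.values()}))
```

The salt only defeats precomputation and hides duplicates. Because it is stored in plain sight, an attacker can still guess passwords one user at a time, which is exactly what the deliberate slowdown is there to make expensive.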

This is nothing new

And you know what? All these considerations were described more than thirty years ago in a four-page paper dealing with password security on UNIX systems (R. Morris & K. Thompson 1979, “Password Security: A Case History”, Communications of the ACM, Volume 22, Issue 11). These are supposed to be standard security practices, but they are often not followed, even by large and knowledgeable providers like Adobe, whose 2013 breach exposed passwords that had been encrypted rather than hashed.

PHP implementation: long overdue

Since I primarily work with PHP, I am pleased to note that since 2013, when PHP 5.5.0 was released, there has been a simplified way to work with safe password hashes that satisfy all the above conditions. The new password_hash and password_verify functions use a safe, deliberately slow algorithm (bcrypt by default) and generate a random salt automatically. Their default options are secure enough: no extra keys to provide, no fancy settings to configure.
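Part of what makes PHP’s functions so easy to use is that password_hash returns a single string recording the algorithm, the cost and the salt alongside the hash, and password_verify reads those settings back from it. As a rough illustration of that design outside PHP, here is a Python sketch built on the standard library’s hashlib.pbkdf2_hmac. PBKDF2 is swapped in because bcrypt is not part of Python’s standard library, and the string format, iteration count and function names are assumptions for illustration, not PHP’s actual output.

```python
import base64
import hashlib
import hmac
import secrets

ALGORITHM = "pbkdf2_sha256"
ITERATIONS = 600_000  # assumed work factor; raise it as hardware gets faster

def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")

def hash_password(password: str) -> str:
    """Return one string holding the algorithm, cost, random salt and hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ALGORITHM}${ITERATIONS}${_b64(salt)}${_b64(digest)}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash using the settings recorded in the stored string."""
    algorithm, iterations, salt_b64, digest_b64 = stored.split("$")
    if algorithm != ALGORITHM:
        return False
    salt = base64.b64decode(salt_b64)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(digest, base64.b64decode(digest_b64))

stored = hash_password("pizza")
print(stored.split("$")[:2])             # ['pbkdf2_sha256', '600000']
print(verify_password("pizza", stored))  # True
print(verify_password("pasta", stored))  # False
```

Because the cost is stored with each hash, old hashes keep verifying after you raise ITERATIONS for new ones.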

PHP developers (like me) have no excuse not to use these functions. (Well, to be honest, they have one – some web hosting providers still haven’t upgraded to PHP 5.5 – but that makes it all the more important to be careful about the hosting that you choose!)