Tags:
Steve Kosten is an instructor with the SANS Institute for DEV541: Secure Coding in Java/JEE.
Password Storage Mistakes
I was visiting a web site recently that I haven't visited in many, many years. I tried a few old passwords I used to use before I started using a password storage system, but no luck. I was defeated. Barred from entry into this site. But wait, they have a "Forgot Password" link; knowing I will soon have entry into the site, I confidently click on that link (after entering what I believe my username is). Boom, a few seconds later, I have an email from this web site that I will not name. Opening the email, there it was. The password I had created from ages ago. Head-slap.
The head slap wasn't for me forgetting my password; what were the developers of this site doing storing MY PASSWORD in clear text??!! Where anyone with access to the database could readily read my password along w/ thousands of other users. I was incensed. But wait, while we know that storing passwords in clear text in the database is a big mistake, isn't it possible that my password wasn't stored in clear text, but rather the application stores my password in encrypted form and decrypts it prior to sending out the email. Was I being unjustifiably harsh?
Storing my password using reversible encryption means that my password may be decrypted. Someone who has access to the application's system and key, may also readily access my password and the thousands of users at this site. One compromise of the key and ALL passwords are toast.
Hashing
So I don't like clear text storage of passwords, I don't like my passwords being stored in encrypted form, what do I like? Well, I want to see my password hashed as a message digest, with a few extra steps that I will dive into later. But first, let's review what a message digest is.
Hashing, which results in a message digest, is a one-way operation. When a hash is calculated, the input is provided and the result is a fixed length output. The SHA-512 algorithm has a 64 byte output, the SHA-256 algorithm has a fixed 32 byte output, and the bcrypt algorithm has a fixed 60 byte output. For each of these algorithms, there are an infinite number of potential inputs into the algorithm, but a finite number of potential outputs from the algorithm. That is, there will be more than one input that creates the same output. When this happens, this is called a "collision". There has to be collisions because you can't map infinite input space onto finite output space without collisions. Good cryptographic algorithms make it difficult to find collisions, especially meaningful ones. Moving forward, let's assume we're using good cryptographically strong algorithms that make it extremely hard to find collisions.
So by hashing my password, we've solved the password storage problem. When I first setup my password, the application hashes my password and stores the message digest. When I subsequently login, the application again hashes my clear-text password that I entered and compares it to the message digest previously stored. If they match, then the passwords are the same (again, we're using good cryptographically strong algorithm that makes it very hard to find collisions). We know hashing is irreversible, which means that even the people who have access to the database cannot get my password!! Problem solved!
But we're being too quick, the problem isn't yet solved; people crack passwords. Unfortunately, while hashing is irreversible, passwords that have been hashed can still be cracked. A number of password cracking methods can be used, including password dictionaries, rainbow tables, or brute force. A password dictionary is not just my Webster's unabridged dictionary. It is that Webster's unabridged dictionary in every conceivable language on steroids. It has misspellings of words, numbers added throughout the words, capital variations, lower case variations, mixed case variations. It is huge. A rainbow table is a table that contains the pre-computed hash values for all of the passwords in the password dictionary. During a password attack, looking up a hash value in the rainbow table is much faster than hashing and comparing each value in the password dictionary. The details of how rainbow tables work are out of scope for this blog. For now, know that a very good rainbow table can take up a lot of space on disk and can readily crack most passwords people use.
Salting
So is this hashing useless to store passwords? No. You just need a little something special to make it better. And a little salt does just the trick. Well, actually a big complex salt is better. Suppose I have the following as my password "123456". This would be trivially easy to crack, any dictionary attack or rainbow table could be used to find this password. But suppose the developers, before they hash my password, append a large random character string (aka "salt"). So rather than calculating the hash of 123456, or H(123456), they instead store the following large random character string (salt) along with my password, say "4zLrfqPz!AYrkesHvKb6u?T". When I create my password, the application code creates a salt, stores it, and then calculates the hash of my password concatenated with the generated salt, that is H(password + salt) or in my case H(12345644z!LrfqPzAYrkesHvKb6u?T). When the application developers decided to use salts when hashing, they effectively made my trivially simple password much more complex and made it very unlikely that someone could crack the password on a pre-computed dictionary attack or through the use of a pre-computed rainbow table. Rather, with a sufficiently large random salt, an attacker must create a separate dictionary or rainbow table for each user's password they are trying to crack. This is extraordinarily expensive and that is why strong salts effectively break dictionary attacks and rainbow tables.
It does seem odd that salts are not secret and can be stored right along with the message digest. The use of a salt makes a cracker's job much more difficult. Rather than being able to use one dictionary attack or one rainbow table to try to crack every unsalted password stored, they now need to create one rainbow table or dictionary for each and every password/message digest. That is, in a dictionary attack, if they want to crack my "123456" password, they would need to concatenate my salt in order to crack my password. The next user has a different salt, which forces the attacker to recalculate the hash using the new salt. This makes the attacker's job harder, and slows down the password cracking attack.
But if someone really wants to break into my account, and they have my salt, they can create a rainbow table just to crack my password or they could conduct a brute force attack.
In a brute force attack, one determines the password space and then tries every potential password in the password space. If an application only allows users to have a 1 digit password, then the password space is 10. The potential passwords are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. That is, for a one character password with 10 possible characters, there are 101 possible passwords. If I allow 2 digit passwords, there are 102 (100) possible passwords in the potential password space (0, 1, 2, 3, 4, ?. , 97, 98, 99). By introducing complexity in the passwords, say mixed case alphanumeric up to 10 characters long, the number of potential passwords in the password space becomes 6210 (26 lower case + 26 upper case + 10 digits)10 characters or about 8.4 x 1017 or 840,000,000,000,000,000. Brute force attacks with complex passwords can be difficult. With simpler password requirements that limit the password space, brute force attacks can be more readily done, especially by serializing Graphics Processing Units (GPUs) together to perform the heavy computational load quickly. If you cannot enforce very complex passwords, what else can an application developer do to protect against brute force attacks?
The Work Factor
We can slow them down, to a certain extent. Suppose at my fictional website "myApplication.com", I can authenticate a user in 0.5 seconds, end-to-end from an end user perspective. Suppose my users don't notice or complain about slowness unless I don't respond in 2 seconds. Knowing this, I may wish to authenticate all users in less than 1.5 seconds. So I slow my authentication process down. How does this affect a brute force attacker? The trick is in how I slow my authentication process down.
When I calculate the hash in myApplication.com, if I use a hashing algorithm such as bcrypt, then I can affect the work factor needed to compute the message digest. The bcrypt algorithm allows you to specify the work factor you desire until you get the response time you are seeking; in my case, 1.5 seconds end-to-end. Alternatively, if you want to use a hashing algorithm such as SHA-256 which doesn't have a work factor, you can effectively force it by just iterating the hashes prior to storing. That is, rather than just doing one round of the H(salt+password), I iterate over it thousands of times. In the first iteration of the hashing, I calculate H(salt+password) to get the message digest, or MD1. The second iteration, I calculate the hash of the salt + message digest, or H(salt+MD1) to get MD2. I then continue doing this for thousands of iterations until I slow my authentication speed to levels that don't affect the user but that do slow down an attacker. My end users may not notice that I have a higher workload factor on my bcrypt algorithm or that I'm doing 10,000 iterations on my SHA-256 hashing function, but an attacker who is slowed down by a factor of 10,000 definitely does notice this. It makes my password harder to crack.
And that is just what I want! Well, 2-factor authentication, but that's a topic for another day!
If you are interested in topics like this and other aspects of application security, you may wish to consider looking into DEV541 Secure Coding in Java/JEE: Developing Defensible Applications and DEV544 Secure Coding in .NET: Developing Defensible Applications, which cover topics such as the one covered in this blog and many more.
Steve Kosten is the Denver Chapter President of the Open Web Application Security Project (OWASP) that focuses on information security education related to software applications. He is also co-organizer of AppSec USA 2014. He is an application security specialist who reviews software applications for top 100 firms across multiple industries including the financial, defense, identity management and more. He has a Masters degree in Information Security, and is CISSP and CISM certified.