The recent news that the popular dating site Plenty Of Fish was hacked, and that passwords and other user information were stolen, truly disheartened me.
It was just the latest in a seemingly endless list of such hacks over the years, recently including Gawker Media (Lifehacker, Gizmodo), McDonald's, Walgreens and Pizza Hut.
Apparently, Little Bobby Tables is alive and well.
What is disheartening to me is not so much the security breaches themselves. No site is completely secure. The operators of these and other sites may have been able to do more than they did to prevent the breaches, or maybe they had really good (but not good enough) security. But security breaches will happen. I think that the operators of every site have to assume that it may be hacked and its database stolen.
I'm not suggesting that they just accept the inevitable and give up. I'm suggesting that everyone who builds a website that collects information from its users (that's every site that has user accounts) should care enough to do their best to ensure that when the site is compromised, the damage is as small as possible. That's why the Plenty Of Fish hack was so disturbing. Passwords were apparently stored in the database in clear text. Once the database was stolen, the passwords were readily available, with no further work on the hacker's part.
I had thought that clear-text passwords were such a well-known evil that no one was storing them any more, but apparently I was mistaken. Web developers who don't know any better apparently still store plain-text passwords so that they can implement the even less secure feature of emailing your password to you if you forget it (or sometimes even if you don't).
So, what's the right thing to do? Obviously, the most important prerequisites are being aware of the problem, and caring enough about your users to do something about it. If you've read this far, I'll assume that you have those two covered.
The next thing is to not store passwords. Let me repeat that - do not store your users' passwords. By that, I do not mean that you should encrypt the passwords before you store them, though that would be a huge improvement over plaintext. The problem with storing encrypted passwords is twofold. The first problem, as previously mentioned, is that it may tempt you to decrypt them for some reason (like sending one to a user in an email). The second problem is that unless you are really careful with your encryption, the data thief may very well have the resources to decrypt the passwords.
Instead of storing a user's password, store a cryptographic hash of their password. A cryptographic hash is also known as a one-way hash, and cannot be "decrypted". For authentication, take the user-entered password, run it through the same hash function, and then compare the result to the hash stored in the database. Provide users with the means to securely reset their password (choose a new one) in the event that they forget it, rather than attempting to send them the old one.
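To make that flow concrete, here is a minimal sketch in Python using only the standard library. The function names are my own, purely for illustration, and SHA-256 is just one reasonable choice of hash. This bare version deliberately leaves out everything except the hash-and-compare step, so treat it as a picture of the idea rather than something to deploy as is.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the SHA-256 hex digest of a password (illustrative only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_password(candidate: str, stored_hash: str) -> bool:
    """Hash the user-entered password and compare it with the stored hash."""
    # compare_digest takes the same time whether or not the values match,
    # which avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(candidate), stored_hash)

# At sign-up: only the hash goes into the database, never the password.
stored = hash_password("correct horse")

# At login: hash what the user typed and compare.
print(check_password("correct horse", stored))  # True
print(check_password("wrong guess", stored))    # False
```

Notice that nothing here can turn the stored value back into the password, which is exactly why a forgotten password has to be reset rather than emailed.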
There are still some issues to be aware of when using cryptographic hashes. True collisions (two different passwords hashing to the same value) are vanishingly rare with a sound hash function, but identical passwords always produce identical hashes. If the thief already knows one account's clear-text password by some other means, then every other account with the same hash is compromised as well. And users do pick the same passwords. For example, in the Gawker Media case, the password data was published on the web, and it was revealed that thousands of users had chosen the same few passwords, with '123456' and 'password' leading the list. If you use a plain, unsalted hash function, you will have lots of these duplicate hashes. That makes a dictionary attack, or any other brute-force attack, much more likely to succeed, because each guess the attacker hashes can be checked against every account at once.
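Here is a small sketch of why repeated passwords hurt so much when the hashes are unsalted. The usernames, passwords and word list are all made up for the example, and SHA-256 stands in for whatever hash function a site might use.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """Plain SHA-256 of the password, with no salt (the weak case)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical stolen table: username -> unsalted password hash.
stolen = {
    "alice": unsalted_hash("123456"),
    "bob": unsalted_hash("password"),
    "carol": unsalted_hash("123456"),
    "dave": unsalted_hash("x9$Lq-7fT!"),
}

# The attacker hashes each dictionary word exactly once...
wordlist = ["123456", "password", "letmein"]
lookup = {unsalted_hash(word): word for word in wordlist}

# ...and every account that used one of those words falls together.
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)  # {'alice': '123456', 'bob': 'password', 'carol': '123456'}
```

Three hash computations were enough to recover three of the four accounts, and the work would be the same if thousands of users had picked '123456'.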
The answer to identical hashes is random salt. By appending random bits to the password before hashing it, you make the hashes differ even when the passwords are the same. A common method is to generate a new, random salt value for each user, and to store the salt in an additional column in the password table. Making the salt available to a hacker does not make an individual hash any easier to brute-force, and as long as no two of the salt values are the same, the attacker has to attack each account separately. One thing to be careful of is using the username for salt: once a user changes their username, the stored hash no longer matches, so this only works if you intend to never let a user change their username.
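A per-user salt might look something like the following sketch. The function names and the 16-byte salt length are my own assumptions for illustration, and the two returned values stand in for the salt and hash columns of the password table.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

def new_password_record(password: str) -> Tuple[str, str]:
    """Create (salt, hash) values to store for a new or changed password."""
    salt = secrets.token_bytes(16)  # fresh random salt for every user
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt.hex(), digest

def verify(password: str, salt_hex: str, stored_digest: str) -> bool:
    """Re-hash the entered password with the stored salt and compare."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return hmac.compare_digest(digest, stored_digest)

# Two users pick the same weak password...
salt_a, digest_a = new_password_record("123456")
salt_b, digest_b = new_password_record("123456")

# ...but their stored hashes differ, so cracking one reveals nothing
# about the other.
print(digest_a == digest_b)                # False
print(verify("123456", salt_a, digest_a))  # True
print(verify("123456", salt_b, digest_a))  # False: wrong salt for this hash
```

The salt is stored in the clear next to the hash. Its job is not to be secret, only to be different for every user.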
Not all hash functions are created equal. As time goes on, and more and more computing power becomes available at relatively low cost, once-unbreakable hash functions become breakable. For instance, SHA-1, apparently still the most widely used cryptographic hash function, was shown in 2005 to be weaker against collision attacks than it was designed to be, and the National Institute of Standards and Technology has recommended that federal agencies stop using it. SHA-2, the recommended replacement for SHA-1, has not been broken, but it is built along similar lines, so NIST already has an effort under way to develop SHA-3 as an alternative.
One-way, randomly-salted SHA-2 (or eventually SHA-3) hashes should take care of the worst of your password storage problems, but what about other user data? What about email addresses, credit card numbers, and the like? I'd like to suggest that the first line of defense is collecting as little sensitive information as possible. Storing credit card information (where legal) is generally unnecessary and provides the potential attacker with substantial incentive. In short, unless you are a credit card company, you probably shouldn't store credit card information.
Other user information should be treated just as carefully - stored only if absolutely necessary, and stored with encryption as secure as possible if it must be stored. Assume from the start that your database will be compromised, and make it as difficult as possible for the data to be used.
There are many other security issues to be concerned about, and I encourage you to research them, but if you care about your users and your reputation, then the least you can do is stop storing sensitive user information in the clear.