If you’re a developer working on or maintaining a website catering to the general public, chances are you’ve implemented some form of password reset via security question-and-answer on your site. How are you storing the answers to these questions in your database? Are you encrypting them? Storing the (hopefully cryptographic, salted) hashes? Or are you storing them in plain text?
I can’t answer for you, but I can tell you that I’ve never used a system that didn’t leave tell-tale signs of storing these answers in plaintext. Here’s the thing – if it’s possible to use these answers to reset a password, then these answers, by extension, are passwords too.
In some ways, answers to password reset questions are more important than the password itself. With the password, an attacker can compromise and gain control of a user’s account. With the answers to security questions, an attacker can compromise a user’s entire online and offline security, steal their identity, and quite literally ruin their life. Think about it: these questions (mother’s maiden name, childhood best friend, street you grew up on, where you were on New Year’s Eve of 2000) are the same ones every site asks you to confirm your identity and reset your password. They’re the questions your telephone banker asks before divulging account info or letting you wire money to an international account. They’re the questions that you’ll be asked when applying for a credit card to prove you’re who you claim to be.
For many, a strictly email-based password reset interface isn’t an option. Maybe you need some form of over-the-phone verification so customer support can perform actions requiring authentication (à la every bank and wireless carrier). Maybe you want an extra step to guard against password resets by people with access to your users’ email (à la Facebook, Gmail, etc.). Or maybe you need to bypass email entirely because you can’t guarantee your users (still) have access to their email or are capable of following the oh-so-complicated process of logging in to a separate system in order to reset the password on yours.
Regardless of your reasons for using a security-question-and-answer password reset interface, and regardless of whether it’s only the first step or the only step in the password reset process, answers to these questions are sensitive and they should be treated as such. It doesn’t matter who you are, how good your intentions are, or how “secure” your system is – at some point, you and your customers will wish these answers weren’t stored in plain text in your database. It’s not just about when (never if) your database is compromised – can you guarantee you’ll never have a security breach or an untrustworthy employee?
How do you protect your users’ passwords? It’s 2015, and it’s probably safe to say that even the most oblivious of developers and architects know that storing passwords in the database in plaintext is a horrible idea (which doesn’t mean they won’t still do it, but that’s getting off-topic), and probably even know that storing the MD5 hash (salted or not) isn’t such a hot idea either (even if they don’t quite know why). If you know what you’re doing, you’re probably storing your passwords in the database with some sort of key-stretching algorithm such as PBKDF2, scrypt, bcrypt, or similar. Awesome. But what about these valuable answers to the security questions? Store these answers in the database with [scrypt|bcrypt|pbkdf2] as well.
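To make that concrete, here is a minimal sketch of storing an answer the same way you would store a password, using PBKDF2 from Python’s standard library. The function name `hash_answer`, the 16-byte salt, and the iteration count are my assumptions for illustration, not a prescription; pick a work factor that suits your hardware, and feel free to swap in scrypt or bcrypt.

```python
import hashlib
import secrets

# Assumed work factor for PBKDF2-HMAC-SHA256; tune it for your own hardware.
ITERATIONS = 600_000


def hash_answer(answer: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for a security answer.

    Store both values (plus the iteration count) in the database
    instead of the answer itself.
    """
    salt = secrets.token_bytes(16)  # fresh random salt for every answer
    key = hashlib.pbkdf2_hmac("sha256", answer.encode("utf-8"), salt, iterations)
    return salt, key
```

Because every answer gets its own salt, two users who both grew up on the same street end up with different stored values.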
There really is no way around this. If you love or respect your users at all, you will treat these answers as sacred. You will show your users that you value the fact that they’ve entrusted you with them, and you will never, ever, ever store them as plain text in the database ever again.
But I need to validate these answers over the phone!
OK, and where’s the problem? Your customer support agents ask the customer “what’s your mother’s maiden name” and then they have to enter it into an input box for validation, whereupon the hash of the typed answer is calculated and compared to the stored hash. Why are they doing the comparison visually in the first place? Look at it this way – not only are you protecting your users against database compromise, you’ve also solved your problem with attackers using your technical support staff as socially-engineered proxies to do their bidding.
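A rough sketch of what sits behind that input box, assuming the salt, derived key, and iteration count were saved at enrollment using PBKDF2-HMAC-SHA256 (the function name `verify_answer` and its parameters are hypothetical). The agent only ever sees a yes or a no, never the stored answer.

```python
import hashlib
import hmac


def verify_answer(typed: str, salt: bytes, stored_key: bytes, iterations: int) -> bool:
    """Re-derive the key from what the agent typed and compare it to the stored one."""
    candidate = hashlib.pbkdf2_hmac("sha256", typed.encode("utf-8"), salt, iterations)
    # compare_digest runs in constant time, so the comparison leaks no timing hints
    return hmac.compare_digest(candidate, stored_key)
```

You would probably also want to rate-limit and log attempts per account, since an agent (or whoever is on the phone with them) can otherwise keep guessing.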
But I don’t want the answers to be case-sensitive!
Here the workarounds and hacks start, but your users’ data is worth it. There is no rule on what you can or can’t do to the input text (so long as you’re careful). Instead of your answer hash, as stored in the database, being
h(answer)
it can be
h(tolower(answer))
where tolower is a Unicode-aware, multibyte-safe, culture-invariant transform, of course.
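As a sketch of what such a transform might look like in Python: `str.casefold()` is the Unicode-aware, locale-independent cousin of lowercasing, and NFKC normalization folds visually equivalent forms together first. The names `normalize_answer` and `hash_normalized` are mine, and trimming surrounding whitespace is an extra assumption on my part. Whatever you choose, apply the exact same transform at enrollment and at verification.

```python
import hashlib
import unicodedata


def normalize_answer(answer: str) -> str:
    """Fold case and Unicode variants so equivalent answers hash identically."""
    folded = unicodedata.normalize("NFKC", answer).casefold()
    return folded.strip()  # assumption: leading/trailing spaces are never meaningful


def hash_normalized(answer: str, salt: bytes, iterations: int) -> bytes:
    """h(tolower(answer)): stretch the normalized answer rather than the raw input."""
    data = normalize_answer(answer).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, salt, iterations)
```

With this in place, “WILLOWBROOK RD” and “willowbrook rd” produce the same stored value for a given salt, while “Willowbrook Road” still does not.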
But I need to present the user with a hint!
We’re getting into dangerous territory here, and you should probably ask yourself if you’re sure you want to do that. But assuming you insist, and you want to present your customer with a censored/redacted version of the answer as a “hint,” you can store the redacted hint in the database separately. But really, you shouldn’t.
(e.g. instead of storing “Willowbrook Rd” as-is in the database so you can show a hint “W***o*****k R*” to the user, go ahead and store the literal string “W***o*****k R*” in the database. But don’t.)
But I want to handle misspellings gracefully!
Are you sure you want your customer support agents accepting Smyth instead of Smith? Are you sure? Fine, on your own head be it, but fuzzy-hashing is a thing. I’m not entirely sure how you can safely use that in conjunction with a key-stretching algorithm for maximum security, but anything beats plaintext.
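For what it’s worth, here is one way it could be approximated. To be clear, this is not fuzzy hashing proper; I’ve swapped in a phonetic code (classic American Soundex, implemented by hand and ASCII-only) that runs before the key-stretching step, so that similar-sounding answers collapse to the same input. The names `soundex` and `fuzzy_key` are hypothetical. Be warned that there are only a few thousand possible Soundex codes per word, so the stretched result is trivially brute-forceable by anyone holding the database; treat it as an illustration of the trade-off and nothing more.

```python
import hashlib

# Soundex digit for each consonant; vowels, y, h and w carry no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break up a run of identical codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def fuzzy_key(answer: str, salt: bytes, iterations: int) -> bytes:
    """Stretch the phonetic code of each word instead of the exact answer."""
    phonetic = " ".join(soundex(w) for w in answer.split())
    return hashlib.pbkdf2_hmac("sha256", phonetic.encode("ascii"), salt, iterations)
```

Both “Smith” and “Smyth” encode to S530, so they derive the same key for a given salt. So do plenty of other names, which is exactly the problem.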
But no one else cares!
Alas, here we come to the crux of the matter. This post is written largely in vain. It’s likely far too late; chances are, if you’re reading this, the answers to your own questions are already sitting there in a database, just waiting to be leaked and abused. Actually, they probably already have been – attackers have long prioritized username/password data over anything else because it’s the lowest-hanging fruit and can be brute-forced to gain access to email and bank accounts, etc., but for dedicated attackers or identity thieves, that information’s already out there just waiting to be abused.
Writing this post feels like an exercise in futility. The number one rule in security is that when something is leaked, it’s leaked. You don’t bother protecting an access key that was accidentally uploaded to a GitHub repository – you nuke it, revoke it, and generate a new one. You can’t unspill a glass of milk any more than you can undivulge a secret – only we don’t have that luxury when dealing with our identity. The Social Security Administration doesn’t believe in issuing new SSNs to replace old ones, and there’s no changing the name of the street you grew up on – not when all the credit bureaus already know and won’t accept anything else as the answer.
Is it honestly too late?