Possibly one of the biggest hurdles that stands in the way of fostering innovation and discovering newer and better techniques of doing old things is the ease with which developers and designers today can quickly research and find so-called “best practices.” While a quick Google search for “user table structure” or “best way to design password reset” can reduce (but never extinguish!) outlandish practices and horrific mistakes, it does nothing to encourage developers to think outside the box, and results in the perpetuation of less-than-optimal approaches.
To that end, there’s one thing in particular that virtually all documented approaches get wrong, and that’s writing to the database when you should be using modern cryptography instead. It might sound like a bit of a non-sequitur — after all, what does storing information have to do with cryptography when one usually exists only to supplement the other? Which is exactly right. Too often, you’ll find software writing to the database not because it needs to store something, but because it needs to guarantee something. Which is what cryptography is for.
Without searching for contrived examples, here we’ll present a few real-world cases – ones that might likely even be in your codebase as we speak. Switching from reading and writing from/to the database to using modern crypto to validate information not stored in the database will fix code that’s more often than not hard to scale, impossible to maintain, and performs poorly to boot.
Case Study 1: Resetting user passwords
Let’s say you have a clean and simple database table for your application’s users:
Then you decide to add functionality to implement password resets, in the event that users forgot their password. Users use a form to provide their email address, and if that email corresponds to a known userid, an email is sent with the link to reset the password. You can’t just send them a link to
http://myapp.com/resetPassword?userId=johnnysmith because obviously you need to validate that the request came from the email you sent – verifying that the link was one you generated. So you end up with another table like this (because normalization!):
You’re generating a reset token when the user requests a password and sending it to them via email. And to head off people that will try to reuse the token, you put in an expiration time, and of course you also delete this record in case when the user logs in, when the password is reset, and so on and so forth – meaning you need read/write access to the database in all of these cases, and code to check/delete existing records to prevent attackers from using a password reset token if the target user didn’t, or if they already did but before the
expirationTime was reached, etc. In other words: a maintenance nightmare, filled with security gotchas, difficult to scale, and hard to write.
What you should be doing is much simpler, faster, and more secure. Don’t add another database table, don’t write information you don’t actually care about to the database (basically, anything you’ll be deleting soon!), and don’t rely on recording and comparing information to check if it’s been tampered with. Instead, use a cryptographic hash to generate a secure URL the user can use to reset their password. Without delving into the implementation details, this is what the end result would look like:
It’s easy enough to see how we can use a cryptographic hash like HMAC256 to generate a token we can use to verify that the request to reset Johnny’s password did indeed generate from us and will expire on Dec. 22, 2012 (bonus points to anyone that gets the reference). But how can we do the additional checks such as verifying the password reset URL isn’t reused multiple times before it’s expiration date and preventing its use if Johnny remembers his password and changes it himself?
Answer: include any fixed data which you want to check against in the hash itself.
In this case, the
%SECURITYTOKEN% in the link above could be formulated like this:
Then when the user request comes in, simply concatenate the current values of parameters like
clientIpAddress as retrieved from the database to the information they provided in the URL itself (the userId and the
expirationTime, which are dynamic and will need to be provided by the user), and calculate the HMAC for a “valid” request. If at any point since the token was generated the user changed their password, switched computers, etc. the request would simply not authenticate. You can “store” as much or as little info as you like in the hash, without needing the client to play it back to you, without having to record anything in the database, and without needing to worry about clearing db tables for certain events.
Case Study 2: Creating new accounts that require email activation
Raise your hand if you absolutely require users to validate their emails before creating a full account on your site, but still create the user record in the database and mark it as “unverified” when they provide their info, then cross your fingers and pray your user comes back with a validated email address token to finish creating their account and set them on their merry way.
There’s a better way. There always is.
When a user creates an account on your site, you can send them a link to validate their account info without having to store it unvalidated in your DB just yet.
Now you don’t have to worry about clearing old bitrot from the database or worrying about when to expire non-verified accounts. In this event,
%SECURITYHASH% would be the HMAC of just the secret key and the email, but it could be extended to contain other data such as the requested username, an expiration time, etc. though you’ll need to pass some of that info back and forth in the validation link.
The keen-eyed will notice that we’ve avoided shuttling too much data back and forth in URLs given to users by making the validation step the first step, but only out of a desire to generate cleaner URLs. In this particular case, when the user returns to the site we can collect the remaining information, such as the password, their second aunt three-times-removed’s maiden name, and the middle initial of their first crush’s current wife. We could just as easily have collected all this upfront and included it in the validation URL in the email (as an encrypted query string, of course). If you’re sending RTF or HTML emails with pretty links, it’s definitely not a bad option, though users using plaintext email clients will probably curse you until they’re blue in the face at the sight of a twenty-line URL. And the AOL users.
One thing to keep in mind: because we didn’t write in the database, if you’re using usernames independent of emails and you collect all information up-front, multiple persons can request the same username and you won’t know it’s under contention until two or more of them get back to you with the validated URLs. Make sure to check availability once they’ve returned to your site – this new paradigm will take a bit of getting used to!
Case Study 3: One-time-use and expiring resources
Traditionally, many webapps generated one-time-use and expiring content by creating records in a table with a unique id and, optionally, an expiration date. For example, links to paid downloads, access tokens giving permissions to carry out an action (backend admin tasks, posting to an account, etc), and storing login sessions. This one is fortunately something a lot of developers have already embraced HMAC signatures for, primarily thanks to AWS/S3 and its heavy use of HMAC for signed requests.
Instead of creating a table in the database and generated unique IDs for your users to use once, validating requests against its contents, and removing them when they’ve been used or expired, signed URLs that make it possible to verify that the request was created/signed-off-on by you, without needing to record proof of it anywhere. That’s the point of HMAC signatures: the fact that the client knows the right signature is proof that you gave it to them.
Signed requests can completely transform the way you write code and change absolutely everything about the way your application behaves and runs. Signed requests are easier to maintain since the proof is, as they say, in the pudding – meaning you don’t need to keep your database schema in-step with your code and you don’t need to write down every little detail you don’t actually care about just to make sure it initiated from you in the first place. It frees up your code from having to access the database for every little nitty-gritty, and you can even run simpler web applications entirely without a database.
There is one really important caveat in all this: once the data’s signed and out of your hands, it’s really out of your hands. If you make a mistake, there’s no way of taking it back unless you reset your security token! You can’t just go in the database and delete a single bad record (which you should never be doing anyway, everything should be systematic!), you’ll need to revoke all or nothing. You could also hash a version number with your security token (you don’t need to include it in the URL given to the client), but that’s ultimately the same thing, cryptographically speaking.
One last thing: this is all operating under the assumption that this is data you don’t want. If you want to record users that registered but never validated their email and went through with creating their account so you can bug them again to do so at a later date (only you don’t really know if they provided a real email, do you?) or if you want to be able to see (for some crazy reason) the list of pending password resets for your users’ accounts, you’ll obviously need to store these in the database. But then why are you reading this article anyway?