Life in a post-database world: using crypto to avoid DB writes

CryptoPossibly one of the biggest hurdles that stands in the way of fostering innovation and discovering newer and better techniques of doing old things is the ease with which developers and designers today can quickly research and find so-called “best practices.” While a quick Google search for “user table structure” or “best way to design password reset” can reduce (but never extinguish!) outlandish practices and horrific mistakes, it does nothing to encourage developers to think outside the box, and results in the perpetuation of less-than-optimal approaches.

To that end, there’s one thing in particular that virtually all documented approaches get wrong, and that’s writing to the database when you should be using modern cryptography instead. It might sound like a bit of a non-sequitur — after all, what does storing information have to do with cryptography when one usually exists only to supplement the other? Which is exactly right. Too often, you’ll find software writing to the database not because it needs to store something, but because it needs to guarantee something. Which is what cryptography is for.

Without searching for contrived examples, here we’ll present a few real-world cases – ones that might likely even be in your codebase as we speak. Switching from reading and writing from/to the database to using modern crypto to validate information not stored in the database will fix code that’s more often than not hard to scale, impossible to maintain, and performs poorly to boot.

Case Study 1: Resetting user passwords

Let’s say you have a clean and simple database table for your application’s users:

[userId|email|bcryptHash]

Then you decide to add functionality to implement password resets, in the event that users forgot their password. Users use a form to provide their email address, and if that email corresponds to a known userid, an email is sent with the link to reset the password. You can’t just send them a link to http://myapp.com/resetPassword?userId=johnnysmith because obviously you need to validate that the request came from the email you sent – verifying that the link was one you generated. So you end up with another table like this (because normalization!):

[resetToken|userId|expirationTime]

You’re generating a reset token when the user requests a password and sending it to them via email. And to head off people that will try to reuse the token, you put in an expiration time, and of course you also delete this record in case when the user logs in, when the password is reset, and so on and so forth – meaning you need read/write access to the database in all of these cases, and code to check/delete existing records to prevent attackers from using a password reset token if the target user didn’t, or if they already did but before the expirationTime was reached, etc. In other words: a maintenance nightmare, filled with security gotchas, difficult to scale, and hard to write.

What you should be doing is much simpler, faster, and more secure. Don’t add another database table, don’t write information you don’t actually care about to the database (basically, anything you’ll be deleting soon!), and don’t rely on recording and comparing information to check if it’s been tampered with. Instead, use a cryptographic hash to generate a secure URL the user can use to reset their password. Without delving into the implementation details, this is what the end result would look like:

http://myapp.com/resetPassword?userId=johnnysmith&expirationTime=1356156000&token=%SECURITYHASH%

It’s easy enough to see how we can use a cryptographic hash like HMAC256 to generate a token we can use to verify that the request to reset Johnny’s password did indeed generate from us and will expire on Dec. 22, 2012 (bonus points to anyone that gets the reference). But how can we do the additional checks such as verifying the password reset URL isn’t reused multiple times before it’s expiration date and preventing its use if Johnny remembers his password and changes it himself?

Answer: include any fixed data which you want to check against in the hash itself.

In this case, the %SECURITYTOKEN% in the link above could be formulated like this:

HMAC256("userId=johnnysmith&expirationTime=1356156000&oldBcryptHash=$oldBcryptHash&clientIpAddress=$clientIpAddress", $mySuperSecretKey)

Then when the user request comes in, simply concatenate the current values of parameters like oldBcryptHash and clientIpAddress as retrieved from the database to the information they provided in the URL itself (the userId and the expirationTime, which are dynamic and will need to be provided by the user), and calculate the HMAC for a “valid” request. If at any point since the token was generated the user changed their password, switched computers, etc. the request would simply not authenticate. You can “store” as much or as little info as you like in the hash, without needing the client to play it back to you, without having to record anything in the database, and without needing to worry about clearing db tables for certain events.

Case Study 2: Creating new accounts that require email activation

Raise your hand if you absolutely require users to validate their emails before creating a full account on your site, but still create the user record in the database and mark it as “unverified” when they provide their info, then cross your fingers and pray your user comes back with a validated email address token to finish creating their account and set them on their merry way.

There’s a better way. There always is.

When a user creates an account on your site, you can send them a link to validate their account info without having to store it unvalidated in your DB just yet.

http://myapp.com/createAccount?email=johnnysmith@gmail.com&token=%SECURITYHASH%

Now you don’t have to worry about clearing old bitrot from the database or worrying about when to expire non-verified accounts. In this event, %SECURITYHASH% would be the HMAC of just the secret key and the email, but it could be extended to contain other data such as the requested username, an expiration time, etc. though you’ll need to pass some of that info back and forth in the validation link.

The keen-eyed will notice that we’ve avoided shuttling too much data back and forth in URLs given to users by making the validation step the first step, but only out of a desire to generate cleaner URLs. In this particular case, when the user returns to the site we can collect the remaining information, such as the password, their second aunt three-times-removed’s maiden name, and the middle initial of their first crush’s current wife. We could just as easily have collected all this upfront and included it in the validation URL in the email (as an encrypted query string, of course). If you’re sending RTF or HTML emails with pretty links, it’s definitely not a bad option, though users using plaintext email clients will probably curse you until they’re blue in the face at the sight of a twenty-line URL. And the AOL users.

One thing to keep in mind: because we didn’t write in the database, if you’re using usernames independent of emails and you collect all information up-front, multiple persons can request the same username and you won’t know it’s under contention until two or more of them get back to you with the validated URLs. Make sure to check availability once they’ve returned to your site – this new paradigm will take a bit of getting used to!

Case Study 3: One-time-use and expiring resources

Traditionally, many webapps generated one-time-use and expiring content by creating records in a table with a unique id and, optionally, an expiration date. For example, links to paid downloads, access tokens giving permissions to carry out an action (backend admin tasks, posting to an account, etc), and storing login sessions. This one is fortunately something a lot of developers have already embraced HMAC signatures for, primarily thanks to AWS/S3 and its heavy use of HMAC for signed requests.

Instead of creating a table in the database and generated unique IDs for your users to use once, validating requests against its contents, and removing them when they’ve been used or expired, signed URLs that make it possible to verify that the request was created/signed-off-on by you, without needing to record proof of it anywhere. That’s the point of HMAC signatures: the fact that the client knows the right signature is proof that you gave it to them.

Conclusions

Signed requests can completely transform the way you write code and change absolutely everything about the way your application behaves and runs. Signed requests are easier to maintain since the proof is, as they say, in the pudding – meaning you don’t need to keep your database schema in-step with your code and you don’t need to write down every little detail you don’t actually care about just to make sure it initiated from you in the first place. It frees up your code from having to access the database for every little nitty-gritty, and you can even run simpler web applications entirely without a database.

There is one really important caveat in all this: once the data’s signed and out of your hands, it’s really out of your hands. If you make a mistake, there’s no way of taking it back unless you reset your security token! You can’t just go in the database and delete a single bad record (which you should never be doing anyway, everything should be systematic!), you’ll need to revoke all or nothing. You could also hash a version number with your security token (you don’t need to include it in the URL given to the client), but that’s ultimately the same thing, cryptographically speaking.

One last thing: this is all operating under the assumption that this is data you don’t want. If you want to record users that registered but never validated their email and went through with creating their account so you can bug them again to do so at a later date (only you don’t really know if they provided a real email, do you?) or if you want to be able to see (for some crazy reason) the list of pending password resets for your users’ accounts, you’ll obviously need to store these in the database. But then why are you reading this article anyway?

cURL error: Failed to connect to localhost port 7700 after 0 ms: Couldn't connect to serverpost 3247 not indexed

35 thoughts on “Life in a post-database world: using crypto to avoid DB writes

  1. Amazing article. I never considered the points presented here, I’ve always done the lame database option but from not on I’ll be using crypto for all this stuff. Great read.

  2. in your example would it be easy to guess since only source of randomness is expiration date.

    so H(userId + expirationTime) then app.com/reset?id=userId&hash=H

    and if you’re using key based hashing. Then the moment key is lost wouldn’t it be possible to reset passwords of all users by randomly guessing expiration time then using key to generate hashes.

    BTW isn’t this sort how cookie session work too

  3. You’re correct that you don’t need to be storing the info in the DB.

    However you missed a key business reason: all that data you are talking about as ‘bitrot’ is exactly the data you need to capture and measure to judge how you’re performing, and improve your business.

    eg.
    Changed the confirmation screens and emails – did more people click the link and confirm this month, vs last month on the old screen?

  4. A question: you say “include any fixed data which you want to check against in the hash itself.”, to safeguard against someone or user reusing same url to reset password over and over.

    But i am unclear how is that working. How adding oldHash into equation helping that.

  5. @Umer:

    Including the oldHash in the HMAC hash allows you to use the currently set hash in the database as part of your hash validation. If the password was changed AFTER the HMAC has was generated, then the new HMAC has will not match the original HMAC hash…because oldHash has changed..and thus you prevent the user from doing whatever it is that they were trying to do.

  6. One important thing to note is that the “cryptographic hash” mentioned in this article is actually a message authentication code[1] – it’s technically fine to refer to it as a *keyed* hash, but you have to be careful about keeping the concepts separate.

    SHA-256 is a “cryptographic hash” but it is vulnerable to length extension attacks[2]. If you cook up your own way to implement the recommendations from this article, you may introduce mistakes that have bitten other sites, like Flickr[3].

    In addition, you should make sure to encode/serialize data into the string-to-be-hashed carefully.
    Otherwise, you may introduce an attack that runs field into each other (e.g. if a user provided string is “blah&userName=joe”, you usually want to make sure it doesn’t get serialized as “userProvidedString=blah&userName=joe”).

    [1] https://en.wikipedia.org/wiki/Message_authentication_code
    [2] https://en.wikipedia.org/wiki/Length_extension_attack
    [3] http://netifera.com/research/flickr_api_signature_forgery.pdf

  7. Did I miss something – you keep mentioning one time use links, but never show how to implement them in this paradigm? My quick assessment is that one time use links cannot be made without a bit in a database.

  8. Huh, I was doing a similar thing in early 2000s 🙂 But URLs do really become ugly for the user.

  9. @Umer HMAC requires a private key, so you can guess all you want, but the point of HMAC is that you cannot fake anything.

    @Brad That is a valid concern, but I’d say it’s just as good to have 3 counters. One for people who requested an account, one for people who made one, one for people who closed their account. If you need more information than that, consider a more BI like structure and caputure what you need – just don’t let it convolute your actual core business logic.

    @David That is the point. You cannot create a security token without knowing the private key. So any user presenting you a valid token, means you gave it to them. If you include things like the old password hash, once they upgrade the password they will have a new hash, and by definition the old token is no longer valid because one of the inputs (old password hash) changed.

  10. @David “My quick assessment is that one time use links cannot be made without a bit in a database.”

    That’s right, but the point is that the “bit in the database” doesn’t need to be a specially-created token; in particular, if the one-time-use link has some observable effect, we can use that observation as our “bit in the database”.

    In the password-reset example, performing the reset alters the stored hash, which we can observe; to prevent multiple uses, we just check the hash as part of the verification.

    Another way of looking at it is to consider the hash-checking as making the handler idempotent; running the handler 100 times gives the same result as running it once (since the hash check fails after the first usage). Rather than having a single-use link, we have a multiple-use link which doesn’t do anything after the first use.

  11. One thing that this does that matching on a random token in a database table does not, is potentially expose more internal information. For example, in case study one, I now see the users internal id number whereas giving me a simple token gives me nothing except some short-lived random flag. This is taking Defense in Depth rather far, but if your approach were taken up in a wider range of scenarios there are cases where one might be sharing internal information one otherwise wouldn’t for the sake of being able to sign it.

    An obvious refinement would be to encrypt instead of merely sign. That gives you more flexibility in what you’re passing out to the client in order to avoid committing information to the database. Something less applicable to simple examples like this and more applicable to complex scenarios, but worth having as an option.

    Obviously the URLs become a lot more unwieldy but that’s what HTML emails are for. 😉

    Nice article.

  12. I really disagree that this is a good idea. Using crypto in this way, without a good reason, potentially causes more security problems than it solves.

    One possible problem is that you will need a “master key” – some kind of secret IV for the hash or something. If somsone obtains this (possibly a developer or operations engineer), they can now reset anyone’s password forever, unless you build-in a mechanism to change this master key, which involves significant engineering.

    For example, in a distributed infrastructure which must tolerate partitions, it may be convenient to hard-code this secret key for all time – but if it ever escapes, you now have no mechanism to change it.

    This is not the only problem, but one which immediately comes to mind. I have worked on such a system.

  13. @Mark Robson An employee with access to this key, will have access to the database as well (or rather, if you have access to it, I find it hard to believe that you do not have access to the production database), so the point is rather moot. Nothing prevents you from storing the key in a secure key store, and hand it out to partititions as they need it. Or if you require really tight security, use a central authority to do the generation of security tokens for you.

  14. I agree, in normal case even if someone got read only access to database they cant change password becuase user get unique url in his email or figure out password without significant trouble. But in this case all someone has to do is read the secret key, old hash and userid then some guessing with expiration date and password can be changed

  15. @umer That is true, which is why you don’t want to let people know your key. This really isn’t much different from making sure people don’t have write access to you database. It’s a hardening issue.

  16. So, I’m no lawyer, but it seems to me that with a couple thousands email reset requests (which you wouldn’t have a way of tracking, and therefore wouldn’t find at all suspicious), I’d have more than enough information to crunch out your secret key in a couple hundred hours’ computation time, because I have all necessary data, including my own “current” password, which is the only other possibly secret piece of information there’s in your HMAC generation.

  17. I don’t want to be pedantic, but there is some terminology misuse going on surrounding HMAC. HMAC is not a hash algorithm nor a signature algorithm.

    HMAC is a Message Authentication Code (MAC) algorithm, using a hash function internally (as opposed to a block cipher, which yields CMAC).

    The distinction between a hash and a MAC are that a hash has no keyed material; SHA-256 works the same for everyone because that is what it does, by definition. In contrast, a MAC requires both parties to have the same key data.

    In terms of an HMAC “signature”, that is also incorrect. The term “signature” implies an asymmetric cipher, where the person possessing the private key can sign messages and anyone with the public key can verify signatures. However, HMAC requires both parties to have the same key; it is symmetric, rather than asymmetric.

  18. This is a really interesting post! I like it.

    One thing I would caution you about though, is that you hash this: `HMAC256(“userId=johnnysmith&expirationTime=1356156000&oldBcryptHash=$oldBcryptHash&clientIpAddress=$clientIpAddress”, $mySuperSecretKey)`

    It might be safer to hash some serialization of that data, because technically there are more than a single hash of the url parameters, as the order doesn’t matter etc. It may not seem like a big deal, but what if you got into some situation where there were duplicate parameters and your hashing tool used/verified the first, but the thing for the db uses the last? In that situation someone might have gone to `http://myapp.com/resetPassword?userId=johnnysmith&expirationTime=1356156000&token=%SECURITYHASH%&userid=admin`

    The chances might be slim, but it’s worth considering. Either way, good post!

  19. @Tomas: “I’d have more than enough information to crunch out your secret key in a couple hundred hours’ computation time, because I have all necessary data, including my own “current” password, which is the only other possibly secret piece of information there’s in your HMAC generation.”

    You have your current password – you *don’t* have the old bcrypt hash, which includes a per-password IV that you can’t guess.

    If you *somehow* managed to guess that, you’d have to solve for `token=H((K xor opad) | H((K xor ipad) | message))`. You’re trying to find some x such that token=H(x) — but your hash is SHA-256; the best attack you can make as of today is a collision attack which is of complexity 2^128. “A few hundred hours” is risible. Think “a few billion years”.

    Having said all that this is still a terrible idea because it’s extremely easy to screw up and difficult to audit. But it’s not a bad idea because of what you said.

  20. Wouldn’t it be better if you didn’t store a password hash at all? Have login work like SSH. At signup, the client produces a client side certificate and the server stores the client’s certificate file. Reused, this solves single sign on without a central database of passwords for the NSA^w^w hackers to breach.

  21. @nobody: client-side TLS certificates are probably the best idea, agreed. The problem is chicken/egg – browsers bury the feature in baroque interfaces behind advanced setting tabs, so servers don’t support it. Servers don’t support it well, and it doesn’t scale too well, so there’s no impetus for clients to support it.

    I’d really like to see lightweight Ed25519 exchange. Oh well.

  22. Your suggestions are mostly correct and some of us are already doing this. But:

    Do not reuse your secret key in this way! Derive it for every user separately as hmac(login|expiration, key). Next, include a piece of random data in every request. Always include expiration.

  23. awesome!!
    I am working on authentication for my site right now and this really changed my approach. Thanks a lot man.

  24. > $clientIpAddress

    Are you serious? LOL
    Welcome to the world with mobile devices, VPNs, mobile networks and.. moving users. Yes, they can move (for example: from work to home) and IP will be changed while they are closing office door.

    And if somebody will get your secretKey, all users will lose their data in seconds.

  25. Evgeniy OZ, you should publish your break of SHA256 that recovers hash(x) -> x in “hundreds of hours”. The cryptographic community would be INCREDIBLY interested to see how you manage it.

    (you’re as wrong as he is!)

  26. @nobody Won’t using ssh certificate bond user to one device and mobility would completely suffer. For example, If you go to library that means you can’t login. Or if you want to then you need to install certificate which means you have to delete it too when you leave.

  27. Am I the only one that thought the $clientIpAddress stuff was just for the sake of illustrating how you can add unlimited checks to the password reset URL without exposing them to the user (by including them in the server-side hash)? I didn’t get the impression the author meant to specifically do the IP checks in particular. Anyway, it’s orthogonal to the point, that’s an implementation detail and does not add or detract at all from the idea of using HMAC to reset passwords.

  28. I’d second @David. It doesn’t seem one-time-use link from the case study 3 can be generalized without having a persistent state on the database.

    With the password reset example, oldBcryptHash was used to carry the pre-use state, but not all one-time-use link is used to change state of an existing record.

    Great article nonetheless, thanks!

    I also find it amusing that people make cynical comments when they’re just nit-picking on their own (even pointless) business assumptions. Probably thought provoking articles make wannabes feel envious and want to look smarter than they are. 😉

  29. I’ve been reading and re-reading this post since it was posted because I am beyond certain there is a way to accomplish true one-time use tokens with this approach and a bit of math and it is at the very tip of my tongue but I can not figure it out.

    It seems like there should be a simple trick.

  30. The ultimate goal of completely getting rid of persistence of such events can bring along hidden risks. Take for instance password reset: if server does not persist anything at issuing tokenised link it has to trust the signature of submitted request. Regardless of the signature algorithm (regardless SHA-256 or SHA-256-MAC or RSA/DSA based) the security of the operation becomes dependent solely on the secrecy of the signature generation key (i.e mySuperSecretKey above). This is not trivial even for simple web apps. Doing it in scalable manner is more complicated. And thinking of potential impact of compromise of mySuperSecretKey: attacker will be able to reset passwords for all of your clients!

    I think there is sensible compromise between simplifying architecture and code complexity and using crypto by having some persistence.

    P.S And yes – you can change MySuperSecretKey when it gets compromised. But in real life server compromises will often get unnoticed for long time. And last 2 years have shown that almost every major online service provider has been compromised.

  31. Interesting, thanks for sharing.

    Small typo in case study one; a few lines below

    “http://myapp.com/resetPassword?userId=johnnysmith&expirationTime=1356156000&token=%SECURITYHASH%”

    it says

    “In this case, the %SECURITYTOKEN% in the link above could be formulated like this:”

    %SECURITYHASH% was ment.

Leave a Reply

Your email address will not be published. Required fields are marked *