User Account Security Using Password-Based KDF’s

User Account Security Using Password-Based KDF’s

So you landed on this page because you are looking for a way to implement user account security in an online system.  There are different ways to implement security in a system based on many different factors.  When implementing security for your user accounts, we give lots of thought to security in relation to a non-compromised system, where attackers are still trying to penetrate and gain access to account data.  How about security playing a role to systems that have been compromised?  Where the attacker has gained access to resources such as the database and source code?  What can the security we put in place now, help with then?

We first will be talking about some of the core components of a Password Based Key Derivation Function, types of attacks that online and compromised databases might undergo and finally what solutions we can implement to help secure our user accounts and impede attacks mentioned here. In this article we are going to cover:

  1. What is hashing
  2. What is Salting and is its purpose
  3. What are some types of attacks we need to secure against
  4. User account security using a Password-Based Key Derivation Function like PBKDF2 and Bcrypt

 

So let’s get started….

[headline tag=”div” css_class=”h1″ color=”color2″]Hashing [/headline]

Hashing is the act of mapping data of arbitrary length to data of a fixed length through the use of an algorithm called a hash function. This is also coined a One-way hash function as it is generally designed not to be able to derive the original value from the hash value (the output from the function).

Kerckhoffs’s principle tells us that we should assume that the enemy knows the system but that our security measures should still secure classified information. Keeping that in mind, a number of our security measures will be under the assumption that an attacker has garnished access directly to our database, source code and other attributes of our security. This article is about user account security in a system and for the sake of the user account based system that we are talking about, the user’s password is the user’s key to their account.

Therefore, as mentioned earlier the user account security we are implementing also pertains to a compromised system where an attacker has gained access to the database.  Storing the passwords in plain text in the database would allow instant access to any user’s account, possibly including the “Key’s to the Kingdom” in the sense of the site admin account.  This is where the idea of hashing comes into play in the use of hashing the user’s password and storing it in place of their actual password.   As a reminder, we haven’t discussed salting the password or the use of a password-based Key Derivation Function yet, as we will get to those shortly.

There is a number of existing hash algorithms that have been around for a long time, some of which are no longer recommended due to discovered vulnerabilities even years after their publication.  In addition, we can also go as far as saying that it would not be recommended practice to use even the most secure algorithms listed here exclusively – We’ll be explaining shortly.  Some examples are: [list type=”circle-empty” color=”color1″]

  • SHA3
  • SHA2
  • RipeMD
  • WHIRLPOOL
  • SHA1
  • MD5
[/list] [section_headline tag=”h2″ lined=”yes” color=”color2″]So What Are We Hashing? [/section_headline]

So assuming all necessities were in place and our system was generating and saving hashes of a user’s password in the database in place of their actual password, what would those hashes look like?  Let’s see some examples: [table align=”center” striped=”yes”]

Password

Hash Value

Algorithm

Password1 19513fdc9da4fb72a4a05eb66917548d3c90ff94d5419e1f2363eea89dfee1dd SHA256
Password2 1be0222750aaf3889ab95b5d593ba12e4ff1046474702d6b4779f4b527305b23 SHA256
[/table]

Here we have used the SHA-256 hash function to construct a 32-byte hash.  Notice how in the second row we have a complete different Hash Value, but the actual password value has only changed by a single value.  This is one of the significant advantages of the more secure algorithms in that a single change can invoke a complete different output from the algorithm.  This can help towards avoiding collisions.

[section_headline tag=”h2″ lined=”yes” color=”color2″]So Why are we Hashing[/section_headline]

At this point we have only masked the actual value of the user’s password, which means any attacker will actually have to work for what they have come for (as opposed to simply seeing the actual password in the database).  However, hashing the password in no way guarantees the security of a user’s password. So what is the work the attacker will have to perform on a database that has implemented hashed passwords?  Let’s look at some common ways that attackers use to gain access through a system that has implemented a hashed system.

Brute ForceIs the attempt to try every possible combination of letters and characters (up to a certain length) until a successful guess is made.  These are generally the least efficient way to find the correct guess and are very computationally expensive, but this attack will always find the correct answer.

Dictionary Attacks: A dictionary attack is a technique used to simply supply every possible guess (i.e. word or phrase) against a system until the correct guess is successful.  These guesses are generally assembled from large bodies of text.

Lookup Tables: These tables work very similar to dictionary attacks with the exception that they contain the computed hashes for different text.  Therefore, instead of having to compute a hash and then compare, a hash value can be compared to stored hashes in a lookup table for immediate identification.  To make clear, the point about lookup tables is they work because data will always have the same hash value when hashed by the same hash algorithm.

Reverse Lookup Table:  Reverse lookup tables store a selected number of hashes, that when reversed could generate longer chains of passwords.  This is also a trade off of computational time for a larger set of stored values.

Rainbow Tables:  The main difference between a Rainbow Table and a Lookup Table is that a Rainbow Table sacrifices hashing speed for space. This allows the storage of more hashes to be available for comparison at the same or smaller space.  Hash chains is a formula that was generated through the use of combining a hash function and reduction functions in order to store small amounts of data with the trade of of computation time.  However, hash chains also suffer from what is called a hash chain collision, when two or more hash chains produce the same output.   Rainbow tables were design to help resolve this problem.

Timing Attack: Constructing a valid hash in the system based on the time it takes to return a negative response that can then be cracked offline by an attacker.

So what can we do in regards to security to impede or thwart any of these attacks against our system?  In some cases we can completely nullify the attack, but in others, there is no complete protection, only the ability to impede and make the type of attack harder.  This is where one technique known as salting comes into play.

[headline tag=”div” css_class=”h1″ color=”color2″]Password Salting[/headline]

Remember when we said that a core point of a hash algorithm changes the hash value entirely when a single value has been changed in the data being hashed?  This is where the act of salting the password before being hashed comes into play.

The act of salting is the additive of an addition value to the original data being hashed.  (i.e. “MyPassword” with salt “38472” becomes “MyPassword38472”).   The salting of the password completely changes the hash value.  But this is obvious because we aren’t hashing the same value if we have prefixed or appended a salt to the password.  There is a little more than just adding additional data to a password before hashing.

In relation to user accounts in a system, there are 2 major steps we want to take in regards to salting a user’s password:

  1. [mark color=””]We want the salt to be different for each user[/mark]
  2. [mark color=””]The salt needs to be unique[/mark]

 

Let’s analyze these points:

In order to ensure that Sally’s password “p1tt5burg” and Fred’s password “p1tt5burg” are not the same hashed value in the database, we want to prefix or append a unique salt to their password.  This will render a different hash value in the database.

But wait! Why wouldn’t we want the values to be the same, their hashed right?  If one is cracked, then we have also cracked the other.

We’re all probably familiar with a particular language’s random number generator function.  Generally speaking, functions like these don’t provide the necessary levels of guaranteed uniqueness that functions derived for the use of uniqueness in cryptography provide.   These functions called Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) ensure the necessary level of uniqueness needed by combining a high level of entropy and random source.   We can help with the uniqueness of a salt by ensuring it is long enough to avoid the pigeonhole principle.

These steps will render ineffective, the use of a Lookup, Rainbow and Reverse tables.  Remember, these tables are pre-computed and the attacker will not be able to pre-compute the hashes for passwords that contain unknown individually randomized added salts.  What is a long enough salt?  Again, this can come down to the hash algorithm being used, but some stick to a static size, or will make the salt size the same size as the hash value.   A common recommendation is 80-192 bits for a salt.

[headline tag=”div” css_class=”h1″ color=”color2″]But Here is the Kicker[/headline]

A great portion if not all of the above is completely mute due to the simple fact of hardware computing power that an attacker can get their hands on today.  Hardware today will allow attackers to brute force against any of the available hashing algorithms mentioned above with the applied techniques.  For all intents and purposes, today’s hardware computing capabilities have rendered the need or use of lookup and rainbow tables useless.  Remember we said brute force attacks will always render a successful guess eventually.   There’s basically only one true option you have to directly combat the computing capabilities that are available.  And what is that option? Time.

This brings up what we’re here to discuss and that is password-based key derivation functions.  But first, Key Derivation Functions (KDF) was originally designed for creating a secret key for encryption such as a multi-party key-agreement protocol.  Password Based Key Derivation Functions (PBKDF) were designed to take advantage of the KDF’s and implement a process called key stretching with the use of a iteration count.  

Key stretching is the action of deliberately making the key derivation function slow in order to impede attacks such as brute force attacks.  It does this by iterating through the hashing algorithm however the number of times was provided by the iteration count. There are a number of different recommendations on what that iteration count should be set to which is talked more about under “Things to Be Aware Of”.  But as an example, if the iteration count was set to 64,000, whatever the amount of time it takes to hash a password one time would be multiplied by 64,000 theoretically speaking.  This time constraint would also apply to the attacker in their attempt to crack this single hash as they will need to run through the hashing of the password the same number of times.

I say theoretically as resource requirements for some PBKDF are low and allow for parallel attacks to be formed by attackers.  Some PBKDF’s have attempted to offset this by raising their hardware requirements.  You can read more about it here for a more in-depth reading.

This finally leads us to the core point of this whole article and that is the use of tools such as PBKDF2 and Bcrypt.  They both use secure hashing algorithms, salts and configurable byte length for the hash value.  But at the core, they provide the key stretching technique that deliberately slows the time it takes to hash a password.  Ultimately, this directly affects the time it takes the attacker to crack the hash.

So when we talk about brute force and dictionary attacks you also have to appreciate how fast today’s hardware is in relation to the number of guesses an attacker can make in relatively short time. There is a slew of articles out there  like this one from ARS Technica or Coda Hale article here that gives clarity to just how many brute force passwords can be cracked in relatively no time.

Therefore, PBKDF2 and Bcrypt can offset the hardware’s processing speed, seriously impeding the attacker’s ability to acquire the correct guess.

[section_headline tag=”h2″ lined=”yes” color=”color2″]Things to Be Aware Of [/section_headline]

There are some very important factors that need to be understood when implementing functions like PBKDF2 and Bcrypt or other key stretching tools.  By nature of design, they lend themselves to Denial of Services (DoS) attacks.  Not too long ago Django experience an exploit of this exact nature in their framework that you can read about here where an attacker can submit a password large in content size, causing the system to take extreme long periods of time to compute (i.e. 1mb would require 1 min of processing time before results would be returned). In Django’s case, it was a matter of not having a size limitation on their passwords.  It’s an interesting read and very insightful of what you need to be aware of when designing and implementing your security measures.

Therefore, depending on your system and its uses, determining the “iteration count” comes down to the computational resources you have available, tuning to the longest time possible without negatively affecting the end user as well as other possible limitations such as password size limit and number of possible attempts.  This is not an exhaustive list by any means and you will need to do a thorough investigation.

[section_headline tag=”h2″ lined=”yes” color=”color2″]Using Challenge-Responses [/section_headline]

Many systems will implement challenge-response mechanisms such as CAPTCHA’s and similar puzzles to offset the DoS vulnerability of Key Stretching.  However, this also lends to usually a more negative user experience.  Some systems even go as far as determining whether it’s under a DoS attack and dynamically implement a challenge-response at those times only.  But we’re not going to go into much detail on finding that perfect balance or what is best for your system.

[headline tag=”div” css_class=”h1″ color=”color2″]Timed Attacks [/headline]

When an attacker does not have access to a database and therefore doesn’t have the hashed password they can use a timing attack that will garnish the hash of the password that they can then crack offline with their own hardware.

Going back to Kerckhoffs’s principle assuming the attacker has some information, hash algorithm being used and the salt, but doesn’t have access to the database, they can gather 256 passwords that when hashed, each hash’s first character would contain 1 of the 256 unique characters.   Submitting each password, they could determine which contained the correct first character by which one took the longest.  They could continue this same process with the second, third, fourth and so on, until they have constructed a hash of a password.   From this point, they could then take the cracking of the hash offline with their own hardware. The reason this is possible is because of how strings are generally compared where we might return a response back immediately if the length is not the same or upon the first byte that does not match.

However, if we ensure that we step through all possible characters before returning a response, we vanquish the opportunity for garnishing this information.  I have read allot of suggestions for how to compare the hashes to minimize the ability of the attacker to garnish any information based on the time it takes to compare.  Some of these suggestions have been around ensuring that the comparison will run the length of the provided guess of the attacker.  But remember, the definition of a hash is mapping arbitrary data to a fixed length output.  In short, unless you are choosing different output lengths per password hashing (as in the derived key length of a PBKDF), the supplied password hash value and the actual password hash value should be equal in length.

[headline tag=”div” css_class=”h1″ color=”color2″]Putting It All Together [/headline]
  • [list type=”tick” color=”color1″]
    • If your system contains user accounts and you want to provide user account security, your best option will be using a Password Based Key Derivation function such as Bcrypt or PBKDF2.
    • If you need to provide the salt (such as with PBKDF2), use a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) to generate the salt with enough size to help ensure its uniqueness.
    • Ensure your salt is long enough (128 – 192 bits)
    • Ensure passwords are a minimum of 12 characters or use pass-phrases
    • Ensure the passwords/pass-phrases use the complex password requirements (whether passwords or pass-phrases)
[/list]

 

About the author

Max McCarty

Max McCarty is a software developer with a passion for breathing life into big ideas. He is the founder and owner of LockMeDown.com and host of the popular Lock Me Down podcast.