Storing username and salt in a separate table

Before getting into the analysis of the process to slow down cracking the hashes, I want to address something far more important first:

If I log in and my hash happens to match some other user's hash, I will be authenticated as that user. So your whole "look in the Users database to blindly find any match because I don't tie password hashes to users" is a horrifying approach to authentication.

Please don't do this.
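For contrast, here is a minimal sketch of the conventional approach, where the username selects exactly one record and the password is only ever checked against that record. The `USERS` table, the `make_user` helper, and the iteration count are hypothetical, and PBKDF2 from the standard library merely stands in for bcrypt or Argon2.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # illustrative work factor only, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 (standard library) stands in for bcrypt/Argon2 in this sketch.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def make_user(user_id: int, password: str) -> dict:
    # Hypothetical helper that builds one row of the users table.
    salt = os.urandom(16)
    return {"user_id": user_id, "salt": salt, "hash": hash_password(password, salt)}


# Hypothetical in-memory users table, keyed by username.
USERS = {
    "alice": make_user(1, "correct horse"),
    "bob": make_user(2, "battery staple"),
}


def authenticate(username: str, password: str):
    """Return the user id only if the password matches this user's own record."""
    record = USERS.get(username)
    if record is None:
        return None
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison against the one hash tied to this username.
    if hmac.compare_digest(candidate, record["hash"]):
        return record["user_id"]
    return None
```

Because the lookup is keyed on the username, a hash that happens to equal another user's hash can never log you in as that other user.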

Kerckhoffs's principle says that a system must remain secure even if an attacker knows how it works. So, let's assume the attacker knows that you added fake usernames. Fine, now all the attacker has to do is look for valid usernames and tie each one to a UserID before starting to crack hashes.

And to do that, I would look at the logged user activity in the database. I do not know what is logged in your app, but one has to assume that a user's activity will reveal the username associated with it, even if that link is not stored explicitly anywhere in the database. Things like timestamps can make correlation easy.
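As a rough illustration of how little it takes, here is a sketch of timestamp correlation. Everything in it is an assumption: I am guessing that the database holds some log of login events by username and some log of activity by UserID, and the `correlate` function, its five-second window, and the sample rows are all made up.

```python
from collections import Counter


def correlate(logins, activity, window=5.0):
    """Guess which username belongs to which user id.

    logins:   list of (timestamp, username) rows   (hypothetical log)
    activity: list of (timestamp, user_id) rows    (hypothetical log)
    """
    votes = Counter()
    for activity_ts, user_id in activity:
        for login_ts, username in logins:
            # Activity shortly after a login is a vote for that pairing.
            if 0 <= activity_ts - login_ts <= window:
                votes[(user_id, username)] += 1

    best = {}
    # most_common() yields the strongest pairings first.
    for (user_id, username), _count in votes.most_common():
        best.setdefault(user_id, username)
    return best


logins = [(100.0, "david"), (200.0, "toto"), (300.0, "david")]
activity = [(101.0, 12), (201.0, 7), (302.0, 12)]
print(correlate(logins, activity))
```

A real attacker would have noisier data, but also far more of it, and fake usernames that never produce activity simply never collect any votes.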

And since your threat model includes the assumption that the attacker has access to the codebase and the entire database, your approach appears to do nothing but increase your design overhead and database size.

So, your entire approach relies on an attacker never being able to correlate UserId and Username. This is known as "Security by Obscurity" and, while it has its place, it is not a basis for a secure control.

Now let's tie my first point to my second. Let's say that I want to log into UserID 1 because I can see that it's the admin (or an account of interest). I know the password hash. Now I can take all the usernames and their salts to find a hash that might match User 1's hash. It no longer matters which username I use. It might be unlikely to find an exact match like this using Argon2, but this highlights the larger problem with your approach.

After some thought, I would suggest that there is no significant security improvement.

Let's start from the standard account protection: salting the password and hashing it with a deliberately slow algorithm (bcrypt and so on). What an attacker can do:

  • Reverse the hash: almost impossible
  • Brute-force the hash: almost impossible if the password is longer than 6 characters (because of bcrypt)
  • Wordlist attack: as hard as the password is far down the wordlist (impossible if it is not in the list)
  • Reuse a cracked password against the target: possible
  • Reuse a cracked password against another target: possible if the user reuses his password in multiple places (which is bad practice).

With your solution, the attacks against the hashes are nearly identical. For each password attempt, the attacker tries every salt+username, and if the result is equal to one of the hashes stored in the User table, he succeeds.

It is correct to say that the dummy entries will slow down his work, but the same level of difficulty could be achieved by simply increasing the number of rounds of bcrypt or Argon2.

Your method adds work for the attacker without adding any for the real users (if we increase the number of rounds of bcrypt, the normal login is slowed down too), which is good. But the price is an overcomplicated database representation. I am not sure it is worth it.
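To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the attacker computes one hash per (guess, salt) pair and checks each result against all stored hashes with a cheap set lookup; the function names and the user counts are made up for illustration.

```python
import math


def attacker_hashes(real_users: int, decoys: int, wordlist_size: int) -> int:
    """Hash computations needed to try every guess against every stored salt."""
    return (real_users + decoys) * wordlist_size


def cost_multiplier(real_users: int, decoys: int) -> float:
    """Extra attacker work caused by the decoys, relative to real users only."""
    return (real_users + decoys) / real_users


# Made-up numbers: 1,000 real users, 9,000 decoys, a 1,000,000-word list.
total = attacker_hashes(1_000, 9_000, 1_000_000)
multiplier = cost_multiplier(1_000, 9_000)

# bcrypt's cost parameter is an exponent of two, so the same slowdown
# corresponds to raising that parameter by log2(multiplier).
extra_bcrypt_cost = math.log2(multiplier)
print(total, multiplier, round(extra_bcrypt_cost, 2))
```

With these numbers the decoys multiply the attacker's work by 10, which is roughly what raising the bcrypt cost by 3 to 4 would achieve, except that the latter also slows down every legitimate login.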

I don't think it is worth considering the case where only the Username table is compromised and not the User table. As they are stored in a similar way, we must assume that someone able to view one can view the other.

Also consider the case where David is a regular user with the password UnBr3Akable. The database stores:

UserID=12, password hash=1a2b3c, salt=67890

Adding dummy entries could lead to a case when hash(username=toto, salt=1234, password=helloworld) = 1a2b3c.
Then an attacker could log into David's account without knowing the real password.

The case is as rare as finding a hash collision, and I'm not sure it is a real problem. But as every fake account could lead to a login to a real account if a collision occurs, I am not sure that we can consider them as fake as you think.
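The mechanism is easy to demonstrate if the hash is made artificially weak. The sketch below is a toy: `toy_hash` is SHA-256 truncated to a single byte, which is nothing like Argon2 or bcrypt, and the usernames, salts, and guesses are just the example values above plus made-up ones. It only shows that a different (username, salt, password) triple can land on David's stored hash.

```python
import hashlib


def toy_hash(username: str, salt: str, password: str, size: int = 1) -> bytes:
    # Deliberately weak: only the first `size` bytes of SHA-256 are kept,
    # so matches are easy to find. Real password hashes are far longer.
    data = f"{username}:{salt}:{password}".encode()
    return hashlib.sha256(data).digest()[:size]


def find_cross_match(target_hash: bytes, username: str, salt: str,
                     max_tries: int = 10_000):
    """Search for a password that, under another username and salt,
    produces the target user's hash."""
    for i in range(max_tries):
        guess = f"guess{i}"
        if toy_hash(username, salt, guess) == target_hash:
            return guess
    return None


david_hash = toy_hash("david", "67890", "UnBr3Akable")
match = find_cross_match(david_hash, "toto", "1234")
print(match)
```

With a full-length hash the search is hopeless in practice, but in a scheme that identifies the account by the resulting hash alone, any such match is a valid login to David's account.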

A random list of concerns without actual security threat estimation:

  • GDPR and similar data protection regulation might be an issue in that it might require you to also delete the username entry when a user requests full deletion of their data; how do you identify both entries? are you asking for the username and the password in the deletion form? or for the user id? if the user can know their user id, an attacker likely can as well
  • you open a separate attack vector with the approach, in that suddenly new users can be a threat to existing users. If the right username entry can be generated an attacker can log into an account of another user on the live system without knowing their actual password and without you knowing which account is used for this, unless you track the connection from login to userid being used -> which then is also accessible to an attacker with system access; yes finding the right combination to insert is likely difficult, but in a normal system this isn't a threat at all.
  • bugs (or deliberate code changes) also carry a greater risk that one user might accidentally (or on purpose) log into the account of another user; do you have a way to notice this? In a "normal" system it's easy to have a generic test that makes sure the user id in a user session corresponds to the one associated with the username provided during authentication. In your approach this does not seem possible.
  • "The fake users would always have 0 InvalidLogin and NULL lockeduntil. The valid users would be cleared daily." Assuming the clearing happens for all entries and does not distinguish (otherwise that code would tell an attacker who is fake), this means the longer an attacker can listen in to your database the larger the likelihood they can identify all active users by checking the invalid login field for a change.
  • are usernames email addresses? how does password reset work? do you send out mails for the fake users? can attackers identify the real users by trying your recovery method for each username?
  • Notice that usernames are normally not considered high value by endusers or software, they can relatively easily be noticed by glancing over someone's shoulder and are not necessarily encrypted in password stores. So getting hold of them to identify a targeted real user might not be that difficult in targeted attacks.
  • Many non-targeted attacks simply use username+password lists and thus skip all the fake usernames that don't appear in those lists. This is not a weakness of your approach, just a case where the additional effort does not pay off.
  • if this is a project in a bigger company where responsibility changes, having fake users in the database seems something that someone easily would consider some legacy data that needs to be cleared away; to prevent this additional documentation would need to be written; either that identifies all the fake accounts or just says there are some. In the first case an attacker can use this information too. In the latter case nobody can identify real rubbish entries that got added by some bug.
  • while you save time on hashing compared to simply raising the work factor to impose the same cost on attackers, you also spend more time on database inserts and selects (depending on your database of choice), and you need more disk space
  • this is in general not straightforward to understand, so if you're not the sole maintainer, you might have additional documentation/mentoring costs and/or a risk of accidentally introducing bugs when people try "fixing" things that are not meant to be fixed
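The point about the invalid-login counters can be sketched as follows. This is a guess at the layout: I am assuming each username row carries an `invalid_login` counter and a `locked_until` value, as the quoted description suggests, and that the attacker can take repeated snapshots of the table. The function name and sample data are made up.

```python
def likely_real_usernames(snapshots):
    """Collect usernames whose lockout fields were ever non-default.

    snapshots: list of dicts mapping username -> row, where each row is a
    dict with hypothetical 'invalid_login' and 'locked_until' fields.
    """
    real = set()
    for snapshot in snapshots:
        for username, row in snapshot.items():
            # Fake users are said to stay at 0 / NULL forever, so any
            # other value marks the row as belonging to a real account.
            if row["invalid_login"] != 0 or row["locked_until"] is not None:
                real.add(username)
    return real


clean = {"invalid_login": 0, "locked_until": None}
snapshots = [
    {"david": dict(clean), "toto": dict(clean), "erin": dict(clean)},
    {"david": {"invalid_login": 2, "locked_until": None},
     "toto": dict(clean), "erin": dict(clean)},
    {"david": dict(clean), "toto": dict(clean),
     "erin": {"invalid_login": 5, "locked_until": "2024-01-01T00:00"}},
]
print(sorted(likely_real_usernames(snapshots)))
```

Every mistyped password by a real user leaks one bit, and the daily reset does not help because the attacker only needs to see the non-default value once.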

All in all, I also think that a design which identifies the user by the generated hash is risky, and the approach will surprise many developers. Surprise always means more cost through documentation, handover, mentoring, and potential bugs. Security-wise, yes, it can help in some scenarios, but you need to cover a lot of other edge cases, some of which wouldn't even exist with a "normal" approach. You have undocumented fake data lying around that could be removed at any time by someone cleaning up, and that always needs to be taken care of separately (e.g. no table constraints can be used to clean up the username table). Any log entry or other operation, perhaps introduced later for a cool new feature, that accidentally or on purpose provides a way to connect the two tables makes your approach moot. So unless you have a very specific scenario in mind, I'd say the additional overhead and the potential risks that need to be evaluated outweigh the benefit.