Accessing document using a 6 letter token

Brute forcing

So you have an alphabet of size 36 and 6 characters. That gives you about two billion different tokens (36^6 is roughly 2.2 billion). Let's say you have a thousand different documents. That gives you a chance of about one in two million of guessing a token associated with a document. Guessing once an hour from each of a thousand different IPs for a year would give you almost nine million guesses - that should give you a couple of documents.
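The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and it assumes each attacker IP gets one guess per hour, which is the rate that matches the yearly guess count above.

```python
# Back-of-the-envelope check of the brute-force estimate.
ALPHABET_SIZE = 36         # e.g. a-z plus 0-9
TOKEN_LENGTH = 6
DOCUMENTS = 1_000          # number of valid tokens in circulation
ATTACKER_IPS = 1_000       # assumption: one guess per IP per hour
HOURS_PER_YEAR = 24 * 365

token_space = ALPHABET_SIZE ** TOKEN_LENGTH
guesses_per_year = ATTACKER_IPS * HOURS_PER_YEAR
# Each guess hits a valid token with probability DOCUMENTS / token_space.
expected_hits = guesses_per_year * DOCUMENTS / token_space

print(f"token space:      {token_space:,}")
print(f"guesses per year: {guesses_per_year:,}")
print(f"expected hits:    {expected_hits:.1f}")
```

Under these assumptions the expected number of documents found in a year comes out at about four.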

Sure, the CAPTCHA makes this harder. But CAPTCHAs are not perfect, and they can always be solved by humans.

The problem here is that since you only enter a token and no document ID, you can only rate limit per IP and not per document. That makes it very hard to protect against brute forcing unless you have a very large space to pick tokens from.

Sharing

A password is personal and you are encouraged not to share it. That means it can be easily changed if it is compromised, and you have some control over who gets their hands on it.

A document token like this is supposed to be shared by design. You have very little control over who gets it. It will end up on mail servers, in backups, and on Post-it notes on people's desks all over the world.

You have no idea who has access to the token, and if you need to change it you will need to redistribute it to all the people who are supposed to have it. That is neither secure nor practical.

Conclusion: There must be a better way

This will not give you very good security. If the resource you are protecting is not very important it might be enough, but I would not use it for anything of value.

I do not know your exact use case, but whatever it is, there must be a better way to solve this problem than rolling your own API. Using an existing solution would also save you the trouble of writing your own code.

Use an existing cloud storage service, a VPN connection into the company intranet, or something else. Just don't fire up your IDE and start coding away.

Update: Your use case

This is one of the cases where an access token is probably a good idea. But to get around the problems mentioned above I would do this:

  • Keep both the CAPTCHA and the rate limit by IP. (You might want to reconsider how the rate limiting is done in order to prevent accidental or deliberate DoS.)
  • To deal with the brute forcing, I would increase the size of the token. Google Drive uses 49 characters with both upper case letters, lower case letters and numbers. That should be enough for you as well.
  • To get around the sharing issue, print the URL with the token in a QR code on the document itself. This brings the whole problem into the domain of physical papers, which people are used to dealing with. The people who see the paper will have access to the digital original. That is easy to grasp.
  • Consider setting a limit on how many times the document can be accessed, or at least a maximum time for how long the token can be used. If the car should be registered within one week, there is no reason for the token to work after two.
  • Do not store the tokens in plain text in your database. Hash them. (Something fast like SHA-256 should be enough here - no need to roll out bcrypt when you have large random tokens.)
  • Use a CSPRNG to generate the tokens, otherwise they could be guessed by an attacker having access to a few tokens.
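The last three points (expiry, hashed storage, CSPRNG) could look roughly like this in Python. It is a minimal sketch, not a finished implementation: `issue_token`, `is_valid`, the record layout, the 32-byte token size and the one-week lifetime are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

TOKEN_BYTES = 32                     # assumed size; 32 random bytes = 256 bits
TOKEN_LIFETIME = timedelta(days=7)   # assumed lifetime, e.g. "one week"


def issue_token(now=None):
    """Return (token for the URL/QR code, record to store in the database)."""
    now = now or datetime.now(timezone.utc)
    # secrets draws from the operating system's CSPRNG.
    token = secrets.token_urlsafe(TOKEN_BYTES)
    record = {
        # Only the hash is stored; the plain token never touches the database.
        "token_hash": hashlib.sha256(token.encode("ascii")).hexdigest(),
        "expires_at": now + TOKEN_LIFETIME,
    }
    return token, record


def is_valid(presented, record, now=None):
    """Check a presented token against a stored record."""
    now = now or datetime.now(timezone.utc)
    if now >= record["expires_at"]:
        return False
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(presented_hash, record["token_hash"])


token, record = issue_token()
print(is_valid(token, record))        # the freshly issued token is accepted
print(is_valid("guess", record))      # anything else is rejected
```

In a real service you would look the record up by the hash of the presented token instead of passing it in, and an access counter could sit next to `expires_at` to cap the number of views.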

Since you say "any person holding viewing the 6 letter token can access to the original document", I assume there is nothing really secret in these (i.e. I couldn't commit fraud by simply finding a token in my neighbour's trash). Otherwise pick a regular authentication scheme with e-mail registration and passwords.

Many token-based systems are used the way you describe, though in your case the token length is strikingly small. I suggest you use tokens at least twice as long: this would make brute-force attacks impractical without making the system much harder to use.

PS. Oh, and please exclude letters O and I from your alphabet if you haven't already.
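As a rough sketch of both suggestions in Python: the alphabet below is an assumption (upper-case letters and digits minus O and I; the question does not say which 36 characters are in use), and `new_token` and `entropy_bits` are illustrative names.

```python
import math
import secrets
import string

# Assumed alphabet: A-Z and 0-9 without the letters O and I (34 symbols).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "OI"
)
TOKEN_LENGTH = 12   # twice the original 6 characters


def new_token(length=TOKEN_LENGTH):
    """Pick each character independently with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(alphabet_size, length):
    """Bits of entropy in a uniformly random token."""
    return length * math.log2(alphabet_size)


print(new_token())
print(f"6 chars, 36 symbols:  {entropy_bits(36, 6):.1f} bits")
print(f"12 chars, 34 symbols: {entropy_bits(len(ALPHABET), TOKEN_LENGTH):.1f} bits")
```

Doubling the length takes the token from about 31 bits to about 61 bits of entropy, even with the two letters removed, which multiplies the brute-force effort by roughly a billion.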


Limiting calling the API from a single IP more than 3 times/hour

First things first, this is a huge denial-of-service risk. Getting locked out for an hour just because someone mixed up "l" and "1" is unacceptable.

Keep in mind that pretty much all office computers are behind NAT44. There are several users behind each IP. With CGNAT (NAT444) and NAT464, you'll also see many home users in different buildings using the same IP.