How can I ensure that a link sent via email is opened only by user clicks from a mail client and not by bots?

In short: you can't.

The answer depends on your definition of "bots", but if a bot can crawl the email, the bot can hit the link. A bot doesn't necessarily respect robots.txt or the HTML meta tags mentioned below.


However (a bit outside of the scope):

  • You should make the <token> a long random value, as stated in a comment to your question.

  • You could protect the link with a CAPTCHA or some other Turing test.

  • You could also explain what you have been observing so we can understand what you would like to avoid.


Incorrectly used meta tags

Instead of using two meta tags, you should put both values into a single tag. With two tags, some search engines may choose to obey only one of the two.

<meta name="robots" content="noindex,nofollow">

Robots.txt and robots meta tags are mutually exclusive

Disallowing pages in robots.txt prevents robots from downloading the page and seeing the meta tags. There are approximately zero bots that wouldn't obey robots.txt but would obey meta tags. You should choose one or the other, but not both.

Robots.txt won't prevent search engines from indexing URLs

If Google finds enough links to a URL, it may include that URL in its search index, even if that URL is disallowed in robots.txt. If your fear is that some of these URLs will get indexed in search engines, you should allow crawling in robots.txt but disallow indexing via the meta tag.
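For example, to get that combination, keep robots.txt permissive (an empty Disallow allows everything) and put the noindex meta tag shown earlier on the pages themselves. This is a minimal sketch, not a required layout:

    User-agent: *
    Disallow: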

Meta tags won't prevent bots from hitting the URLs

If your fear is that bots will mess up your stats or cause other undesired effects when they hit these URLs, then you should use robots.txt. Search engines might still index a URL occasionally, but most bots will obey robots.txt and not even request the URLs.
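A robots.txt sketch for that case, assuming the tokenized links live under a hypothetical /t/ path:

    User-agent: *
    Disallow: /t/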

No way to prevent indexing and bot hits with robots.txt and meta tags

If you want to prevent indexing of the URLs and bot hits to the URLs, you are out of luck. There is no way to use robots.txt and meta tags to do both at the same time.

Are your tokens long enough?

A five-byte token gives you 256^5 or 1.1E12 (1.1 trillion) possible URLs. If you send out a million email messages, that leaves a 1 in 1.1 million chance of getting an in-use token for each guess. If you send out a billion email messages, the odds of getting an in-use token rise to 1 in 1 thousand. You'd certainly want to increase the length of your token after sending out 1 million emails.
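A quick Python sketch of that arithmetic (the message counts are the same ones assumed above):

    # Chance that a single random guess matches an in-use token
    possible = 256 ** 5              # about 1.1 trillion five-byte tokens
    for sent in (10**6, 10**9):      # a million and a billion emails sent
        print(f"{sent:>13,} sent -> 1 in {possible // sent:,} per guess")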

You could also get a lot more security without increasing the length of your URLs. Five bytes hex encoded is 10 characters. It would be smarter to use 10 characters that are randomly chosen from the 66 characters that are safe to use unencoded in URLs:

  • lowercase letters a-z (26 characters)
  • uppercase letters A-Z (26 characters)
  • digits 0-9 (10 characters)
  • the punctuation marks - . _ ~ (4 characters)

If you did that, you could increase the number of possible tokens to 66^10 or 1.5E18 (1.5 quintillion). That would give you enough token space no matter how many emails you sent out.
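A minimal Python sketch of generating such a token (the alphabet below is built to match the 66-character count above):

    import secrets
    import string

    # The 66 characters that are safe in URLs without percent-encoding
    ALPHABET = string.ascii_letters + string.digits + "-._~"

    def make_token(length=10):
        # secrets uses a cryptographically secure random source,
        # unlike the random module, which is predictable
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    print(make_token())  # e.g. "x7_Qm~3RbP"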

Other ways to increase security

You could also employ any of the following tactics to further ensure that bots don't get access to this content:

  • Use server-side configuration that returns an error code when a suspected robot hits these URLs. You could detect robots based on:
    • Its user agent
    • What else that IP address is requesting
    • Whether it can pass a CAPTCHA
  • Require that users log in to see this content when clicking from an email.
  • Expire old tokens: require that clicks happen within hours or days after the email is sent out (see the sketch below).
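A minimal sketch of that expiry check, assuming a hypothetical store that records when each token was issued:

    from datetime import datetime, timedelta, timezone

    TOKEN_LIFETIME = timedelta(days=2)  # assumption: links expire after two days

    def token_is_valid(issued_at: datetime) -> bool:
        # issued_at would come from your database, recorded when the email was sent
        return datetime.now(timezone.utc) - issued_at < TOKEN_LIFETIME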

Not only can you not ensure it, it's highly likely that the link will be retrieved by programs not under the user's control. Many spam and phishing filters will pre-fetch any web pages linked to in an email to scan them for possible threats (I've had as many as five non-user hits recorded for a single link).

The solution is to make sure nothing sensitive is displayed on the page, and when the link is opened, nothing is changed on the server. Sensitive content should be placed behind a login barrier or equivalent. Actions that make changes (such as a password reset) should require the user to take an action on the page, such as clicking a "submit" button.
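A minimal Flask sketch of that pattern (the route name and reset logic are illustrative assumptions): opening the link only renders a confirmation form, and the actual change happens on the POST triggered by the user's click. Pre-fetchers and scanners issue GETs, so they never reach the state-changing branch:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/reset/<token>", methods=["GET"])
    def show_confirmation(token):
        # Safe for bots and pre-fetchers: nothing changes server-side here
        return (
            f'<form method="post" action="/reset/{token}">'
            '<button type="submit">Reset my password</button>'
            "</form>"
        )

    @app.route("/reset/<token>", methods=["POST"])
    def perform_reset(token):
        # Only a deliberate user action (the submit click) reaches this branch;
        # validating the token and changing the password would happen here
        return "Password reset."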