Is there a good method for parsing the user-agent string?

For Java, take a look at User-Agent-Utils. It's fairly compact (< 50kB) and has no dependencies.

Note that although the latest release was fairly recent at the time of writing (1.21, released 2018-01-24), the library's page states:

Warning: This project is end-of-life and will not be updated regularly any longer

And the GitHub page says:

EOL WARNING

This library has reached end-of-life and will not see regular updates any longer.

Version 1.21 was the last official release in 2018.


Have a look at the Java library I wrote for this purpose: Yauaa

I made a very simple servlet where you can try it out to see if it gives the answers you are looking for: https://try.yauaa.basjes.nl/

It is Apache 2 licensed and published to Maven, so using it in a Java application is really easy. It is currently used in production on one of the busiest websites in the Netherlands (where I work).

See this blog post about it: https://techlab.bol.com/making-sense-user-agent-string/


  1. Is the structure of the User-Agent well defined? If yes - where can I find it exactly? (From my understanding of the RFC, there is not much standardization here.)

No. Apart from a loose syntax (product tokens and parenthesized comments), the structure of a User-Agent string is not standardized, although strings from different agents are very similar. Despite that similarity, you still need multiple patterns to detect agents reliably.
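The only grammar the HTTP specification gives (RFC 7231, section 5.5.3) is roughly `product *( RWS ( product / comment ) )`, where a product is `name/version` and a comment is a parenthesized, possibly nested, text. A minimal sketch, with a hypothetical class name, that splits a header value along that grammar shows how little the syntax alone tells you:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: splits a User-Agent value into product tokens
 * ("Name/Version") and parenthesized comments. It only follows the loose
 * RFC 7231 grammar; it does not interpret anything.
 */
final class UserAgentTokenizer {

    private UserAgentTokenizer() {}

    static List<String> tokenize(String ua) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // parenthesis nesting level; comments may nest
        for (int i = 0; i < ua.length(); i++) {
            char c = ua.charAt(i);
            if (depth > 0) {
                current.append(c);
                if (c == '\\' && i + 1 < ua.length()) {
                    current.append(ua.charAt(++i)); // quoted-pair: keep escaped char
                } else if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                    if (depth == 0) {
                        flush(current, parts); // outermost comment closed
                    }
                }
            } else if (c == '(') {
                flush(current, parts); // a comment always starts a new part
                current.append(c);
                depth = 1;
            } else if (Character.isWhitespace(c)) {
                flush(current, parts); // whitespace separates products
            } else {
                current.append(c);
            }
        }
        flush(current, parts); // trailing product (or unterminated comment)
        return parts;
    }

    private static void flush(StringBuilder sb, List<String> out) {
        if (sb.length() > 0) {
            out.add(sb.toString());
            sb.setLength(0);
        }
    }
}
```

For a typical Chrome string, `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36`, this yields six parts: four products (`Mozilla/5.0`, `AppleWebKit/537.36`, `Chrome/120.0.0.0`, `Safari/537.36`) and two comments. The string is syntactically valid, yet it names four different "products", which is why the syntax alone cannot tell you the browser.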

  2. Assuming the answer to #1 is No - is there a proper way to parse it to get the info I need?

You can try the library UADetector. It is a wrapper for the User-Agent-Database of user-agent-string.info.
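Roughly speaking, database-driven parsers like this match the string against a large, curated set of patterns. The hand-rolled sketch below is not UADetector's API; the class name and the four rules are hypothetical and nowhere near production coverage. It illustrates the "multiple patterns" point and why order matters: Edge and Chrome strings also contain `Safari/`, so the more specific rules must be tried first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: ordered regex rules, first match wins. */
final class OrderedBrowserDetector {

    // LinkedHashMap keeps insertion order. Specific rules come first,
    // because Edge UAs also contain "Chrome/" and Chrome UAs contain "Safari/".
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        RULES.put("Edge", Pattern.compile("Edg/([\\d.]+)"));
        RULES.put("Chrome", Pattern.compile("Chrome/([\\d.]+)"));
        RULES.put("Firefox", Pattern.compile("Firefox/([\\d.]+)"));
        // Safari reports its own version in "Version/x.y", not in "Safari/".
        RULES.put("Safari", Pattern.compile("Version/([\\d.]+).*Safari/"));
    }

    private OrderedBrowserDetector() {}

    /** Returns "Family Version" for the first matching rule, if any. */
    static Optional<String> detect(String ua) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(ua);
            if (m.find()) {
                return Optional.of(rule.getKey() + " " + m.group(1));
            }
        }
        return Optional.empty(); // unknown agent
    }
}
```

Moving the `Chrome` rule above `Edge` would misreport every Edge user as Chrome; maintaining that ordering across thousands of rules is exactly the work a maintained database does for you.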

  3. Is there a better way to get the info I need other than the User-Agent string?

I would not say it is a better or worse way, but another way to detect user agents is to use JavaScript on the client side to collect information about the User-Agent and submit it to your backend via hidden HTML inputs or XMLHttpRequest. It all depends on what you want to identify. For accurate detection of web crawlers, JavaScript won't help, since many crawlers do not execute it.
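On the backend, values collected this way arrive like any other form submission. A minimal sketch, assuming the script posts them as `application/x-www-form-urlencoded` with hypothetical field names such as `screen`, `tz` and `touch`, that decodes such a body using only the JDK (Java 10+ for the `Charset` overload of `URLDecoder.decode`):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: decodes a form-encoded body sent by client-side JS. */
final class ClientInfoForm {

    private ClientInfoForm() {}

    /** Parses "k1=v1&k2=v2"; later duplicate keys overwrite earlier ones. */
    static Map<String, String> parse(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decodes %XX escapes and '+' as space, as form encoding requires.
            fields.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }
}
```

In a servlet container, `request.getParameter(...)` already performs this decoding for such a POST; the sketch only makes the wire format explicit. Keep in mind that these values come from the client, so they can be spoofed just as easily as the User-Agent header itself.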