Get a list of User-Agents from an nginx log

Solution 1:

awk -F'"' '/GET/ {print $6}' /var/log/nginx-access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn
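  • awk(1) - selects the full User-Agent string (field 6 when splitting on double quotes) of GET requests
  • cut(1) - keeps only its first word
  • sort(1) - sorts the names so identical ones are adjacent
  • uniq(1) - counts each distinct name (-c)
  • sort(1) - sorts by count, in reverse numeric order (-rn)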
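To see what each stage produces, here is a sketch that runs Solution 1 unchanged against a small hypothetical log written to a temporary file. The sample lines, addresses and User-Agents are made up for illustration; only the pipeline comes from the answer above.

```shell
# Hypothetical sample log in the combined format; not real traffic.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:118.0) Gecko/20100101 Firefox/118.0"
203.0.113.6 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/117.0"
203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /b HTTP/1.1" 404 153 "-" "curl/8.4.0"
203.0.113.8 - - [10/Oct/2023:13:55:39 +0000] "POST /c HTTP/1.1" 200 10 "-" "curl/8.4.0"
EOF

# Solution 1, pointed at the sample file instead of the real log.
# With -F'"', field 6 is the User-Agent in the combined format.
result=$(awk -F'"' '/GET/ {print $6}' "$log" | cut -d' ' -f1 | sort | uniq -c | sort -rn)
rm -f "$log"
printf '%s\n' "$result"
```

With this sample the top line counts 2 for Mozilla/5.0: most browsers begin their User-Agent with that token, so taking the first word groups Firefox and Chrome together. The POST line is dropped by the /GET/ filter.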

PS. Of course, the whole pipeline could be replaced by a single awk/sed/perl/python/etc. script; I just wanted to show how rich the Unix way is.
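As the PS says, the pipeline can be folded into one awk program. A minimal sketch, assuming a POSIX awk: filtering, first-word extraction and counting happen in an associative array, and only the final ordering is left to sort -rn, because POSIX awk has no built-in sort. The sample log is hypothetical.

```shell
# Hypothetical sample log in the combined format.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.1 - - [10/Oct/2023:14:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"
198.51.100.2 - - [10/Oct/2023:14:00:01 +0000] "GET /x HTTP/1.1" 200 612 "-" "Wget/1.21.4"
198.51.100.3 - - [10/Oct/2023:14:00:02 +0000] "GET /y HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"
EOF

# One awk process filters GET lines, takes the first word of the
# User-Agent (field 6), and counts it in an associative array.
result=$(awk -F'"' '
  /GET/ {
    split($6, word, " ")   # default blank splitting, roughly cut -d" " -f1
    count[word[1]]++
  }
  END { for (ua in count) print count[ua], ua }
' "$log" | sort -rn)   # for-in order is unspecified, so sort the counts here
rm -f "$log"
printf '%s\n' "$result"
```

Keeping the counting inside awk avoids the first sort and uniq, which matters most on large logs; gawk users could also sort inside awk, but that is a gawk extension.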

Solution 2:

While the one-liner by SaveTheRbtz does the job, it took several hours to parse my nginx access log.

Here is a faster version based on his, which takes less than 1 minute per 100MB of log file (corresponding to about 1 million lines):

sed -n 's!.* "GET.* "\([[:alnum:].]\+/*[[:digit:].]*\)[^"]*"$!\1!p' /var/log/nginx/access.log | sort | uniq -c | sort -rfg

It works with nginx's default access log format, which is the same as the combined format of Apache httpd: the User-Agent is the last field, delimited by double quotes (").
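A quick way to check what the sed expression captures is to run it on a few sample lines. This sketch assumes GNU sed, since \+ in a basic regular expression is a GNU extension rather than POSIX; the log lines are hypothetical. The greedy `.* "` part skips ahead to the last quoted field (the User-Agent), and the group keeps only its leading Name/version token.

```shell
# Hypothetical sample log in nginx's default "combined" format.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:15:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/117.0"
192.0.2.11 - - [10/Oct/2023:15:00:01 +0000] "GET /feed HTTP/1.1" 200 2048 "https://example.com/" "curl/8.4.0"
192.0.2.12 - - [10/Oct/2023:15:00:02 +0000] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"
EOF

# Solution 2's expression (GNU sed assumed: \+ is a GNU extension to BRE).
# -n plus the p flag prints only lines where the substitution succeeded,
# i.e. GET requests whose last quoted field starts with Name/version.
extracted=$(sed -n 's!.* "GET.* "\([[:alnum:].]\+/*[[:digit:].]*\)[^"]*"$!\1!p' "$log")
rm -f "$log"
printf '%s\n' "$extracted"
```

Everything after that token (platform details, engine names) is discarded, so the grouping is similar to Solution 1's first-word approach. A line with extra fields after the User-Agent (a custom log_format) would not match, because the expression requires the quoted User-Agent to end the line.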


Solution 3:

This is a slight variation of the accepted answer, using fixed-string grep (grep -F, also spelled fgrep, which newer GNU grep releases warn is obsolescent) and cut.

grep -F '"GET ' your_file.log | cut -d'"' -f6 | cut -d' ' -f1 | sort | uniq -c | sort -rn

There is something appealing about using "weaker" commands when it is possible.
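One practical difference between this variant and Solution 1: the fixed string '"GET ' normally only occurs where a quoted request line begins with GET, whereas awk's /GET/ matches GET anywhere on the line, for example inside a POST request's URL. A small sketch with two hypothetical log lines, using grep -F (what fgrep is shorthand for):

```shell
# Two hypothetical lines: a real GET, and a POST whose URL contains "GET".
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.20 - - [10/Oct/2023:16:00:00 +0000] "GET /status HTTP/1.1" 200 15 "-" "curl/8.4.0"
203.0.113.21 - - [10/Oct/2023:16:00:01 +0000] "POST /api/GET HTTP/1.1" 200 10 "-" "curl/8.4.0"
EOF

# Solution 1's filter: the regex GET may match anywhere on the line.
loose=$(awk '/GET/ {n++} END {print n+0}' "$log")
# Solution 3's filter: the fixed string '"GET ' (quote, GET, space).
strict=$(grep -cF '"GET ' "$log")
rm -f "$log"
printf 'awk /GET/: %s line(s), grep -F: %s line(s)\n' "$loose" "$strict"
```

Here the awk filter selects both lines and the fixed-string filter selects only the GET request, so the "weaker" command is also the more precise one for this job.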


Solution 4:

AWStats should do the trick, though it reports far more than just User-Agents. I hope this helps.


Solution 5:

Webalizer can do it.

Example:

webalizer -o reports_folder -M 5 log_file
  • -o reports_folder specifies the folder where the report is generated
  • -M 5 displays only the browser name and major version number
  • log_file is the log file to analyze
  • source: ftp://ftp.mrunix.net/pub/webalizer/README