How to parse a user agent string using Python

There is a library called httpagentparser for that:

>>> import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print(httpagentparser.simple_detect(s))
('Linux', 'Chrome 5.0.307.11')
>>> print(httpagentparser.detect(s))
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}
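To give a feel for what such a library does under the hood, here is a minimal standard-library sketch that pulls the "Name/version" product tokens out of a user agent string. The names `product_tokens` and `guess_browser` and the brand priority list are my own assumptions for illustration, not part of httpagentparser, and the heuristic is far cruder than a real parser:

```python
import re

# Matches "Name/version" product tokens such as "Chrome/5.0.307.11".
PRODUCT_TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9_.-]*)/([0-9][0-9A-Za-z_.-]*)")


def product_tokens(user_agent):
    """Return the (name, version) product tokens in order of appearance."""
    return PRODUCT_TOKEN.findall(user_agent)


def guess_browser(user_agent):
    """Hypothetical heuristic: check specific brands before generic tokens.

    Chrome also advertises "Safari/...", so the order of the names matters.
    """
    tokens = dict(product_tokens(user_agent))
    for name in ("Edg", "OPR", "Chrome", "Firefox", "Safari"):
        if name in tokens:
            return name, tokens[name]
    return None


s = ("Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 "
     "(KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9")
print(guess_browser(s))  # ('Chrome', '5.0.307.11')
```

This only works for the handful of brands listed; real user agent strings have many more special cases, which is the reason to use a maintained library.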

This answer is not about an open-source project, but anyone researching how to parse the HTTP User-Agent string to obtain device intelligence will want to know about it.

WURFL is a long-established tool for analyzing the User-Agent (and, more generally, the whole HTTP request) and returning easily consumable device and browser information. It is the de facto standard in the Ad Tech industry for squeezing the last drop of information out of HTTP requests, thanks to a proprietary database. In practice, the code looks something like this:

from pywurfl.wurfl import Wurfl

# Create a WURFL engine. Note that the path of the installed wurfl.zip may vary:
# for example, on OS X systems it will be `/usr/local/share/wurfl/wurfl.zip`,
# and on Linux systems it will be `/usr/share/wurfl/wurfl.zip`.
wurfl = Wurfl('/usr/share/wurfl/wurfl.zip')

# Lookup an HTTP request
http_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp",
    "user-agent": "Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36",
}
dev = wurfl.parse_headers(http_request)

# You can also lookup a device with just the user-agent string
# dev = wurfl.parse_useragent(user_agent)

# Retrieve some property and capability values

# WURFL device ID:
print("device id =", dev.id)

# Some static capabilities:
static_capabilities = ["model_name", "brand_name", "device_os"]

# Retrieve the value of a single static capability:
print("get_capability('model_name') =",
      dev.get_capability(static_capabilities[0]))

# Retrieve the value of many static capabilities at once:
print("get_capabilities(static_capabilities) =",
      dev.get_capabilities(static_capabilities))

# Some virtual capabilities:
virtual_capabilities = ["complete_device_name", "form_factor"]

# Retrieve the value of a single virtual capability:
print("get_virtual_capability('complete_device_name') =",
      dev.get_virtual_capability(virtual_capabilities[0]))

# Retrieve the value of many virtual capabilities at once:
print("get_virtual_capabilities(virtual_capabilities) =",
      dev.get_virtual_capabilities(virtual_capabilities))

# Make sure you release the device when you are finished
dev.release()

The code above would print:

device id = samsung_sm_g981u_ver1_subuau1
get_capability('model_name') = SM-G981U1
get_capabilities(static_capabilities) = {'model_name': 'SM-G981U1', 'brand_name': 'Samsung', 'device_os': 'Android'}
get_virtual_capability('complete_device_name') = Samsung SM-G981U1 (Galaxy S20 5G)
get_virtual_capabilities(virtual_capabilities) = {'complete_device_name': 'Samsung SM-G981U1 (Galaxy S20 5G)', 'form_factor': 'Smartphone'}

More info can be found here.
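In a web application the headers usually arrive in a WSGI environ rather than as a ready-made dictionary. The sketch below builds a dictionary keyed by lowercase header names, like the one passed to parse_headers above. The helper name `headers_from_environ` is hypothetical, and I am assuming that a plain dict with lowercase keys is all parse_headers needs, as the example suggests:

```python
def headers_from_environ(environ):
    """Build a {lowercase-header-name: value} dict from a WSGI environ.

    WSGI exposes request headers as HTTP_* keys, for example
    HTTP_USER_AGENT for the User-Agent header.
    """
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            name = key[len("HTTP_"):].replace("_", "-").lower()
            headers[name] = value
    return headers


# Made-up environ, for illustration only.
example_environ = {
    "REQUEST_METHOD": "GET",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Linux; Android 10; SM-G981U1)",
    "HTTP_ACCEPT_LANGUAGE": "en-US,en;q=0.9",
}
print(headers_from_environ(example_environ))
```

The resulting dictionary could then be handed to `wurfl.parse_headers(...)` in place of the hand-written `http_request`.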

For those who want to try WURFL (and PyWURFL specifically) without obtaining a trial license from ScientiaMobile, my company has recently released a version of WURFL called WURFL Microservice, available from the AWS, Azure and GCP marketplaces (in addition to ScientiaMobile itself, of course). Python is fully supported for that product too, although the syntax is slightly different because it relies on a server-side component in the cloud for updates:

from wmclient import *

try:
    client = WmClient.create("http", "localhost", 8080, "")
    # ...
    ua = "Mozilla/5.0 (Linux; Android 7.1.1; ONEPLUS A5000 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) " \
         "Chrome/56.0.2924.87 Mobile Safari/537.36 "

    client.set_requested_static_capabilities(["brand_name", "model_name"])
    client.set_requested_virtual_capabilities(["is_smartphone", "form_factor"])
    print()
    print("Detecting device for user-agent: " + ua)

    # Perform a device detection calling WM server API
    device = client.lookup_useragent(ua)
    # ...
        # Let's get the device capabilities and print some of them
        capabilities = device.capabilities
        print("Detected device WURFL ID: " + capabilities["wurfl_id"])
        print("Device brand & model: " + capabilities["brand_name"] + " " + capabilities["model_name"])
        print("Detected device form factor: " + capabilities["form_factor"])
        if capabilities["is_smartphone"] == "true":
            # ...

A complete example and a link to the client code on GitHub can be found here.
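As the comparison with "true" in the snippet suggests, capability values come back as strings rather than booleans. A small helper can keep that comparison in one place. This is only a sketch: `capability_is_true` is a hypothetical name, and the sample dictionary is made up for illustration rather than copied from a real server response:

```python
def capability_is_true(capabilities, name):
    """Return True only if the named capability is the string "true".

    A missing capability is treated as False instead of raising KeyError.
    """
    return str(capabilities.get(name, "")).strip().lower() == "true"


# Illustrative values only, not actual WURFL Microservice output.
sample_capabilities = {
    "brand_name": "OnePlus",
    "model_name": "A5000",
    "form_factor": "Smartphone",
    "is_smartphone": "true",
}

if capability_is_true(sample_capabilities, "is_smartphone"):
    print("This is a smartphone")
```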

Disclosure: I work for the company that provides the library described here.


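Werkzeug has a user agent parser built in. Note that this parser was deprecated in Werkzeug 2.0 and removed in 2.1; on later versions request.user_agent.browser is None unless a custom user_agent_class is set on the request class.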

http://werkzeug.pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing

from werkzeug.test import create_environ
from werkzeug.wrappers import Request

environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = Request(environ)

request.user_agent.browser
'chrome'
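On Werkzeug 2.1 and later, where the built-in parsing is no longer available, one option is to plug in your own user agent class through the request's user_agent_class attribute. The sketch below assumes Werkzeug 2.0 or newer is installed; `SimpleUserAgent` and `ParsingRequest` are names I made up, and the two regular expressions are a deliberately tiny stand-in for a real parser:

```python
import re

from werkzeug.test import create_environ
from werkzeug.user_agent import UserAgent
from werkzeug.wrappers import Request


class SimpleUserAgent(UserAgent):
    """Hypothetical parser that recognises only Chrome and Firefox."""

    _BROWSERS = (
        ("chrome", re.compile(r"Chrome/([\d.]+)")),
        ("firefox", re.compile(r"Firefox/([\d.]+)")),
    )

    def __init__(self, string):
        super().__init__(string)
        # Fill in the attributes that the base class leaves as None.
        for name, pattern in self._BROWSERS:
            match = pattern.search(string)
            if match:
                self.browser = name
                self.version = match.group(1)
                break


class ParsingRequest(Request):
    # Tell Werkzeug which class to build for request.user_agent.
    user_agent_class = SimpleUserAgent


environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = ParsingRequest(environ)

print(request.user_agent.browser, request.user_agent.version)
```

In practice the body of the custom class would delegate to a dedicated parsing library instead of hand-written patterns.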