What is the correct regex for matching values generated by uuid.uuid4().hex?

To be more specific, this is the most precise regex for catching a UUID4 both with and without dashes, and it follows all the rules of UUID4:

[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}
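A quick sketch that checks the pattern against real output from Python's uuid module, in both the dashless `.hex` form and the dashed `str()` form. One caveat: each `-?` is independent, so the pattern also accepts strings that contain only some of the dashes. The `CONSISTENT` pattern below is my own variation, not part of the answer above. It captures the first optional dash in a named group and repeats it with `(?P=sep)`, so the dashes must be either all present or all absent.

```python
import re
import uuid

# Pattern from the answer (case-sensitive here; add re.I for uppercase input).
PATTERN = r'[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}'

# Stricter variation (an assumption, not from the answer): the named group
# captures the first optional dash and (?P=sep) repeats it, so the dashes
# must be either all present or all absent.
CONSISTENT = (r'[a-f0-9]{8}(?P<sep>-?)[a-f0-9]{4}(?P=sep)4[a-f0-9]{3}'
              r'(?P=sep)[89ab][a-f0-9]{3}(?P=sep)[a-f0-9]{12}')

u = uuid.uuid4()
for text in (u.hex, str(u)):   # without and with dashes
    assert re.fullmatch(PATTERN, text)
    assert re.fullmatch(CONSISTENT, text)

mixed = '3fc3d0e9-1efb4eef-ace6d9d59b62fec5'   # only two of the four dashes
print(bool(re.fullmatch(PATTERN, mixed)))      # True: each -? is independent
print(bool(re.fullmatch(CONSISTENT, mixed)))   # False
```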

To also match capital letters, make the match case-insensitive; in my example that is done with re.I. (The uuid module never puts capital letters in its output, but it does not fail on them in input either, because in a UUID "f" and "F" are the same hex digit.)
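A small sketch of both points in that paragraph: with re.I the pattern accepts uppercase input, and the uuid module treats upper- and lowercase hex digits as the same value while always producing lowercase output.

```python
import re
import uuid

pattern = re.compile(
    r'[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}', re.I)

value = uuid.uuid4().hex   # uuid4().hex is always lowercase
upper = value.upper()

print(pattern.fullmatch(upper) is not None)   # True, thanks to re.I
print(uuid.UUID(upper) == uuid.UUID(value))   # True: "F" and "f" are the same digit
print(uuid.UUID(upper).hex == value)          # True: output is lowercase again
```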

I created a validator to catch them, which looks like this:

import re

def valid_uuid(value):
    # Raw string so \Z reaches the regex engine without an invalid-escape warning
    regex = re.compile(r'^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}\Z', re.I)
    match = regex.match(value)
    return bool(match)

Then you can do:

if valid_uuid(my_uuid):
    # Do stuff with valid my_uuid
    pass

With ^ at the start and \Z at the end, I also make sure there is nothing else in the string. This ensures that "3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5" returns True, but "3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5+19187" returns False.
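A sketch showing why the anchors matter, using the two example strings above plus one with a trailing newline. In Python, `$` (without re.MULTILINE) also matches just before a final newline, while `\Z` only matches at the very end of the string. `Pattern.fullmatch()` (Python 3.4+) is an alternative to writing `^...\Z` by hand.

```python
import re

BODY = r'[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}'
unanchored = re.compile(BODY, re.I)
with_dollar = re.compile('^' + BODY + '$', re.I)
with_z = re.compile('^' + BODY + r'\Z', re.I)

GOOD = '3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5'
BAD = GOOD + '+19187'

print(bool(with_z.match(GOOD)))                  # True
print(bool(unanchored.match(BAD)))               # True: match() only anchors the start
print(bool(with_z.match(BAD)))                   # False
print(bool(with_dollar.match(GOOD + '\n')))      # True: $ also matches before a final newline
print(bool(with_z.match(GOOD + '\n')))           # False: \Z means the very end
print(bool(unanchored.fullmatch(GOOD + '\n')))   # False: fullmatch() needs the whole string
```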

Update: the Python way below is not foolproof (see comments):

There are other ways to validate a UUID. In Python you can do:

from uuid import UUID

try:
    UUID(my_uuid)
    # my_uuid is valid and you can use it
except ValueError:
    # do what you need when my_uuid is not a uuid
    pass
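Part of why the UUID() approach is not foolproof: the constructor accepts any UUID version, not only version 4, and it is lenient about formatting (it strips curly braces, a "urn:uuid:" prefix and hyphens before parsing). A minimal sketch of a stricter check, assuming you only want strings shaped like uuid4().hex, is to parse the value, then require version 4 and a round trip back to the same hex string. The name `is_uuid4_hex` is hypothetical, and lowercasing the input is my choice to mirror re.I in the regex versions.

```python
from uuid import UUID, uuid1, uuid4

def is_uuid4_hex(value: str) -> bool:
    """Hypothetical helper: True only for strings shaped like uuid4().hex."""
    try:
        parsed = UUID(value)
    except ValueError:
        return False
    # .version is None unless the variant is RFC 4122, so this checks both fields.
    # The round trip rejects braces, "urn:uuid:" prefixes and dashes.
    return parsed.version == 4 and parsed.hex == value.lower()

h = uuid4().hex
print(is_uuid4_hex(h))                # True
print(is_uuid4_hex('{' + h + '}'))    # False, although UUID() accepts it
print(is_uuid4_hex(uuid1().hex))      # False: a version 1 UUID
print(is_uuid4_hex('not a uuid'))     # False
```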

As far as I know, Martijn's answer is not 100% correct. A UUID4 has five groups of hexadecimal characters: the first has 8 characters, the second 4, the third 4, the fourth 4 and the fifth 12.

However, to make it a valid UUID4, the third group (the one in the middle) must start with a 4:

00000000-0000-4000-0000-000000000000
              ^

And the fourth group must start with 8, 9, a or b:

00000000-0000-4000-a000-000000000000
              ^    ^
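In the 32-character string from uuid4().hex the dashes are gone, so the version digit sits at index 12 and the variant digit at index 16 (indexes 14 and 19 in the dashed form, as marked above). A sketch that confirms this against Python's uuid module, where `uuid.RFC_4122` is the module's constant for this variant:

```python
import uuid

for _ in range(1000):
    u = uuid.uuid4()
    h = u.hex
    assert len(h) == 32
    assert h[12] == '4'          # third group starts with the version digit
    assert h[16] in '89ab'       # fourth group starts with the variant digit
    assert u.version == 4
    assert u.variant == uuid.RFC_4122
    # In the dashed form the same digits sit at indexes 14 and 19.
    s = str(u)
    assert s[14] == h[12] and s[19] == h[16]
print('all checks passed')
```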

So you have to change Martijn's regex to:

import re
# Raw string so \Z reaches the regex engine; 12 + 1 + 3 + 1 + 15 = 32 hex chars
uuid4hex = re.compile(r'[0-9a-f]{12}4[0-9a-f]{3}[89ab][0-9a-f]{15}\Z', re.I)
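A sketch to check the corrected pattern: match() anchors at the start and \Z at the end, so it accepts exactly 32 hex characters with the version and variant digits in the right places. Since the pattern contains no dashes, it rejects the dashed str() form, which is fine for uuid4().hex.

```python
import re
import uuid

uuid4hex = re.compile(r'[0-9a-f]{12}4[0-9a-f]{3}[89ab][0-9a-f]{15}\Z', re.I)

# Every value generated by uuid4().hex should match.
assert all(uuid4hex.match(uuid.uuid4().hex) for _ in range(1000))

sample = uuid.uuid4()
print(bool(uuid4hex.match(sample.hex)))         # True
print(bool(uuid4hex.match(str(sample))))        # False: dashes are not allowed
print(bool(uuid4hex.match(uuid.uuid1().hex)))   # False: version digit is 1, not 4
print(bool(uuid4hex.match(sample.hex + 'ff')))  # False: \Z forbids trailing characters
```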

Hope this helps!

Tags:

Python

Regex