Reasons not to assume device MAC address is unique

You say you can confirm during provisioning that each manufacturer MAC is in fact unique within the network of devices you are creating (which is not in and of itself a certainty, even though it should be). On that basis you are probably fine proceeding, but consider the following questions:

  • Are you using the MAC for security checks (authentication, authorization)? If so, a MAC is not sufficient. Don't even consider it. Use a cryptographic structure, and transmit any auth requests securely.

  • Is 48 bits wide enough? It probably is, but worth asking.

  • Will you ever need to repair a device by replacing its NIC?

  • If you do replace a device in its entirety, or replace its NIC, will you need to associate the new address with the existing key in your database in order to ensure continuity of data collection for the deployment location?

  • Will there be any maintenance interface by which a user (authorized or not) might change the NIC's MAC at the ROM, driver, or OS level? An attacker could introduce flaws in your data if they were to modify the MAC.

  • Will your data ever be joined with other datasources using MAC as a key?

  • Will you ever use the MAC for any networking purpose other than simply navigating the layer 2 LAN the device is connected to (wired or wireless)?

  • Will the LAN your devices are connected to be a private network, or one that large numbers of transient clients (like employees' cellphones) will connect to?

If your answers are

NO, yes, no, no, no, no, no, private

then I can't think of any real flaw in your plan.
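
To make the first question concrete: a MAC address travels in the clear and can be changed, so it can label a device but cannot prove who is talking. A minimal sketch of the cryptographic alternative, assuming each device receives its own secret key at provisioning (the function names `sign_request` and `verify_request`, the message layout, and the sample values are all made up for illustration):

```python
import hashlib
import hmac
import secrets

def sign_request(device_key: bytes, device_id: str, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the device ID and the payload."""
    message = device_id.encode("utf-8") + b"\x00" + payload
    return hmac.new(device_key, message, hashlib.sha256).hexdigest()

def verify_request(device_key: bytes, device_id: str, payload: bytes, tag: str) -> bool:
    """Recompute the tag on the server and compare in constant time."""
    expected = sign_request(device_key, device_id, payload)
    return hmac.compare_digest(expected, tag)

# Hypothetical per-device secret, generated once at provisioning time.
key = secrets.token_bytes(32)
tag = sign_request(key, "00:1b:21:aa:bb:cc", b'{"temp": 21.5}')
print(verify_request(key, "00:1b:21:aa:bb:cc", b'{"temp": 21.5}', tag))  # True
print(verify_request(key, "00:1b:21:aa:bb:cc", b'{"temp": 99.9}', tag))  # False
```

This only shows the shape of the idea. A real deployment would still need a secure transport and some protection against replayed requests.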

Keep in mind, you don't need globally unique MACs to pull this off; you just need to make sure that the MACs are unique within the subset of Internet devices that call your API. Just as a duplicate MAC assigned in two different cities can't collide because the devices are on different LANs, you can't have a database key collision on a MAC if the device never calls your API.
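
The provisioning check and the NIC-replacement question can be sketched together. This is only an illustration, assuming an in-memory dictionary stands in for your database; the class `DeviceRegistry` and its method names are hypothetical. The idea is that the database key is a stable identifier and the MAC is just an attribute that must be unique within your own fleet:

```python
import uuid

class DeviceRegistry:
    """In-memory stand-in for the device table (hypothetical schema)."""

    def __init__(self):
        self._mac_to_key = {}  # current MAC -> stable device key
        self._key_to_mac = {}  # stable device key -> current MAC

    @staticmethod
    def _normalize(mac: str) -> str:
        return mac.replace("-", ":").lower()

    def provision(self, mac: str) -> str:
        """Register a new device; refuse a MAC already seen in this fleet."""
        mac = self._normalize(mac)
        if mac in self._mac_to_key:
            raise ValueError(f"MAC {mac} is already provisioned")
        key = str(uuid.uuid4())
        self._mac_to_key[mac] = key
        self._key_to_mac[key] = mac
        return key

    def replace_nic(self, key: str, new_mac: str) -> None:
        """Point an existing device key at a new MAC after a repair."""
        new_mac = self._normalize(new_mac)
        if new_mac in self._mac_to_key:
            raise ValueError(f"MAC {new_mac} is already provisioned")
        old_mac = self._key_to_mac[key]
        del self._mac_to_key[old_mac]
        self._mac_to_key[new_mac] = key
        self._key_to_mac[key] = new_mac

    def lookup(self, mac: str):
        """Return the stable key for a MAC, or None if it is unknown."""
        return self._mac_to_key.get(self._normalize(mac))
```

With this layout, swapping a NIC changes one attribute and the collected data stays attached to the same key.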


MAC addresses are not unique

There can be, and will be, duplicate MACs. There are several reasons for that, one being that they need not be (globally) unique.

The MAC must be unique on the local network so that ARP/NDP can do its job and the switch knows where to send incoming frames. Usually (not necessarily) that precondition is fulfilled and things work just fine, simply because the likelihood of having two identical MACs on the same LAN, even if they are not globally unique, is quite low.

Another reason is that there simply exist more devices than there are addresses. While 48-bit addresses sound like there are enough addresses for everybody until the end of days, that's not the case.

The address space is divided into two 24-bit halves (it's slightly more complicated, but let's ignore the petty details). One half is the OUI, which you can register with the IEEE and have assigned to your company for around 2000 dollars. With the remaining 24 bits, you do whatever you want. Of course you can register several OUIs, which is what the bigger players do.
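
The split is easy to see in code. A small sketch follows; the function name `parse_mac` and the returned field names are my own. Among the details glossed over above are two flag bits in the first octet: the lowest bit marks group (multicast) addresses and the next bit marks locally administered addresses, which are not drawn from anyone's registered OUI.

```python
def parse_mac(mac: str) -> dict:
    """Split a 48-bit MAC into its OUI half and its device-specific half."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected 6 octets")
    return {
        "oui": octets[:3].hex(":"),
        "device_part": octets[3:].hex(":"),
        # Bit 0 of the first octet: 1 = group (multicast), 0 = individual.
        "multicast": bool(octets[0] & 0x01),
        # Bit 1 of the first octet: 1 = locally administered, 0 = universal.
        "locally_administered": bool(octets[0] & 0x02),
    }

print(parse_mac("00:1B:21:AA:BB:CC")["oui"])                   # 00:1b:21
print(parse_mac("02:00:00:00:00:01")["locally_administered"])  # True
```

An address with the locally administered bit set was chosen by software or by an administrator, which is one more way duplicates can appear.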

Take Intel as an example. They have registered a total of 7 OUIs, giving them a total of about 117 million addresses (7 × 2^24).
My computer's mainboard (which uses an X99 chipset), my laptop's mainboard, and the mainboard of every x86-based computer that I have owned during the last 10-15 years all had an Intel network card as part of the chipset. Certainly there are a lot more than 117 million Intel-based computers in the world. Thus, their MACs cannot possibly be unique (in the sense of globally unique).
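
The arithmetic behind that figure is quick to check. The OUI count of 7 is simply the number quoted above, not something verified here:

```python
ADDRESSES_PER_OUI = 2 ** 24  # 24 freely assignable bits per OUI
oui_count = 7                # figure quoted in the text above

total = oui_count * ADDRESSES_PER_OUI
print(ADDRESSES_PER_OUI)     # 16777216
print(total)                 # 117440512
```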

Also, cases have been reported of uh... cheaper... manufacturers simply "stealing" addresses from someone else's OUI. In other words, they just used some random address. I've heard of manufacturers that just use the same address for a complete product range, too. Neither practice is conforming or makes a lot of sense, but what can you do about it? These network cards exist. Again: the likelihood that it becomes a practical problem is still very low if addresses are used for what they're intended for, because you need to have two of them on the same LAN to even notice.

Now, what to do about your problem?

The solution is maybe simpler than you think. Your IoT devices will most probably need some notion of time, and time is usually obtained automatically via NTP. The typical precision of NTP is in the microsecond range (yes, that's micro, not milli). I just ran ntpq -c rl to be sure and was told 2^-20 seconds, which is just under one microsecond.

The likelihood of two of your devices being turned on for the first time at the precise same microsecond is very low. It can happen in principle (especially if you sell millions of them in a very short time, congratulations on your success!), sure. But it's not very likely -- in practice it will not happen. Thus, save the time of the first boot to permanent storage.

The boot time of your IoT device will be the same on every device. Except that's not true at all.
Given a high resolution timer, boot times are measurably different even on the same device, every time. It's maybe only a few clock ticks different (or a few hundred thousand, if you read something like the CPU's time stamp counter), so not very unique altogether, but it sure adds some entropy.
Similarly, the time it takes connect to return the first time you access your API site will be slightly, but measurably, different every time. Similarly, getaddrinfo will take a slightly different, measurable amount of time for every device when looking up your web API's hostname for the first time.

Concatenate those three or four sources of entropy (MAC address, time of first power-on, time to boot for the first time, connect time) and calculate a hash from that. MD5 will do just fine for that purpose. There, you're unique.
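
A minimal sketch of that recipe, assuming the four values have already been measured on the device. The function name `make_device_id`, the field widths, and the sample numbers are made up for illustration:

```python
import hashlib
import struct

def make_device_id(mac: bytes, first_boot_us: int, boot_ticks: int, connect_ns: int) -> str:
    """Hash the concatenated entropy sources into a 128-bit hex identifier."""
    if len(mac) != 6:
        raise ValueError("MAC must be 6 bytes")
    # Fixed-width big-endian packing keeps the field boundaries unambiguous.
    blob = mac + struct.pack(">QQQ", first_boot_us, boot_ticks, connect_ns)
    return hashlib.md5(blob).hexdigest()

device_id = make_device_id(
    bytes.fromhex("001b21aabbcc"),  # MAC address
    1_700_000_000_123_456,          # time of first power-on, microseconds
    48_213_907,                     # timer ticks taken by the first boot
    18_442_113,                     # duration of the first connect, nanoseconds
)
print(len(device_id))  # 32
```

The identifier is computed once and then stored, like the first-boot time; recomputing it on a later boot would give a different value. MD5 is used here only as a mixing function, not for security, and any other hash from hashlib could be substituted.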

While that does not truly guarantee uniqueness, it "pretty much" guarantees it, with a negligible chance of failure. You would have to have two devices with identical MACs that are turned on for the first time in the same microsecond, and that took the exact same time to boot and to connect to your site. That isn't going to happen. If it does happen, you should immediately start playing the lottery because to all appearances, you're guaranteed to win.

If, however, "will not happen" is not good enough as a guarantee, simply pass each device a sequentially increasing number (generated on the server) the first time it accesses your web API. Let the device store that number, done.
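
A sketch of the server-assigned variant, with both sides reduced to plain objects. The class names `IdAllocator` and `Device` are hypothetical; in practice the counter would be a database sequence and the stored value would live in flash or EEPROM:

```python
import itertools
import threading

class IdAllocator:
    """Server side: hands out strictly increasing integers."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            return next(self._counter)

class Device:
    """Device side: ask once, then keep using the stored value."""

    def __init__(self):
        self.stored_id = None  # stands in for persistent storage

    def ensure_id(self, allocator: IdAllocator) -> int:
        if self.stored_id is None:
            self.stored_id = allocator.next_id()
        return self.stored_id

allocator = IdAllocator()
first, second = Device(), Device()
print(first.ensure_id(allocator))   # 1
print(second.ensure_id(allocator))  # 2
print(first.ensure_id(allocator))   # 1
```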


Since the problem here is really an XY Problem, I'm going to address solving that: how to get a unique identifier for a piece of hardware the first time it boots without having to preload an identifier onto it. All the good methods really boil down to one thing: having a source of entropy.

If your hardware has something designed to be a hardware entropy source (note: this is basically a requirement for any proper IoT device implementation since it's needed for TLS, so your hardware should be designed with that in mind), just use that. If not, you have to get creative.
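
For the case where an entropy source exists, here is a minimal sketch, assuming the device runs an operating system whose random generator is fed by that hardware. On a bare-metal microcontroller you would read the vendor's RNG peripheral instead, which is not shown here. The function name `new_device_id` is my own:

```python
import secrets

def new_device_id() -> str:
    """Return 128 random bits as hex, drawn from the OS random generator."""
    return secrets.token_hex(16)

print(len(new_device_id()))  # 32
```

Generate the value once on first boot and store it; with 128 random bits, collisions between devices are not a practical concern.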

Fortunately, almost every computer ever made has an excellent source of entropy: crystal oscillators (clocks). The rate of a given crystal is not just dependent on subtle temperature changes, but is subject even to temperature hysteresis in nonlinear ways. However, to measure the entropy, you need a second clock to time the first. What this means is that, whenever your computer has at least two clocks you can sample, you can use the rate of one as measured by the other as a very high quality entropy source.
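
The clock-versus-clock idea can be sketched roughly as follows. This is an illustration only, with loud caveats: the function name `sample_jitter` and its parameters are made up, and on a desktop OS the loop counter and the timer may derive from the same oscillator, so most of the variation comes from scheduling and caching rather than from crystal drift. On a microcontroller you would instead count ticks of one oscillator during a period of a second, independent one.

```python
import hashlib
import time

def sample_jitter(samples: int = 256, window_ns: int = 50_000) -> bytes:
    """Collect raw timing jitter and condense it with SHA-256."""
    h = hashlib.sha256()
    for _ in range(samples):
        deadline = time.perf_counter_ns() + window_ns
        spins = 0
        # Count how many loop iterations (paced by the CPU) fit into a
        # window measured by the timer clock.
        while time.perf_counter_ns() < deadline:
            spins += 1
        overshoot = time.perf_counter_ns() - deadline
        h.update(spins.to_bytes(8, "big"))
        h.update(overshoot.to_bytes(8, "big"))
    return h.digest()

seed = sample_jitter()
print(len(seed))  # 32
```

How much real entropy each sample carries has to be measured on the target hardware; hashing only condenses what is there and does not create more.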