What reference clock is an atomic clock measured against?

This is a good and somewhat tricky question for a number of reasons. I will try to simplify things down.

SI Second

First, let's look at the modern definition of the SI second.

The second, symbol s, is the SI unit of time. It is defined by taking the fixed numerical value of the caesium frequency ∆νCs, the **unperturbed** ground-state hyperfine transition frequency of the caesium 133 atom, to be 9192631770 when expressed in the unit Hz, which is equal to s−1.

Emphasis mine

The key word here is unperturbed. This means, among other things, that the Cs atom should have no motion and there should be no external fields. We'll come back to why these systematic effects are very important shortly.

How an Atomic Clock Works

How do we build a clock based on this definition of the second? We do it as follows. The Cs transition frequency is about 9.19 GHz. This is a microwave signal. Using analog electronics, engineers are able to make very precise electric signals at these frequencies, and these signals can be tuned to address the Cs atomic transition. The basic idea is to bathe the Cs atoms in microwave radiation in the vicinity of 9.192631770 GHz. If you are on resonance, the atoms will be driven to the excited state. If not, they will stay in the ground state. Thus, by measuring whether the atoms are in the ground or excited state, you can determine whether your microwave signal is on or off resonance.

What we actually end up using as the clock (the thing which ticks off periodic events that we can count) is the 9.19 GHz microwave signal generated by some electronics box*. Once we see 9192631770 oscillations of this microwave signal (counted by measuring zero crossings of the signal using electronics) we say that one second has passed. The purpose of the atoms is to check that the microwave frequency is just right. This is similar to how you might occasionally reset your microwave or oven clock to match your phone. We calibrate, or discipline, one clock to another.
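To make the disciplining idea concrete, here is a toy numerical sketch (not how a real clock servo is implemented; the Lorentzian line shape, linewidth, servo gain, and initial offset are all made-up illustrative values). A local oscillator starts slightly off resonance, we probe the assumed atomic line on either side of the oscillator's current frequency, and the excitation imbalance tells us which way to steer it:

```python
F_CS = 9_192_631_770.0   # Hz, defined Cs hyperfine frequency
LINEWIDTH = 100.0        # Hz, hypothetical interrogation linewidth
GAIN = 25.0              # Hz of correction per unit excitation imbalance (made up)

def excitation_probability(f_probe):
    """Toy Lorentzian response of the atoms to a probe at f_probe."""
    detuning = f_probe - F_CS
    return 1.0 / (1.0 + (2.0 * detuning / LINEWIDTH) ** 2)

f_lo = F_CS + 3.0        # local oscillator starts 3 Hz off resonance (made up)

for step in range(10):
    # Probe half a linewidth below and above the current LO frequency.
    # If the LO is high, the upper probe sits farther from resonance and
    # excites less, so (p_high - p_low) is negative and the LO is steered down.
    p_low = excitation_probability(f_lo - LINEWIDTH / 2)
    p_high = excitation_probability(f_lo + LINEWIDTH / 2)
    f_lo += GAIN * (p_high - p_low)
    print(f"step {step}: LO offset from resonance = {f_lo - F_CS:+.4f} Hz")
```

In a real clock this correction loop runs continuously, and it is the zero crossings of the disciplined oscillator that are counted to mark off seconds.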

So an atomic clock works by disciplining a microwave signal to an atomic transition frequency. Now, suppose you build a clock based on this principle and I also build one and we start our clocks at the same time (turn on our microwave oscillators and start comparing to the atoms occasionally). There are two possibilities. The first is that our two clocks always tick at the exact same time. The second is that there is noise or fluctuations somewhere in the system that cause us to get ticks at slightly different moments in time. Which do you think happens? We should be guided by the principle that nothing in experimental physics is ever exact. There is always noise. Atomic clock physics is all about learning about and understanding noise.

Clock Accuracy

This is the main topic of the OP's question. This is also where the key word unperturbed comes back into play. The Zeeman effect says that if the atom is in a magnetic field its transition frequency will shift slightly. This means a magnetic field constitutes a perturbation. This is one reason why your clock and my clock might tick at different moments in time. Our atoms may experience slightly different magnetic fields. Now, for this reason you and I will try really hard to ensure there is absolutely no magnetic field present in our atomic clock. However, this is difficult because there are magnetic materials that we need to use to build our clock, and there are magnetic fields due to earth and screwdrivers in the lab and all sorts of things. We can do our best to eliminate the magnetic field, but we will never be able to remove it entirely. One thing we can do is we can try to measure how large the magnetic field is and take this into account when determining our clock frequency. Suppose that the atoms experience a linear Zeeman shift of $\gamma = 1 \text{ MHz/Gauss}$**. That is

$$ \Delta f = \gamma B $$

Now, if I go into my atomic clock I can try to do my best to measure the magnetic field at the location of the atoms. Suppose I measure a magnetic field of 1 mG. This gives a known shift of my Cs transition frequency of $\Delta f = 1 \text{ MHz/Gauss} \times 1 \text{ mG} = 1 \text{ kHz}$. So, in the absence of other perturbations to my atoms, I would expect my atoms to have a transition frequency of 9.192632770 GHz instead of 9.192631770 GHz.

Ok, so if you and I both measure the magnetic fields in our clocks and compensate for this linear Zeeman shift, we now get our clocks ticking at the same frequency, right? Wrong. The problem is that however we measure the magnetic field, that measurement itself will have some uncertainty. So I might actually measure the magnetic field in my clock to be

$$ B = 1.000 \pm 0.002\text{ mG} $$

This corresponds to an uncertainty in my atomic transition frequency of

$$ \delta f = 2 \text{ Hz} $$

So, because of the uncertainty in my systematic shifts, I don't know the transition frequency of my atoms exactly. That is, I don't have unperturbed ground-state Cs atoms, so my experiment doesn't exactly implement the SI definition of the second. It is just my best guess.
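For concreteness, here is the same arithmetic in a few lines of code, using the made-up sensitivity $\gamma = 1 \text{ MHz/Gauss}$ from above; it reproduces both the 1 kHz correction and the 2 Hz residual uncertainty:

```python
GAMMA = 1e6           # Hz per Gauss, the made-up linear Zeeman sensitivity from above
B = 1.0e-3            # Gauss (1 mG), the measured field at the atoms
DELTA_B = 0.002e-3    # Gauss (0.002 mG), uncertainty of that measurement

shift = GAMMA * B              # known shift, which we can correct for
uncertainty = GAMMA * DELTA_B  # residual uncertainty, which we cannot remove

print(f"Zeeman shift to subtract: {shift:.0f} Hz")              # 1000 Hz = 1 kHz
print(f"Leftover frequency uncertainty: {uncertainty:.0f} Hz")  # 2 Hz
```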

But, we do have some information. What if we could compare my atoms to perfect unperturbed Cs atoms? How much might my clock differ from that ideal clock? Suppose I decrease the frequency of my clock by 1 kHz to account for the magnetic field shift so that my clock runs at

$$ f_{real} = 9192631770 \pm 2 \text{ Hz} $$

While the ideal Cs clock runs (by definition of the SI second) at exactly

$$ f_{ideal} = 9192631770 \text{ Hz} $$

Let’s run both of these for $T= 1 \text{ s}$. The ideal clock will obviously tick off $$ N_{ideal} = f_{ideal} T = 9192631770 $$ oscillations since that is the definition of a second. How many times will my clock tick? Let's assume the worst case scenario that my clock is slow by 2 Hz. Then it will tick

$$ N_{real} = f_{real} \, T = 91926317\textbf{68} $$

It was two ticks slow after one second. Turning this around, we can ask: if we used my clock to measure a second (that is, if we let it tick 9192631770 times under the assumption - our best guess - that its frequency is indeed 9.192631770 GHz), how long would that really take?

$$ T_{real} = 9192631770/f_{real} \approx 1.00000000022 \text{ s} $$

We see that my clock is slow by about 200 ps after 1 s. Pretty good. If you run my clock for roughly $5 \times 10^9 \text{ s} \approx 158 \text{ years}$ then it will be off by about one second. This corresponds to a fractional uncertainty of about

$$ \frac{1 \text{ s}}{5 \times 10^9 \text{ s}} \approx \frac{2 \text{ Hz}}{9192631770 \text{ Hz}} \approx 2\times 10^{-10} = 0.2 \text{ ppb} $$
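These figures are easy to check numerically; the exact values come out slightly below the rounded ones quoted above (about 218 ps of drift per second, and about $4.6\times 10^9$ s, roughly 146 years, to lose a full second):

```python
F_IDEAL = 9_192_631_770   # Hz, the defined Cs frequency
DELTA_F = 2               # Hz, worst-case frequency error from the field uncertainty

f_real = F_IDEAL - DELTA_F    # the slow clock's true frequency
t_real = F_IDEAL / f_real     # how long the slow clock takes to tick off one "second"
print(f"one 'second' on the slow clock lasts {t_real:.12f} s")  # ~1.000000000218 s

drift_per_second = t_real - 1.0
print(f"drift per second: {drift_per_second * 1e12:.0f} ps")    # ~218 ps

seconds_to_lose_one = 1.0 / drift_per_second
print(f"time to accumulate 1 s of error: {seconds_to_lose_one:.2e} s "
      f"(~{seconds_to_lose_one / 3.156e7:.0f} years)")          # ~4.6e9 s, ~146 years
```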

Frequency Uncertainty to Seconds Lost

Here I want to do some more mathematical manipulation to show the relationship between the fractional frequency uncertainty of a clock and the commonly quoted "number of seconds needed before the clock loses a second" metric.

Suppose we have two clocks: an ideal clock with unperturbed atoms which runs at frequency $f_0$, and a real clock which we've calibrated so that our best guess is that it runs at $f_0$, but which has an uncertainty $\delta f$, so it really runs at $f_0 - \delta f$. We are now going to run these two clocks for a time $T$ and see how long we have to run them before they are off by $\Delta T = 1 \text{ s}$.

As time progresses, each clock will tick a certain number of times. The $I$ subscript is for the ideal clock and $R$ is for real.

\begin{align} N_I =& f_0T\\ N_R =& (f_0 - \delta f)T \end{align}

This relates the number of ticks to the amount of time that elapsed. However, we actually measure time by counting ticks! So we can write down what times $T_I$ and $T_R$ we would infer from each of the two clocks (by dividing the observed number of oscillations by the presumed oscillation frequency $f_0$).

\begin{align} T_I =& N_I/f_0 = T\\ T_R =& N_R/f_0 = \left(\frac{f_0 - \delta f}{f_0}\right) T_I = \left(1 - \frac{\delta f}{f_0}\right)T_I \end{align}

These are the key equations. Note that in the first equation we see that the time inferred from the ideal clock, $T_I$, is equal to $T$, which of course had to be the case because time is actually defined by $T_I$. Now, for the real clock, we estimated its time reading by dividing its number of ticks, $N_R$ (which is unambiguous), by $f_0$. Why didn't I divide by $f_0 - \delta f$? Remember that our best guess is that the real clock ticks at $f_0$; $\delta f$ is an uncertainty, so we don't actually know whether the clock is ticking fast or slow by the amount $\delta f$, we just know that it wouldn't be statistically improbable for us to be off by this amount. It is this uncertainty that leads to the discrepancy in the time readings between the real and ideal clocks.

We now calculate

\begin{align} \Delta T = T_I - T_R = \frac{\delta f}{f_0} T_I \end{align}

So we see

\begin{align} \frac{\Delta T}{T_I} = \frac{\delta f}{f_0} \end{align}

That is, the ratio of the time difference $\Delta T$ to the elapsed time $T_I$ is given exactly by the ratio of the frequency uncertainty $\delta f$ to the clock frequency $f_0$.
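As a check, plugging in the numbers from the magnetic field example above:

$$ \frac{\Delta T}{T_I} = \frac{\delta f}{f_0} = \frac{2 \text{ Hz}}{9192631770 \text{ Hz}} \approx 2.2 \times 10^{-10}, $$

so the clock must run for $T_I = \Delta T \, f_0/\delta f \approx 4.6 \times 10^9 \text{ s} \approx 146 \text{ years}$ before it accumulates an error of $\Delta T = 1 \text{ s}$, consistent with the rough $5 \times 10^9 \text{ s}$ estimate above.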

Summary

To answer the OP's question: there isn't any perfect clock against which we can compare the world's best atomic clocks. In fact, the world's most accurate atomic clocks (optical clocks based on atoms such as Al, Sr, or Yb) are orders of magnitude more accurate than the microwave Cs clocks that realize the definition of the second.

However, by measuring systematic effects we can estimate how far a given real clock is from an ideal clock. In the example I gave above, if we know the magnetic field to within 0.002 mG then we know that the clock frequency is within 2 Hz of the ideal clock frequency. In practice, every clock has a whole zoo of systematic effects that must be measured and constrained to quantify the clock's accuracy.

And one final note. Another important clock metric which we haven't touched on here is clock stability. Clock stability is related to the fact that the measurement we use to determine whether there is a frequency detuning between the microwave oscillator and the atomic transition frequency will always have some statistical uncertainty (different from the systematic shifts described above), meaning we can't tell from just one measurement exactly what the relative frequency between the two is. (In the absence of drifts) we can reduce this statistical uncertainty by taking more measurements, but this takes time. A discussion of clock stability is outside the scope of this question and would require a separate one.

Reference Frames

Here is a brief note about reference frames because they're mentioned in the question. Special and general relativity stipulate that time is not absolute. Changing reference frames changes the flow of time and even sometimes the perceived order of events. How do we make sense of the operation of clocks, especially precision atomic clocks, in light of these facts? Two steps.

First, see this answer, which argues that we can treat the gravitational equipotential surface at sea level as an inertial frame. So if all of our clocks are in this frame there will not be any relativistic frequency shifts between those clocks. To first order, this is the assumption we can make about atomic clocks: as long as they are all within this same reference frame, we don't need to worry about it.

Second, however, what if our clocks are at different elevations? The atomic clocks in Boulder, CO are over 1500 m above sea level. This means they have gravitational shifts relative to clocks at sea level. In fact, just like the magnetic field, these shifts constitute systematic shifts to the clock frequency which must be estimated and accounted for. That is, if your clock is sensitive (or stable) enough to measure relativistic frequency shifts, then part of the job of running the clock is to estimate the elevation of the clock relative to the Earth's sea-level equipotential surface. Clocks are now so stable that we can measure two clocks running at different frequencies if we lift one clock up just a few cm relative to another one in the same building or room. See this popular news article.
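For scale, the fractional frequency shift between two clocks separated by a height $h$ near the Earth's surface is roughly $g h / c^2$, so a 1 cm height difference gives

$$ \frac{\Delta f}{f} \approx \frac{g h}{c^2} = \frac{(9.8 \text{ m/s}^2)(0.01 \text{ m})}{(3\times 10^8 \text{ m/s})^2} \approx 1\times 10^{-18}, $$

which is why only the most stable optical clocks can resolve it.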

So the answer to any question about reference frames and atomic clocks is as follows. When specifying where "time" is defined, we have to indicate the gravitational equipotential surface or inertial frame that we take as our reference. This is conventionally the surface of the Earth. For any clocks outside of this reference (remember that the GPS system uses atomic clocks on satellites), we must measure the position and velocity of these clocks relative to the Earth reference frame so that we can estimate and correct for the relativistic shifts these clocks experience. These measurements will of course come with some uncertainty, which results in additional clock inaccuracy as per the rest of my answer.

Footnotes

*You might wonder: why do we need an atomic clock then? Can't we just take our microwave function generator and set it to 9.192631770 GHz and use that as our clock? Well sure, you can dial in those numbers on your function generator, but what's really going to bake your noodle is "how do we know the function generator is outputting the right frequency?" The answer is we can't truly know unless we compare it to whatever the modern definition of the second is. The microwave signal is probably generated by multiplying and dividing the frequency of a mechanical oscillator, such as a quartz oscillator, which has some nominal oscillation frequency, but again, we can't truly know what the frequency of that thing is unless we compare it to the definition of the second, an atom.

**I made this number up. The Cs transition used for Cs atomic clocks actually doesn't have a linear Zeeman shift, only a quadratic Zeeman shift, but that doesn't matter for the purposes of this calculation.


BIPM and TAI

The International Bureau of Weights and Measures (BIPM) in France computes a weighted average of the master clocks from 50 countries. That weighted average then gives International Atomic Time (TAI), which forms the basis of the other international times (e.g., UTC, which differs from TAI by the number of leap seconds that have been inserted, currently 37).
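As a concrete illustration of that offset, converting between TAI and UTC is just a fixed subtraction (the 37 s leap-second count is the value quoted above and changes whenever a new leap second is inserted; standard datetime objects also cannot represent the leap-second instants themselves, which this sketch ignores):

```python
from datetime import datetime, timedelta

TAI_MINUS_UTC = timedelta(seconds=37)  # accumulated leap seconds (value quoted above; changes over time)

def tai_to_utc(tai: datetime) -> datetime:
    """UTC lags TAI by the accumulated leap seconds."""
    return tai - TAI_MINUS_UTC

# Example: 2020-01-01 00:00:37 TAI corresponds to 2020-01-01 00:00:00 UTC.
print(tai_to_utc(datetime(2020, 1, 1, 0, 0, 37)))
```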

There isn't, however, a single source that gives TAI in real time. Rather, BIPM basically collects statistics from each national lab, computes a worldwide average, and publishes a monthly circular showing how each differed from the average over the course of the previous month. The national labs then use this data to adjust their clocks so they all stay in tight synchronization.

Most of the comparisons are made using GPS. That is, a laboratory will periodically compare its local time to the time it receives via GPS, and send the difference it observed to BIPM. A few links (8, as of the current circular) use two-way transmission of their current time and frequency instead.

BIPM also publishes a weekly "rapid UTC" report with similar information to give national labs slightly more up to date information to help stay in sync better.

To assist the GPS based comparisons, BIPM periodically (most recently in late 2018) does trips around the world to the various national labs with a couple of GPS receivers that are used to calibrate the receivers at each lab.

Individual Labs

The master clocks from those countries are themselves an average of a number of atomic clocks, all stored in vaults to keep them in the most constant environment possible.

These are not, however, all identically constructed. Let me give the US Naval Observatory's master clock as one example:

The atomic clock timescale of the Observatory is based on an ensemble of cesium-beam frequency standards, hydrogen masers, and rubidium fountains. Frequency data from this ensemble are used to steer the frequency of another such maser, forming our designated Master Clock (MC), until its time equals the average of the ensemble, thereby providing the physical realization of this "paper timescale."

Specifically, the frequency of a device called an Auxiliary Output Generator is periodically adjusted so as to keep the time of this maser synchronized as closely as possible with that of the computed mean USNO timescale, UTC (USNO), which in turn is adjusted to be close to the predicted UTC. The unsteered internal reference timescale is designated as A.1, while the reference of the actual Master Clock is called UTC (USNO).

UTC (USNO) is usually kept within 10 nanoseconds of UTC. An estimate of the slowly changing difference UTC - UTC (USNO) is computed daily.

GPS

The most easily available reference clock for many people is a GPS signal, so it's probably worth mentioning a bit about it. Each GPS satellite has at least one atomic clock on board (and most have two). These are (occasionally) adjusted by a ground station (Schriever Air Force Base, Colorado), ultimately based on the master clock from the US Naval Observatory.

Also note, however, that most typical GPS receivers will use time from other satellite systems (e.g., GLONASS) interchangeably with actual GPS satellites. In fact, at any given time it's pretty routine to be using signals from some satellites from each system. From the user's viewpoint, the two are identical, but GLONASS is a Russian system, so (unsurprisingly) it's controlled from a Russian base station and they use their own master clock as the basis for its time. The US and Russia both contribute to TAI, though, so the clocks remain tightly synchronized.

Another mildly interesting point: the clocks on GPS satellites have to be adjusted due to relativistic effects--both special and general relativity affect the time (i.e., they're affected both by the fact that they're moving fast, and the fact that they're at high enough altitude that they're much less affected by the earth's gravity than ground-based clocks).
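A back-of-the-envelope version of that calculation (assuming a circular orbit at the standard GPS orbital radius of roughly 26,600 km, a ground clock at the Earth's surface, and approximate constants) gives the familiar numbers: the satellite clock gains about 46 µs/day from general relativity, loses about 7 µs/day from special relativity, and so runs fast by roughly 38 µs/day net, which is why the onboard clock frequencies are deliberately offset.

```python
import math

GM = 3.986e14       # m^3/s^2, Earth's gravitational parameter
C = 2.998e8         # m/s, speed of light
R_EARTH = 6.371e6   # m, mean Earth radius
R_GPS = 2.66e7      # m, approximate GPS orbital radius
DAY = 86400         # s

# General relativity: the satellite sits higher in Earth's potential,
# so its clock runs fast relative to the ground.
gr_rate = GM * (1 / R_EARTH - 1 / R_GPS) / C**2

# Special relativity: the satellite's orbital speed makes its clock run slow.
v = math.sqrt(GM / R_GPS)
sr_rate = v**2 / (2 * C**2)

print(f"GR speed-up:  {gr_rate * DAY * 1e6:+.1f} us/day")          # ~ +46 us/day
print(f"SR slow-down: {-sr_rate * DAY * 1e6:+.1f} us/day")         # ~ -7 us/day
print(f"net:          {(gr_rate - sr_rate) * DAY * 1e6:+.1f} us/day")  # ~ +38 us/day
```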

As noted in the section on BIPM and TAI, the various laboratories themselves also use GPS (and GLONASS) for their internal comparisons to help them stay in sync with each other.

Summary

The international standard is based on a weighted average of the standards from 50 different countries, each of which is (in turn) based on a weighted average of a number of separate clocks. The individual clocks are of at least three distinct types (cesium, hydrogen and rubidium).

At least for the US Naval Observatory, the official final output is actually via a hydrogen maser, which is occasionally adjusted to synchronize its current time/frequency with that of the rest of the ensemble.

The unofficial final output used by most people is GPS (or equivalently, GLONASS, etc.). These also include their own atomic clocks, but those are adjusted to maintain synchronization with the ground-based reference clocks.

TAI approximates the SI second about as closely as current technology supports (and will probably be updated when technology improves substantially--though such a substantial change may well lead to a change in the SI definition of the second as well). Although it's based on measurements, TAI is never really current--it's based on collecting data, averaging it, and then (after the fact) publishing information about how each laboratory's master clock differed from the weighted average of all the clocks.

References

BIPM

USNO Master Clock

USNO Time Scale

2018 group 1 calibration trip

Explanatory Supplement to BIPM Circular T


However, if there is no absolute reference frame to measure "real time" for, what is the reference clock that an atomic clock can be measured against?

They are measured against an ensemble of other identically constructed atomic clocks (all at rest with respect to each other and under identical operating conditions). The $10^{-16}$ means that two such clocks will on average drift apart from each other at a rate on the order of a picosecond every few hours.
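As a quick check of that figure: a fractional frequency difference of $10^{-16}$ accumulates a time offset of

$$ 10^{-16} \times 10^{4}\text{ s} = 10^{-12}\text{ s} = 1\text{ ps} $$

over $10^4 \text{ s} \approx 3$ hours, i.e. about a picosecond every few hours.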