Verify internal NTP server is sending the correct time?

TL;DR:

  1. Configure your NTP server according to best current practices.
  2. (Shameless self-promotion warning.) Use my ntpmon check if your monitoring solution uses collectd, Nagios, or telegraf.

Long version:

Configuration

The most important foundation for good NTP monitoring is good NTP configuration. To understand what that involves, read the NTP Best Current Practices (BCP 223/RFC 8633). Here's a condensed summary of its configuration recommendations:

  1. Keep your NTP software up-to-date
  2. Use between 4 and 10 sources
  3. Ensure you have a diversity of reference clocks represented in those sources
  4. Don't allow unauthenticated remote control (should be the default on most distros)
  5. Use the pool responsibly (should also be the default on most distros)
  6. Don't mix leap-smeared and non-leap-smeared sources
  7. Don't use unauthenticated broadcast mode
  8. Don't use anycast or load-balancing when you're serving time

Where to measure

Once you have a good local configuration, the main thing to remember is that your check should query the local NTP server for its metrics, rather than trying to manually measure offset from remote servers. The major NTP servers (ntpd and chronyd) already collect all the metrics you need, so checks which compare the clock against remote servers are ignoring a lot of NTP's built-in goodness.

Metric selection

So to your question, the metrics you should be most interested in are:

  • system offset: the calculated best guess of the local clock's offset from the one true time
  • root dispersion: the calculated maximum offset of the local clock from the stratum 0 sources
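As a rough illustration of reading these two metrics from the local server, here is a minimal Python sketch that parses the system variables printed by `ntpq -c rv` (ntpd only) and compares them with thresholds. The sample text, the function names, and the threshold values are assumptions made up for this example, not recommendations; ntpq reports both values in milliseconds.

```python
import re

def parse_readvar(text):
    """Extract name=value pairs from `ntpq -c rv`-style output into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}

def check_clock(variables, max_offset_ms=50.0, max_rootdisp_ms=200.0):
    """Return a list of problems; the thresholds are hypothetical defaults."""
    offset = abs(float(variables["offset"]))
    rootdisp = float(variables["rootdisp"])
    problems = []
    if offset > max_offset_ms:
        problems.append(f"system offset {offset} ms exceeds {max_offset_ms} ms")
    if rootdisp > max_rootdisp_ms:
        problems.append(f"root dispersion {rootdisp} ms exceeds {max_rootdisp_ms} ms")
    return problems

# Invented sample in the shape of `ntpq -c rv` output (not a real capture).
SAMPLE = (
    'associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,\n'
    'version="ntpd 4.2.8p10", stratum=2, rootdelay=48.143, rootdisp=35.120,\n'
    'offset=-0.810, sys_jitter=0.175'
)

variables = parse_readvar(SAMPLE)
print(check_clock(variables) or "clock looks OK")
```

In a real check you would feed `parse_readvar` the output of the command itself (for example via `subprocess.run(["ntpq", "-c", "rv"], capture_output=True, text=True)`), and chronyd would need a different parser.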

Monitoring

There are a few monitoring solutions for NTP; depending on what monitoring you already have in place, some might suit you better than others. I wrote an overview of these on my blog; here's a summary:

  1. Nagios:
    • check_ntp_peer: decent basic check; doesn’t check a wide enough variety of metrics; a little too liberal in what offsets it allows
    • check_ntp_time: not recommended; checks only the offset from a given remote NTP server
    • check_ntpd: reasonable check coverage; use it if you prefer Perl over Python
    • ntpmon's Nagios check
  2. collectd:
    • NTP plugin: some of the metrics it collects are unclear
    • ntpmon in collectd mode
  3. prometheus/influxdb:
    • prometheus node exporter: not recommended; checks only the offset from a given remote NTP server
    • telegraf ntpq input plugin: a direct translation of ntpq output to telegraf metrics; this is probably too detailed if you just want to know, "Is my NTP server OK?"
    • ntpmon in telegraf mode

Caveats

  1. The above is a summary of the state as at October 2016 when I did my alerting and telemetry review. Things may have improved since.
  2. ntpmon is my project which I think overcomes the deficiencies of the checks which were available at the time. It supports both ntpd and chronyd, and the above-listed alerting and telemetry systems.

Sure, the standard approach is to use ntpq, the query utility bundled with ntpd. It displays the configured servers along with their reachability, time offset, and jitter. Here's an example:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*metasntp12.admi .MRS.            1 u  274 1024  377   64.445    1.086   0.450
+cecar.ddg.lth.s 130.149.17.8     2 u  811 1024  377   48.143   -0.810   0.175
 dir.mcc.ac.uk   85.199.214.100   2 u   7d 1024    0   76.708   -1.654   0.000

Here you can see that three servers are configured. Two are okay: reach is shown in octal, and 377 expands to binary 11 111 111, where each 1 means a successful answer and each 0 means no answer, so 377 means all of the last eight polls were answered. The last server has a reach of 0 and is probably dead for some reason. Offset is the time offset in milliseconds, and jitter is the variability of that offset, also in milliseconds.
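To make the reach arithmetic concrete, here is a small Python sketch that decodes the octal reach value into its eight bits and pulls a few fields out of the peer lines shown above. The function names are made up for this example, and the field positions assume the default ten-column `ntpq -p` layout with a non-empty refid.

```python
def decode_reach(reach_octal):
    """Return (bit string, fraction of polls answered) for an ntpq reach value."""
    value = int(reach_octal, 8)      # reach is printed in octal
    bits = format(value, "08b")      # 8-bit register, most recent poll on the right
    return bits, bits.count("1") / 8

def parse_peer_line(line):
    """Split one peer line from `ntpq -p` into the fields used here."""
    tally = line[0]                  # '*' = system peer, '+' = candidate, ' ' = discarded
    fields = line[1:].split()
    return {
        "tally": tally,
        "remote": fields[0],
        "reach": fields[6],
        "offset_ms": float(fields[8]),
        "jitter_ms": float(fields[9]),
    }

# The three peer lines from the ntpq -p output above.
PEERS = [
    "*metasntp12.admi .MRS.            1 u  274 1024  377   64.445    1.086   0.450",
    "+cecar.ddg.lth.s 130.149.17.8     2 u  811 1024  377   48.143   -0.810   0.175",
    " dir.mcc.ac.uk   85.199.214.100   2 u   7d 1024    0   76.708   -1.654   0.000",
]

for line in PEERS:
    peer = parse_peer_line(line)
    bits, fraction = decode_reach(peer["reach"])
    print(f'{peer["remote"]:<16} reach={bits} answered={fraction:.0%} '
          f'offset={peer["offset_ms"]} ms')
```

A value such as 357 would decode to 11101111, meaning one of the last eight polls went unanswered.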