Why is NTP syncing to LOCAL rather than remote server?
With only one NTP server configured, the algorithm isn't entirely sure who to trust. Even though the remote host has the lower stratum, I bet the algorithm thinks the local clock is more trustworthy.
Try using the prefer keyword with your server statement to set that as a preferential time source.
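As a rough sketch, assuming the remote server is the 10.130.33.201 mentioned elsewhere in this thread, the line in ntp.conf would look something like this:

```conf
# ntp.conf fragment: mark the remote server as the preferred source
server 10.130.33.201 prefer
```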
So, it looks like this is a duplicate of this question, but I don't feel that poster got a sufficient answer, so I would still like to know why the local time is being preferred over the server.
For a truly sufficient answer, you are going to be digging into the bowels of a very complex algorithm. The documentation doesn't even get too specific but I am sure there's a white paper or specification out there.
If I do remove all of the "local" lines in the config as the answer to the other question suggests, what will happen if the server is unreachable? Does NTP die or does it just keep trying?
The NTP daemon doesn't die or stop, but it does quit synchronizing time once it can no longer reach the remote server. This is why best practice suggests a minimum of three remote servers, and not using the LCL (local clock) driver unless you are disconnected from the network. Three servers are suggested because with only two that disagree, which one should it choose? The third server helps the algorithm eliminate the bogus one.
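A minimal sketch of what that looks like in ntp.conf. The three hostnames are placeholders I made up for illustration, and iburst is optional, so substitute servers you are actually allowed to use:

```conf
# ntp.conf fragment: three remote sources so a bad one can be outvoted
# (hostnames are hypothetical placeholders)
server ntp1.example.com iburst
server ntp2.example.com iburst
server ntp3.example.com iburst
```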
Lastly, I just noticed that you do not define a driftfile. This might help?
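For reference, the driftfile setting is a one-liner. The path below is an assumption (it varies by distribution), and ntpd has to be able to write there:

```conf
# ntp.conf fragment: where ntpd stores the measured clock frequency error
# (path is an assumed example, adjust for your system)
driftfile /var/lib/ntp/ntp.drift
```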
It looks to me like the offset (the difference between your system time and the NTP host's time) is too large for NTP to correct it properly.
1. Stop the NTP service.
2. As root, run ntpdate -bs 10.130.33.201 to reset your time to something close.
3. Start the NTP service.
You should have no problems after that.
The stratum of 10.130.33.201 as a LOCAL server is 9, which makes the local stratum calculated from this (9+1=10) compete with the local LOCAL server at stratum 10. Since the local LOCAL clock has no network delay or jitter, it may look slightly better to ntpd than the remote one.
If you want this config to work, set the 'master' LOCAL server to a stratum lower than 9, but not too low if you still want time traceable to a stratum 1 server to be preferred.
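A sketch of what that could look like on the 'master' (10.130.33.201), assuming it uses the usual 127.127.1.0 local clock driver. The value 8 is only an illustrative choice below 9, so treat it as a starting point and check what the clients report in ntpq afterwards:

```conf
# ntp.conf fragment on the master server
# (stratum value is an illustrative assumption)
server 127.127.1.0
fudge  127.127.1.0 stratum 8
```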
I know this is old, but I think you are right. No one shows any way to debug ntpd issues. Turns out it is doable.
I think you were on the right track when you suspected that using LOCAL(0) both locally and on the upstream server may be an issue.
It certainly was on a time island of 4 servers I had a similar issue with. These were all set to be peers of each other, so possibly a different issue to yours.
First though, there is a better way of handling time islands called orphan mode that is supported with ntpd versions of the last few years:
Orphan mode on doc.ntp.org
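For anyone who has not seen it, orphan mode is enabled with the tos command. A minimal sketch, assuming an island of hosts that also list each other as peers. The stratum value and the peer address are placeholders for illustration:

```conf
# ntp.conf fragment, same idea on every host in the island
# stratum to fall back to when no real source is reachable (illustrative value)
tos orphan 10
# hypothetical address of another island member
peer 192.0.2.11
```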
Initially all 4 servers had the same stratum of 10 and preferred their local clock. I fixed that and still they preferred their local clock (the stratum does seem to be important though).
I used the ntpq commands pe (peers), as (associations) and rv (readvar) to get a handle on what was happening. You need to run rv on the association ID for the server to dump its variables. pe and as seem to be sorted in the same order, so you can match a server to its association ID that way. as has a field called condition that may show the value reject if ntpd doesn't like the server.
In the rv output is a field called flash. If all is well this will be zero. If not it is a bitmask (displayed in hex) of the issues. They can be looked up here:
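To save looking the bits up by hand, here is a small shell sketch that decodes a flash value. The bit names are my transcription of the flash status word table in the ntpd decode docs, so check them against the docs for your version. decode_flash is a helper name I made up, not part of ntpq.

```shell
#!/bin/sh
# Decode an ntpd "flash" status word given in hex, with or without a 0x prefix.
# The value normally comes from a running ntpd, for example:
#   ntpq -p                 # list peers
#   ntpq -c as              # list association IDs
#   ntpq -c "rv ASSID"      # ASSID is a placeholder for one of those IDs
# Bit names are transcribed from the ntpd docs; verify for your version.
decode_flash() {
    flash=$(( 0x${1#0x} ))
    if [ "$flash" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    bit=1
    # Names are listed from the lowest bit (0001) to the highest (1000).
    for name in pkt_dup pkt_bogus pkt_unsync pkt_denied pkt_auth \
                pkt_stratum pkt_header pkt_autokey pkt_crypto \
                peer_stratum peer_dist peer_loop peer_unreach; do
        if [ $(( flash & bit )) -ne 0 ]; then
            echo "$name"
        fi
        bit=$(( bit << 1 ))
    done
    return 0
}
```

For example, decode_flash 0800 prints peer_loop.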
ntpd internal decodes
The issue I had was 0800 peer_loop. It turned out that the refid of the clock is important. Seeing LOCAL(0) both on the local clock and from the remote server had ntpd thinking there was a loop. David Mills confirms that in posts on comp.protocols.time, 'How to avoid loop in NTP' (I have reached my limit of 2 links, sorry!)
Using the refid argument to fudge to set a unique refid did not work: it still shows up as LOCAL(0) at the recipient.
What did seem to work was using unique instance numbers for the local driver: 127.127.1.[0-3]. Use the same ID on both the server and fudge lines. When I did this, the servers generally synced to the lowest stratum server, which usually used its local clock. It occasionally tried to use one of the other servers that was using it as a source. However, times got in sync and seem to be staying that way.
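Roughly, each of the four servers gets its own local clock unit number. A sketch for one of them, assuming unit 1. The stratum value is illustrative and is not the one I actually used:

```conf
# ntp.conf fragment on one island member (use .0, .2 and .3 on the others)
# (stratum value is an illustrative assumption)
server 127.127.1.1
fudge  127.127.1.1 stratum 10
```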
Probably far too late to help, but I offer it up to show NTP is amenable to logic and troubleshooting. I took hours reaching the answer by trial and error and then found the docs later.