Warning: Time moved backwards by 314 seconds.
Thanks for taking the time to read this post, and thanks in advance for any help you can offer. I opened a ticket with Linode Support, but they "politely" told me to come here for support. Without dwelling too much on it, I'll just say I was insulted and disappointed that they wouldn't take the time to assist a paying customer. That was not what I've come to expect from Linode.
So I'm hoping you guys can help pick up their slack. Again, any help you can provide is appreciated.
–----
I don't recall exactly when this started, but if I had to guess I'd say sometime in the last month or so.
On reboot my Atlanta Linode comes up with the time (presumably) synchronized to the host system's clock and then ntpdate/ntp fires and corrects the time. This involves a backwards time jump which seems to affect Dovecot the most, but I've also had blocked SSH connections until the system settles out.
I have a second Linode in another data center which after checking I found is off by 2 minutes. I compared the ntp.conf files between the two and found them to be identifical. To prove to myself that the issue is not directly related to a ntp configuration I booted the Atlanta node to a configuration profile that has init=/bin/bash and ran 'date'. This was an attempt to boot the Linode without networking support and before other processes had loaded so I could see what the host clock was set to. The assumption on my part is that the VM and the host are synchronized at boot, although I have no proof either way.
To the point, I took the value given from running the date command and compared it against my local laptop (which also uses NTP), several phones that sync against cell networks and my other Linode with a properly synchronized clock and found the Atlanta node to be off about 5 minutes.
From everything I've read this means the time has to be corrected in "BIOS", which for the Xen VPS setup I assume Linode runs would mean the host system that my Atlanta Linode runs on. Linode Support tells me that they've confirmed the host system's clock is correct.
By this point I've read various forum postings, searched via Google and read the official Dovecot wiki page here:
I've been running ntpd for years now, presumably without this issue (although I wasn't running many services that would have immediately complained), so installing/configuring ntpd has already been taken care of.
Here's an original log snippet before I made changes to ntp.conf:
May 3 20:21:12 atlanta ntpd[1226]: ntpd 4.2.6p3@1.2290-o Tue Jun 5 20:12:11 UTC 2012 (1)
May 3 20:21:12 atlanta ntpd[1227]: proto: precision = 0.972 usec
May 3 20:21:12 atlanta ntpd[1227]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May 3 20:21:12 atlanta ntpd[1227]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING
May 3 20:21:12 atlanta dovecot: master: Dovecot v2.0.19 starting up (core dumps disabled)
May 3 20:21:13 atlanta dovecot: lmtp(1271): Connect from local
May 3 20:21:13 atlanta dovecot: auth-worker: mysql(127.0.0.1): Connected to database mailserver_v1
May 3 20:15:59 atlanta ntpdate[671]: step time server 50.116.38.157 offset -314.012644 sec
May 3 20:15:59 atlanta ntpd[1302]: ntpd 4.2.6p3@1.2290-o Tue Jun 5 20:12:11 UTC 2012 (1)
May 3 20:15:59 atlanta ntpd[1303]: proto: precision = 0.974 usec
May 3 20:15:59 atlanta ntpd[1303]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May 3 20:15:59 atlanta ntpd[1303]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
May 3 20:15:59 atlanta ntpd[1303]: Listen and drop on 1 v6wildcard :: UDP 123
May 3 20:15:59 atlanta ntpd[1303]: Listen normally on 2 lo 127.0.0.1 UDP 123
May 3 20:15:59 atlanta ntpd[1303]: Listen normally on 3 eth0 74.207.228.106 UDP 123
May 3 20:15:59 atlanta ntpd[1303]: Listen normally on 4 eth0 fe80::fcfd:4aff:fecf:e46a UDP 123
May 3 20:15:59 atlanta ntpd[1303]: Listen normally on 5 lo ::1 UDP 123
May 3 20:15:59 atlanta ntpd[1303]: peers refreshed
May 3 20:15:59 atlanta ntpd[1303]: Listening on routing socket on fd #22 for interface updates
May 3 20:15:59 atlanta dovecot: log: Warning: Time moved backwards by 314 seconds.
May 3 20:15:59 atlanta dovecot: master: Warning: Time moved backwards by 314 seconds, waiting for 180 secs until new services are launched again.
May 3 20:15:59 atlanta dovecot: ssl-params: Warning: Time moved backwards by 313 seconds.
May 3 20:16:00 atlanta dovecot: anvil: Warning: Time moved backwards by 313 seconds.
May 3 20:16:00 atlanta dovecot: auth: Warning: Time moved backwards by 313 seconds.
May 3 20:16:00 atlanta dovecot: config: Warning: Time moved backwards by 313 seconds.
May 3 20:16:00 atlanta dovecot: lmtp(1271): Fatal: Time just moved backwards by 313 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki2.dovecot.org/TimeMovedBackwards
I attempted to work around the early exit and restart of ntpd by adding a line to ignore the wildcard interface, but that caused synchronization to fail. I have confirmed that by reverting to the previous configuration that synchronization does work, even if the earlier exit and restart of ntpd returns.
root@atlanta:~# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+palpatine.steve 231.146.174.254 3 u 9 64 377 61.853 0.154 1.585
+samur.ulak.net. 192.36.144.23 2 u 5 64 377 145.905 -1.958 1.479
*time3.chpc.utah 132.163.4.103 2 u 67 64 376 56.198 1.018 1.111
-time01.muskegon 64.113.32.5 2 u 13 64 377 60.461 3.472 1.545
-europium.canoni 140.203.204.77 2 u 1 64 377 87.009 -4.037 0.870
root@atlanta:~# grep -Ev '#|^/r> /etc/ntp.conf
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 0.ubuntu.pool.ntp.org
server 1.ubuntu.pool.ntp.org
server 2.ubuntu.pool.ntp.org
server 3.ubuntu.pool.ntp.org
server ntp.ubuntu.com
restrict -4 default nomodify nopeer noquery notrap
restrict -6 default nomodify nopeer noquery notrap
restrict 127.0.0.1
restrict ::1
I'm not sure what else to try. I found the /etc/init/hwclock-save upstart job and ran it, but it error'd out. Light research seems to show that the /dev/rtc0 device does not exist for Xen VMs. Too bad as I was hoping that I could save the updated time to a local clock (however it is virtualized) so the next boot would be current.
15 Replies
- Les
@akerl:
Xen provides the initial clock point to the Linode. That clock is initially based on the host's time, but it can skew over time and there isn't really a good way to stop that.
Hi Les, thanks for the reply.
As I noted in my post the clock is 5 minutes fast immediately upon reboot of my Atlanta Linode. To prove it's not my setup that is the problem, I booted from Finnix and it also reports the time as off by five minutes.
@akerl:
All our hosts keep their clocks in sync with NTP; that doesn't stop Xen from skewing the initial boot time under some circumstances. There really isn't another fix for this besides running NTP inside your Linode.
When you say under some circumstances, do you know of certain scenarios that yield worse time skews than others? I currently run a 32-bit OS and use a distro kernel via pv_grub. As part of the guide I originally followed I disabled the Xenify Distro option. Does that have any bearing?
@akerl:
If you have services that don't properly start before NTP has updated your Linode's internal clock, you may want to tune your init system to start them only after NTP has done its thing.
- Les
No offense to you, but that seems more like a workaround than an actual fix to the problem. I'll look modifying the init job order like you suggested, but I can't help but feel something is being overlooked.
- Les
@Guspaz:
You really ought to run ntpd inside your linode
I know it's long, but the OP mentions that I have been running ntpd in the Linode for years already. The problem I have with the workarounds mentioned thus far is finding a way to get ntpdate to run before all other services that depend on reliable time.
Dovecot has an Upstart job, it looks like Postfix and ntpd have init.d scripts and the ntpdate command is run as soon as eth0 comes online (/etc/network/if-up.d/ntpdate).
I have not had time to work on this again today, but i was going to try calling ntpdate from rc.local. I do not recall where that falls in the boot order, but I figured it was worth a try before I had to try and write an Upstart job (which I have no experience with).
For the record, this is Ubuntu 12.04 that I am working with.
@Guspaz:
I'm not sure I understand the problem then. The host system runs ntpd too.
That's what I was told via the support ticket. They also mentioned that dom0 runs ntpd, but the hypervisor is handing over the wrong time to the guest. I don't have Xen training, so I don't know the different layers that make up a Xen infrastructure. Googling around I saw mention of some older versions of Xen having a bug which causes the problem described here, but I've no way of knowing how Linode staff have their host systems configured.
As I also mentioned in this thread I booted the VM using Finnix which had the same initial time that my VM's current OS has: 5 minutes too fast. Since I'm unable to set the equivalent of BIOS/CMOS time like I would in a VMware or VirtualBox VM, this tells me the problem lies with the host configuration. In this situation I would suspect that either a software or a misconfiguration issue is to blame. Linode Support says otherwise.
@Guspaz:
The only way you'd see a time jump is if your linode has drifted, which it normally would. If you also run ntpd on your linode, it won't drift, so there won't be any time jump when you reboot.
My earlier comments address this, but for whatever reason (again, my ignorance of Xen doesn't help things) the current time isn't saved properly on shutdown/reboot like it would on another setup (VMware/VirtualBox/real hardware), so the time given by the host (or hypervisor as I've been told) is what the VM has to start with when it boots.
I noticed that there are Upstart jobs dealing with loading and saving the system clock on startup/reboot, but they reference a Real Time Clock device that it appears Xen doesn't support.
All in all, the time given to my Linode on boot is incorrect and to work around it I have to have ntpdate run before any other service that requires accurate time (all of them?). Until I can figure that out I'll continually have problems every time the box is rebooted.
@obs:
I had the same problem on one node. You could try asking Linode to move you to another host
That's a good suggestion. Is that what you did to resolve your issue? I'm a little surprised that Linode Support hasn't made that offer to me as a workaround for the problems with their setup.
In my testing so far it is looking like I'd have to use a combination of a new init.d script and modifications to the Dovecot Upstart job to ensure that correct time is set before Postfix, Dovecot and the like come online. It seems that Dovecot is the only daemon to raise such a fuss, but reading over their notes on the issue I imagine that a backwards time jump can't be good for other daemons either.
- Les
Anyone know how to safely key off of that job?
While I already have what I need to workaround just Dovecot having the wrong initial time I'd like to correct the issue for all of the daemons that I can.
What I had:
start on (
runlevel 2
and static-network-up
and starting cron
and starting mysql
and starting dovecot
and starting ssh
and starting rc
)
What I have now:
start on (
runlevel 2
and static-network-up
and starting cron
and starting dovecot
and starting ssh
)
I removed rc because of the hanging issue and I removed mysql because Postfix gets upset when it starts via the SysV script and Upstart hasn't yet started the MySQL instance it needs to resolve addresses. Once I find a safe way to key off of rc so I can delay the SysV scripts I'll add it back to the list to delay that daemon's launch as well.
I tried creating a wrapper around the ntpdate command (only run on runlevel 2), but didn't have any luck. I thought about creating a SysV script that only paused execution of the other scripts there, but didn't think it would have any effect.
I keep telling myself that this should be really simple.
Thanks in advance!
@vonskippy:
Just curious, how often are you rebooting to make this such a big problem - and why?
Mainly just when applying kernel, OpenSSL or similar patches. However I've had unexpected results because of the time jumps and I'm tired of it, so I'd like to go ahead and knock out a solid workaround and be done with it.
If you have any tips/suggestions I'm all ears. Thanks.
First, I nuked the contents of the /etc/network/if-up.d/ntpdate file as we don't need ntpdate to be called twice, particularly within a short period. When left in place I noticed that the clock was adjusted by X amount twice, which sent the clock into the past 2x what was needed. Once I removed the contents of that file I replaced with:
#!/bin/sh
# Comments here describing why the file was gutted
exit 0
Second, I dropped this Upstart job file into the /etc/init/ directory as /etc/init/ntpdate.conf. The minified version:
description "Sync clock prior to launching time-sensitive services"
task
start on (
startup
and static-network-up
)
script
/usr/sbin/ntpdate -b -s ntp.ubuntu.com
end script
what I'm using:
# $Id$
# $HeadURL$
description "Sync clock prior to launching time-sensitive services"
################################################################################
# References:
#
# http://upstart.ubuntu.com/cookbook/#run-a-job-before-another-job
# http://upstart.ubuntu.com/cookbook/#start-depends-on-another-service
# http://upstart.ubuntu.com/cookbook/#start-must-precede-another-service
# http://upstart.ubuntu.com/cookbook/#really-understanding-start-on-and-stop-on
# http://upstart.ubuntu.com/cookbook/#method-2
# http://upstart.ubuntu.com/cookbook/#task
# http://askubuntu.com/questions/21378/how-can-i-make-sure-one-upstart-job-starts-before-other-upstart-jobs/22110#22110
# http://superuser.com/questions/502536/how-to-depend-on-an-upstart-job-from-an-init-script-on-ubuntu-12-04
#
# Set a VMware Player VM to NOT sync time between host/guest so I could set
# guest's clock to a false value so time syncing would be forced
#
# http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
#
# http://askubuntu.com/questions/62812/why-isnt-my-upstart-service-starting-on-system-boot
# http://askubuntu.com/a/71426
################################################################################
# Execute this Upstart job at startup and once all network interfaces listed in
# /etc/network/interfaces have been brought online.
task
start on (
startup
and static-network-up
)
# Debug output to prove that this Upstart job is executing properly
pre-start script
logger -p daemon.info -t "ntpdate.conf" "pre-start for ntpdate.conf"
end script
post-start script
logger -p daemon.info -t "ntpdate.conf" "post-start for ntpdate.conf"
end script
script
logger -p daemon.info -t "ntpdate.conf" "About to execute ntpdate"
# Use hard-coded NTP server instead of parsing /etc/ntp.conf
/usr/sbin/ntpdate -b -s ntp.ubuntu.com
# Parse NTP config file to sync against multiple servers. Borrowed from
# /etc/network/if-up.d/ntpdate
#/usr/sbin/ntpdate-debian -s -b 2>/dev/null || :
logger -p daemon.info -t "ntpdate.conf" "Just finished executing ntpdate"
end script
On my system this still leaves these four daemons starting with the wrong time, but unless they begin to show abnormal behavior I'm just going to leave things well enough alone (too much else to worry about):
postgrey
opendkim
mysql
slapd
I hope this helps someone.