Warning: Time moved backwards by 314 seconds.

Hi all,

Thanks for taking the time to read this post, and thanks in advance for any help you can offer. I opened a ticket with Linode Support, but they "politely" told me to come here for support. Without dwelling too much on it, I'll just say I was insulted and disappointed that they wouldn't take the time to assist a paying customer. That was not what I've come to expect from Linode.

So I'm hoping you guys can help pick up their slack. Again, any help you can provide is appreciated.

–----

I don't recall exactly when this started, but if I had to guess I'd say sometime in the last month or so.

On reboot my Atlanta Linode comes up with the time (presumably) synchronized to the host system's clock and then ntpdate/ntp fires and corrects the time. This involves a backwards time jump which seems to affect Dovecot the most, but I've also had blocked SSH connections until the system settles out.

I have a second Linode in another data center which after checking I found is off by 2 minutes. I compared the ntp.conf files between the two and found them to be identifical. To prove to myself that the issue is not directly related to a ntp configuration I booted the Atlanta node to a configuration profile that has init=/bin/bash and ran 'date'. This was an attempt to boot the Linode without networking support and before other processes had loaded so I could see what the host clock was set to. The assumption on my part is that the VM and the host are synchronized at boot, although I have no proof either way.

To the point, I took the value given from running the date command and compared it against my local laptop (which also uses NTP), several phones that sync against cell networks and my other Linode with a properly synchronized clock and found the Atlanta node to be off about 5 minutes.

From everything I've read this means the time has to be corrected in "BIOS", which for the Xen VPS setup I assume Linode runs would mean the host system that my Atlanta Linode runs on. Linode Support tells me that they've confirmed the host system's clock is correct.

By this point I've read various forum postings, searched via Google and read the official Dovecot wiki page here:

http://wiki2.dovecot.org/TimeMovedBackwards

I've been running ntpd for years now, presumably without this issue (although I wasn't running many services that would have immediately complained), so installing/configuring ntpd has already been taken care of.

Here's an original log snippet before I made changes to ntp.conf:

May  3 20:21:12 atlanta ntpd[1226]: ntpd 4.2.6p3@1.2290-o Tue Jun  5 20:12:11 UTC 2012 (1)
May  3 20:21:12 atlanta ntpd[1227]: proto: precision = 0.972 usec
May  3 20:21:12 atlanta ntpd[1227]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May  3 20:21:12 atlanta ntpd[1227]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING
May  3 20:21:12 atlanta dovecot: master: Dovecot v2.0.19 starting up (core dumps disabled)
May  3 20:21:13 atlanta dovecot: lmtp(1271): Connect from local
May  3 20:21:13 atlanta dovecot: auth-worker: mysql(127.0.0.1): Connected to database mailserver_v1
May  3 20:15:59 atlanta ntpdate[671]: step time server 50.116.38.157 offset -314.012644 sec
May  3 20:15:59 atlanta ntpd[1302]: ntpd 4.2.6p3@1.2290-o Tue Jun  5 20:12:11 UTC 2012 (1)
May  3 20:15:59 atlanta ntpd[1303]: proto: precision = 0.974 usec
May  3 20:15:59 atlanta ntpd[1303]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May  3 20:15:59 atlanta ntpd[1303]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen and drop on 1 v6wildcard :: UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 2 lo 127.0.0.1 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 3 eth0 74.207.228.106 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 4 eth0 fe80::fcfd:4aff:fecf:e46a UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 5 lo ::1 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: peers refreshed
May  3 20:15:59 atlanta ntpd[1303]: Listening on routing socket on fd #22 for interface updates
May  3 20:15:59 atlanta dovecot: log: Warning: Time moved backwards by 314 seconds.
May  3 20:15:59 atlanta dovecot: master: Warning: Time moved backwards by 314 seconds, waiting for 180 secs until new services are launched again.
May  3 20:15:59 atlanta dovecot: ssl-params: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: anvil: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: auth: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: config: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: lmtp(1271): Fatal: Time just moved backwards by 313 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki2.dovecot.org/TimeMovedBackwards

I attempted to work around the early exit and restart of ntpd by adding a line to ignore the wildcard interface, but that caused synchronization to fail. I have confirmed that by reverting to the previous configuration that synchronization does work, even if the earlier exit and restart of ntpd returns.

root@atlanta:~# ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+palpatine.steve 231.146.174.254  3 u    9   64  377   61.853    0.154   1.585
+samur.ulak.net. 192.36.144.23    2 u    5   64  377  145.905   -1.958   1.479
*time3.chpc.utah 132.163.4.103    2 u   67   64  376   56.198    1.018   1.111
-time01.muskegon 64.113.32.5      2 u   13   64  377   60.461    3.472   1.545
-europium.canoni 140.203.204.77   2 u    1   64  377   87.009   -4.037   0.870

root@atlanta:~# grep -Ev '#|^/r> /etc/ntp.conf

driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 0.ubuntu.pool.ntp.org
server 1.ubuntu.pool.ntp.org
server 2.ubuntu.pool.ntp.org
server 3.ubuntu.pool.ntp.org
server ntp.ubuntu.com
restrict -4 default nomodify nopeer noquery notrap
restrict -6 default nomodify nopeer noquery notrap
restrict 127.0.0.1
restrict ::1

I'm not sure what else to try. I found the /etc/init/hwclock-save upstart job and ran it, but it error'd out. Light research seems to show that the /dev/rtc0 device does not exist for Xen VMs. Too bad as I was hoping that I could save the updated time to a local clock (however it is virtualized) so the next boot would be current.

15 Replies

Xen provides the initial clock point to the Linode. That clock is initially based on the host's time, but it can skew over time and there isn't really a good way to stop that. All our hosts keep their clocks in sync with NTP; that doesn't stop Xen from skewing the initial boot time under some circumstances. There really isn't another fix for this besides running NTP inside your Linode. If you have services that don't properly start before NTP has updated your Linode's internal clock, you may want to tune your init system to start them only after NTP has done its thing.

  • Les

@akerl:

Xen provides the initial clock point to the Linode. That clock is initially based on the host's time, but it can skew over time and there isn't really a good way to stop that.

Hi Les, thanks for the reply.

As I noted in my post the clock is 5 minutes fast immediately upon reboot of my Atlanta Linode. To prove it's not my setup that is the problem, I booted from Finnix and it also reports the time as off by five minutes.

@akerl:

All our hosts keep their clocks in sync with NTP; that doesn't stop Xen from skewing the initial boot time under some circumstances. There really isn't another fix for this besides running NTP inside your Linode.

When you say under some circumstances, do you know of certain scenarios that yield worse time skews than others? I currently run a 32-bit OS and use a distro kernel via pv_grub. As part of the guide I originally followed I disabled the Xenify Distro option. Does that have any bearing?

@akerl:

If you have services that don't properly start before NTP has updated your Linode's internal clock, you may want to tune your init system to start them only after NTP has done its thing.

  • Les

No offense to you, but that seems more like a workaround than an actual fix to the problem. I'll look modifying the init job order like you suggested, but I can't help but feel something is being overlooked.

No, the kernel choice, architecture, or other options internal to your Linode don't impact this. The reality is that the clock used by the hypervisor to give guests their initial time can and will skew over time. We run NTP, which corrects the dom0's clock, not the hypervisor.

  • Les

You really ought to run ntpd inside your linode, since your linode's clock will drift otherwise. This will ensure that your clock is always accurate regardless of what's going on on the host system. If you get a big skew right on boot, you can set ntpdate to run on boot, which will give you the gross correction, and then rely on ntpd to cancel out any drift (ntpd basically adjusts how fast time passes to counter drift, without making any sudden adjustments).

@Guspaz:

You really ought to run ntpd inside your linode

I know it's long, but the OP mentions that I have been running ntpd in the Linode for years already. The problem I have with the workarounds mentioned thus far is finding a way to get ntpdate to run before all other services that depend on reliable time.

Dovecot has an Upstart job, it looks like Postfix and ntpd have init.d scripts and the ntpdate command is run as soon as eth0 comes online (/etc/network/if-up.d/ntpdate).

I have not had time to work on this again today, but i was going to try calling ntpdate from rc.local. I do not recall where that falls in the boot order, but I figured it was worth a try before I had to try and write an Upstart job (which I have no experience with).

For the record, this is Ubuntu 12.04 that I am working with.

I'm not sure I understand the problem then. The host system runs ntpd too. The only way you'd see a time jump is if your linode has drifted, which it normally would. If you also run ntpd on your linode, it won't drift, so there won't be any time jump when you reboot.

@Guspaz:

I'm not sure I understand the problem then. The host system runs ntpd too.

That's what I was told via the support ticket. They also mentioned that dom0 runs ntpd, but the hypervisor is handing over the wrong time to the guest. I don't have Xen training, so I don't know the different layers that make up a Xen infrastructure. Googling around I saw mention of some older versions of Xen having a bug which causes the problem described here, but I've no way of knowing how Linode staff have their host systems configured.

As I also mentioned in this thread I booted the VM using Finnix which had the same initial time that my VM's current OS has: 5 minutes too fast. Since I'm unable to set the equivalent of BIOS/CMOS time like I would in a VMware or VirtualBox VM, this tells me the problem lies with the host configuration. In this situation I would suspect that either a software or a misconfiguration issue is to blame. Linode Support says otherwise.

@Guspaz:

The only way you'd see a time jump is if your linode has drifted, which it normally would. If you also run ntpd on your linode, it won't drift, so there won't be any time jump when you reboot.

My earlier comments address this, but for whatever reason (again, my ignorance of Xen doesn't help things) the current time isn't saved properly on shutdown/reboot like it would on another setup (VMware/VirtualBox/real hardware), so the time given by the host (or hypervisor as I've been told) is what the VM has to start with when it boots.

I noticed that there are Upstart jobs dealing with loading and saving the system clock on startup/reboot, but they reference a Real Time Clock device that it appears Xen doesn't support.

All in all, the time given to my Linode on boot is incorrect and to work around it I have to have ntpdate run before any other service that requires accurate time (all of them?). Until I can figure that out I'll continually have problems every time the box is rebooted.

I had the same problem on one node. You could try asking Linode to move you to another host, I've only see this happen once so odds are you'll move to a node that has the correct time or a time skewed in the past which should cause less problems.

@obs:

I had the same problem on one node. You could try asking Linode to move you to another host

That's a good suggestion. Is that what you did to resolve your issue? I'm a little surprised that Linode Support hasn't made that offer to me as a workaround for the problems with their setup.

In my testing so far it is looking like I'd have to use a combination of a new init.d script and modifications to the Dovecot Upstart job to ensure that correct time is set before Postfix, Dovecot and the like come online. It seems that Dovecot is the only daemon to raise such a fuss, but reading over their notes on the issue I imagine that a backwards time jump can't be good for other daemons either.

Moving to a new host is not a solution to this issue. Any other host is just as likely to experience this issue, and statistically all hosts will eventually experience this issue.

  • Les

I didn't move host I just ignored it since it only caused issues with fsck on boot up which I could work around by running fsck via finnix. However even though Les states that moving host isn't the solution I've only encountered this once and I've had a lot of nodes over the years.

As a workaround to the hypervisor issue giving the wrong time I've put together an Upstart job, but it still needs work. It successfully sets the time prior to starting up key daemons, but on reboot it appears to hang the system. I've tracked that down to me using the 'rc' Upstart job as a start event.

Anyone know how to safely key off of that job?

While I already have what I need to workaround just Dovecot having the wrong initial time I'd like to correct the issue for all of the daemons that I can.

What I had:

start on (
    runlevel 2 
    and static-network-up 
    and starting cron 
    and starting mysql 
    and starting dovecot 
    and starting ssh
    and starting rc
)

What I have now:

start on (
    runlevel 2 
    and static-network-up 
    and starting cron 
    and starting dovecot 
    and starting ssh
)

I removed rc because of the hanging issue and I removed mysql because Postfix gets upset when it starts via the SysV script and Upstart hasn't yet started the MySQL instance it needs to resolve addresses. Once I find a safe way to key off of rc so I can delay the SysV scripts I'll add it back to the list to delay that daemon's launch as well.

I tried creating a wrapper around the ntpdate command (only run on runlevel 2), but didn't have any luck. I thought about creating a SysV script that only paused execution of the other scripts there, but didn't think it would have any effect.

I keep telling myself that this should be really simple.

Thanks in advance!

Just curious, how often are you rebooting to make this such a big problem - and why?

@vonskippy:

Just curious, how often are you rebooting to make this such a big problem - and why?

Mainly just when applying kernel, OpenSSL or similar patches. However I've had unexpected results because of the time jumps and I'm tired of it, so I'd like to go ahead and knock out a solid workaround and be done with it.

If you have any tips/suggestions I'm all ears. Thanks.

Alright, I'm finally at a point where I feel I have this "good enough" so I can move on to other things. I'm sharing my findings here in case they can help someone else that may be affected by this in the future. Particularly as Les mentioned (paraphrasing) all of their hosts will likely have this problem eventually.

First, I nuked the contents of the /etc/network/if-up.d/ntpdate file as we don't need ntpdate to be called twice, particularly within a short period. When left in place I noticed that the clock was adjusted by X amount twice, which sent the clock into the past 2x what was needed. Once I removed the contents of that file I replaced with:

#!/bin/sh

# Comments here describing why the file was gutted

exit 0

Second, I dropped this Upstart job file into the /etc/init/ directory as /etc/init/ntpdate.conf. The minified version:

description "Sync clock prior to launching time-sensitive services"

task

start on (
    startup
    and static-network-up 
)

script
    /usr/sbin/ntpdate -b -s ntp.ubuntu.com
end script

what I'm using:

# $Id$
# $HeadURL$

description "Sync clock prior to launching time-sensitive services"

################################################################################
# References:
#
#   http://upstart.ubuntu.com/cookbook/#run-a-job-before-another-job
#   http://upstart.ubuntu.com/cookbook/#start-depends-on-another-service
#   http://upstart.ubuntu.com/cookbook/#start-must-precede-another-service
#   http://upstart.ubuntu.com/cookbook/#really-understanding-start-on-and-stop-on
#   http://upstart.ubuntu.com/cookbook/#method-2
#   http://upstart.ubuntu.com/cookbook/#task
#   http://askubuntu.com/questions/21378/how-can-i-make-sure-one-upstart-job-starts-before-other-upstart-jobs/22110#22110
#   http://superuser.com/questions/502536/how-to-depend-on-an-upstart-job-from-an-init-script-on-ubuntu-12-04
#
# Set a VMware Player VM to NOT sync time between host/guest so I could set
# guest's clock to a false value so time syncing would be forced
#
#   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
#
#   http://askubuntu.com/questions/62812/why-isnt-my-upstart-service-starting-on-system-boot
#   http://askubuntu.com/a/71426
################################################################################

# Execute this Upstart job at startup and once all network interfaces listed in 
# /etc/network/interfaces have been brought online.

task

start on (
    startup
    and static-network-up 
)

# Debug output to prove that this Upstart job is executing properly
pre-start script
  logger -p daemon.info -t "ntpdate.conf" "pre-start for ntpdate.conf"
end script

post-start script
  logger -p daemon.info -t "ntpdate.conf" "post-start for ntpdate.conf"
end script

script
    logger -p daemon.info -t "ntpdate.conf" "About to execute ntpdate"

    # Use hard-coded NTP server instead of parsing /etc/ntp.conf
    /usr/sbin/ntpdate -b -s ntp.ubuntu.com

    # Parse NTP config file to sync against multiple servers. Borrowed from 
    # /etc/network/if-up.d/ntpdate
    #/usr/sbin/ntpdate-debian -s -b 2>/dev/null || :

    logger -p daemon.info -t "ntpdate.conf" "Just finished executing ntpdate"
end script

On my system this still leaves these four daemons starting with the wrong time, but unless they begin to show abnormal behavior I'm just going to leave things well enough alone (too much else to worry about):

  • postgrey

  • opendkim

  • mysql

  • slapd

I hope this helps someone.

Reply

Please enter an answer
Tips:

You can mention users to notify them: @username

You can use Markdown to format your question. For more examples see the Markdown Cheatsheet.

> I’m a blockquote.

I’m a blockquote.

[I'm a link] (https://www.google.com)

I'm a link

**I am bold** I am bold

*I am italicized* I am italicized

Community Code of Conduct