RFT: Kernels 2.6.29-linode18 and 2.6.29-x86_64-linode6

Linode Staff

Kernels 2.6.29-linode18 and 2.6.29-x86_64-linode6 have been pushed out. I'd appreciate another round of testing. I'm also curious if time drifts on these kernels and if I should ping the devs about it.

Thanks!

-Chris

14 Replies

I can confirm that it boots :)

Time drift still looks to be present with 2.6.29-x86_64-linode6. On boot the time was dead on, but I'm seeing roughly 130 ms/hr of linear drift relative to NIST.

$ grep ^server /etc/ntp.conf
server time-b.nist.gov minpoll 8 noselect
$ uptime
 19:10:50 up  3:09,  1 user,  load average: 0.00, 0.00, 0.00
$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 time-b.nist.gov .ACTS.           1 u   51  256  273   19.989  525.194  39.836
$ cat /var/lib/ntp/ntp.drift
0.000
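
Doing the unit conversion (and assuming the drift really is linear), 130 ms/hr works out to roughly 36 ppm, or a bit over 3 seconds per day. A quick back-of-the-envelope check, in case anyone wants to plug in their own numbers:

````
/* Back-of-the-envelope drift conversion, assuming linear drift.
 * The 130 ms/hr figure is the observed offset growth quoted above;
 * everything else is just unit conversion. */
#include <stdio.h>

int main(void)
{
    const double drift_ms_per_hour = 130.0;   /* observed offset growth */

    /* ppm == microseconds of error per second of wall-clock time */
    double drift_ppm       = drift_ms_per_hour * 1000.0 / 3600.0;
    double drift_s_per_day = drift_ppm * 1e-6 * 86400.0;

    printf("%.0f ms/hr ~= %.1f ppm ~= %.1f s/day\n",
           drift_ms_per_hour, drift_ppm, drift_s_per_day);
    return 0;
}
````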

Got the following repeated five times in my logs. It occurred within the first few seconds of a Gentoo emerge --update. The emerge seems to have finished successfully, so it apparently wasn't a fatal error.

Apr  3 12:47:33 flup swapper: page allocation failure. order:0, mode:0x20
Apr  3 12:47:33 flup Pid: 0, comm: swapper Not tainted 2.6.29-x86_64-linode6 #1
Apr  3 12:47:33 flup Call Trace:
Apr  3 12:47:33 flup <irq>[<ffffffff802815c9>] __alloc_pages_internal+0x389/0x480
Apr  3 12:47:33 flup [<ffffffff802436e7>] local_bh_enable_ip+0x77/0xb0
Apr  3 12:47:33 flup [<ffffffff802a3449>] cache_alloc_refill+0x2f9/0x5a0
Apr  3 12:47:33 flup [<ffffffff802a37f7>] __kmalloc+0x107/0x110
Apr  3 12:47:33 flup [<ffffffff805d24c8>] __alloc_skb+0x68/0x140
Apr  3 12:47:33 flup [<ffffffff80642677>] tcp_send_ack+0x27/0xe0
Apr  3 12:47:33 flup [<ffffffff80640fb2>] tcp_rcv_established+0x162/0x600
Apr  3 12:47:33 flup [<ffffffff80646e8c>] tcp_v4_do_rcv+0xdc/0x230
Apr  3 12:47:33 flup [<ffffffff80244067>] local_bh_enable+0x77/0xb0
Apr  3 12:47:33 flup [<ffffffff80648ead>] tcp_v4_rcv+0x71d/0x810
Apr  3 12:47:33 flup [<ffffffff8062a984>] ip_local_deliver_finish+0xc4/0x240
Apr  3 12:47:33 flup [<ffffffff8062a669>] ip_rcv_finish+0x129/0x380
Apr  3 12:47:33 flup [<ffffffff8062ae39>] ip_rcv+0x219/0x300
Apr  3 12:47:33 flup [<ffffffff80589a41>] xennet_poll+0xb01/0xcb0
Apr  3 12:47:33 flup [<ffffffff80717ca6>] error_exit+0x36/0x70
Apr  3 12:47:33 flup [<ffffffff807176ee>] _spin_lock_irqsave+0x2e/0x40
Apr  3 12:47:33 flup [<ffffffff805dba88>] net_rx_action+0x78/0x140
Apr  3 12:47:33 flup [<ffffffff80243ae2>] __do_softirq+0x92/0x160
Apr  3 12:47:33 flup [<ffffffff8021279c>] call_softirq+0x1c/0x30
Apr  3 12:47:33 flup [<ffffffff80214239>] do_softirq+0x49/0x90
Apr  3 12:47:33 flup [<ffffffff80243a45>] irq_exit+0x85/0x90
Apr  3 12:47:33 flup [<ffffffff80214303>] do_IRQ+0x83/0x110
Apr  3 12:47:33 flup [<ffffffff80278fec>] __rcu_process_callbacks+0x15c/0x270
Apr  3 12:47:33 flup [<ffffffff805520e5>] xen_evtchn_do_upcall+0xf5/0x140
Apr  3 12:47:33 flup [<ffffffff802127ee>] xen_do_hypervisor_callback+0x1e/0x30
Apr  3 12:47:33 flup <eoi>[<ffffffff802093aa>] _stext+0x3aa/0x1000
Apr  3 12:47:33 flup [<ffffffff802093aa>] _stext+0x3aa/0x1000
Apr  3 12:47:33 flup [<ffffffff8020e400>] xen_vcpuop_set_next_event+0x0/0x70
Apr  3 12:47:33 flup [<ffffffff8020ddcc>] xen_safe_halt+0xc/0x20
Apr  3 12:47:33 flup [<ffffffff8020c1a6>] xen_idle+0x26/0x50
Apr  3 12:47:33 flup [<ffffffff8020ff49>] cpu_idle+0x49/0x90
Apr  3 12:47:33 flup Mem-Info:
Apr  3 12:47:33 flup DMA per-cpu:
Apr  3 12:47:33 flup CPU    0: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    1: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    2: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    3: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup DMA32 per-cpu:
Apr  3 12:47:33 flup CPU    0: hi:   90, btch:  15 usd:  76
Apr  3 12:47:33 flup CPU    1: hi:   90, btch:  15 usd:  35
Apr  3 12:47:33 flup CPU    2: hi:   90, btch:  15 usd:  85
Apr  3 12:47:33 flup CPU    3: hi:   90, btch:  15 usd:  13
Apr  3 12:47:33 flup Active_anon:8639 active_file:20588 inactive_anon:12730
Apr  3 12:47:33 flup inactive_file:19196 unevictable:0 dirty:3183 writeback:0 unstable:0
Apr  3 12:47:33 flup free:554 slab:23674 mapped:2095 pagetables:541 bounce:0
Apr  3 12:47:33 flup DMA free:1360kB min:36kB low:44kB high:52kB active_anon:0kB inactive_anon:28kB active_file:2352kB inactive_file:8kB unevictable:0kB present:5788kB pages_scanned:0 all_unreclaimable? no
Apr  3 12:47:33 flup lowmem_reserve[]: 0 339 339 339
Apr  3 12:47:33 flup DMA32 free:856kB min:2336kB low:2920kB high:3504kB active_anon:34556kB inactive_anon:50892kB active_file:80000kB inactive_file:76776kB unevictable:0kB present:347440kB pages_scanned:32 all_unreclaimable? no
Apr  3 12:47:33 flup lowmem_reserve[]: 0 0 0 0
Apr  3 12:47:33 flup DMA: 2*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1384kB
Apr  3 12:47:33 flup DMA32: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 844kB
Apr  3 12:47:33 flup 39937 total pagecache pages
Apr  3 12:47:33 flup 139 pages in swap cache
Apr  3 12:47:33 flup Swap cache stats: add 251, delete 112, find 5043/5057
Apr  3 12:47:33 flup Free swap  = 736616kB
Apr  3 12:47:33 flup Total swap = 737272kB
Apr  3 12:47:33 flup 92160 pages RAM
Apr  3 12:47:33 flup 4086 pages reserved
Apr  3 12:47:33 flup 38289 pages shared
Apr  3 12:47:33 flup 54125 pages non-shared

@Stever:

Apr  3 12:47:33 flup [<ffffffff802815c9>] __alloc_pages_internal+0x389/0x480
That line originated in __alloc_pages_internal() in the kernel source (mm/page_alloc.c). The stack says that Xen handled a network IRQ, the IP stack picked it up and began handling TCP, TCP wanted to send a reply, tried to allocate some memory to do so, and ended up at a label in the allocator named "nopage". Reading the source (I'm hazy on Linux internals) leads me to believe this may be an out-of-memory situation, which is odd because you have a large amount of swap available. Also in that function:
> /* The OOM killer will not help higher order allocs so fail */
It appears to circumvent the OOM killer, so the kernel may not have reported to you that you were OOM other than these log entries.
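
For context, the mode:0x20 in the failure line is GFP_ATOMIC: the allocation happens in softirq context (the network receive path), so the allocator isn't allowed to sleep, wait for reclaim, or call the OOM killer; it either finds a free page immediately or returns NULL, and the callers on this path are written to cope with that. Roughly this pattern (a sketch from memory, not the actual 2.6.29 source):

````
/* Sketch of the pattern on this code path -- illustrative only. */
#include <linux/skbuff.h>

static void send_ack_sketch(void)
{
    /* GFP_ATOMIC: we're in softirq context and may not sleep, so the
     * allocator can't wait for reclaim or the OOM killer; on failure
     * it logs "page allocation failure" and returns NULL. */
    struct sk_buff *skb = alloc_skb(128, GFP_ATOMIC);

    if (!skb) {
        /* Drop the ACK; TCP's timers will retry later, so nothing
         * fatal happens -- just the log noise Stever saw. */
        return;
    }

    /* ... fill in headers and transmit ... */
    kfree_skb(skb);
}
````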
@Stever:

Apr  3 12:47:33 flup Free swap  = 736616kB
Apr  3 12:47:33 flup Total swap = 737272kB
Apr  3 12:47:33 flup 92160 pages RAM
Apr  3 12:47:33 flup 4086 pages reserved
Apr  3 12:47:33 flup 38289 pages shared
Apr  3 12:47:33 flup 54125 pages non-shared
That's 38289 shared + 54125 non-shared = 92414 pages, slightly more than the 92160 pages of RAM, so swap was just starting to get touched (the free-vs-total swap numbers reflect that). I/O load, maybe? Do earlier messages of this nature show a higher free-swap figure? These are all educated guesses; I'm just throwing them out there and seeing what sticks.

The message I posted is repeated five times, exactly the same each time. However, it looks like the log only caught the tail end of these since there is a partial message at the beginning of the event.

No other signs of OOM - this is just my testing linode with only sshd and fail2ban running, and as far as I can tell everything kept running. The emerge --update command that triggered these log entries completed without errors, and I have been compiling on the node for the last couple of hours without any recurrence.

FWIW:

I just migrated to Xen on dallas153 a few days ago. I've been running 2.6.29-linode18 for 27.5 hours, and my clock drift is 30 ppm and dropping. That's not so bad, is it? My home computer is a lot worse.

30 ppm isn't bad (about 2.6 s/day), but the old kernels had no drift at all because they stayed in sync with the host, which was presumably running ntpd.

So to keep good time on the newer kernels, you need to run your own ntpd.
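
If it helps anyone, a bare-bones setup is enough. Something along these lines in /etc/ntp.conf (the pool hostnames are just examples; point it at whatever servers you prefer), then start your distro's ntp service:

````
driftfile /var/lib/ntp/ntp.drift
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
````

The driftfile line is where ntpd stores the ppm correction that comes up later in this thread.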

FWIW, since the original issue I reported I've had this kernel running on my test machine for 22 days with no problems.

I've mentioned this in IRC, but I haven't experienced any time drift in my running kernel. Yesterday I set up time.jedsmith.org in the NTP pool, and it's a stratum 2 with minimal offset:

![NTP offset graph for time.jedsmith.org](http://hosted.jedsmith.org/forum_hotlinks/ntp_graph_04292009.png)

Note that from the first moment I joined the pool and ran ntpd (I started them both at the same time), I was only about 6ms away from "the real time". I'm not sure what I'm doing differently, but my kernel isn't drifting at all.

jed@frodo:~$ uname -a
Linux frodo 2.6.29-linode18 #1 SMP Wed Apr 1 20:22:21 EDT 2009 i686 GNU/Linux
jed@frodo:~$ uptime
 20:34:51 up 13 days, 12:49,  2 users,  load average: 0.00, 0.00, 0.00
jed@frodo:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock.fmt.he.ne .PPS.            1 u   51   64  377    1.443    0.452   0.290
+clock.sjc.he.ne .CDMA.           1 u   18   64  377    1.838    0.219   0.033
xclock.isc.org   .GPS.            1 u   45   64  377    2.093   -6.272   0.086
*clock.via.net   .GPS.            1 u    5   64  377    0.976    0.188   0.026
-nist1-sj.witime .ACTS.           1 u    2   64  377    2.125    2.334   0.070


@jed:

I'm not sure what I'm doing differently, but my kernel isn't drifting at all.
What do you have in your ntp.drift file? That should show the drift in ppm that ntpd is correcting for you. For example, on my linode running 2.6.29-x86_64-linode6 I get:

````
$ cat /var/lib/ntp/ntp.drift
44.004
````
Unless yours is zero, your kernel is drifting and ntpd is compensating for it.

The first thing ntpd will do if you are more than about 128 ms off at startup (the default step threshold) is perform a step correction. I don't think it will serve time to others until it considers itself synchronized, so an external monitoring system would probably not see the initial offset unless it was very small.

@Stever:

What do you have in your ntp.drift file?
18.429. Not terrible for a virtualized kernel, but I see your point.

I was merely noticing that other users had drifts on the order of several minutes and mine was not showing symptoms of that with this kernel.

@jed:

I was merely noticing that other users had drifts on the order of several minutes and mine was not showing symptoms of that with this kernel.
Yeah - I've never seen more than a few seconds per day of drift, but without ntpd running it always seems to be cumulative. After a few weeks of uptime you start to notice the time being off by minutes.

I don't think these drifts are any worse than I get on real hardware systems - it is just that before pv_ops we didn't have to worry about it at all. I suppose if I think of it that way, this is making a linode behave more like real hardware ;)

I have a Linode 360 with Jaunty 9.04, but when I change the kernel in the configuration and reboot, the Linode doesn't come up. The Lish boot log is empty. I've tried this one and 2.6.28.

@jzuijlek:

I have a Linode 360 with Jaunty 9.04, but when I change the kernel in the configuration and reboot, the Linode doesn't come up. The Lish boot log is empty. I've tried this one and 2.6.28.

Me too, except I'm running Gentoo.

I'm having no problems with mine running Gentoo on 2.6.29-x86_64-linode6, and I just rebooted into the same kernel without issue.

FWIW, 2.6.29-linode18 froze hard on me and required a hard reboot :-(
