Higher load average after upgrade
I recently upgraded one of my Linodes from a Linode 512 to a 768 because of memory limitations. After booting it back up, I noticed that the load average is higher than it should be while the machine is idle.
It's supposed to be idle right now, but the load averages are: load average: 0.49, 0.62, 0.59
It stays around that level, and I've seen it peak at 0.90 with nothing running that should cause it.
The CPU looks idle, I/O seems low, and RAM is fine as well. Here's the summary from top:
top - 16:07:26 up 4:11, 1 user, load average: 0.60, 0.64, 0.60
Tasks: 106 total, 2 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 768316k total, 709116k used, 59200k free, 32348k buffers
Swap: 524284k total, 180k used, 524104k free, 478828k cached
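One quick check I know of: on Linux the load average counts tasks that are runnable (R) as well as those in uninterruptible sleep (D), so it can stay elevated even while the CPU reads as idle. Sampling process states a few times can show what keeps the run queue busy (a rough sketch; the ps options are standard procps):

```shell
#!/bin/sh
# The load average counts tasks in the runnable (R) and
# uninterruptible-sleep (D) states. Sampling process states a few
# times shows which processes keep the run queue occupied.
for i in 1 2 3 4 5; do
    date '+%H:%M:%S'
    ps -eo state,pid,comm | awk '$1 ~ /^[RD]/'
    sleep 1
done
```

On an idle box this should print little besides ps itself; any process that shows up repeatedly in D state is a likely load contributor.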
I tried to troubleshoot a bit myself, but I don't have enough experience to say whether this could be the cause or not. Looking at the kernel statistics with nmon, I saw this:
Kernel Stats ---------------------------------------------------------------------------------------------------------------------------------------- |
| RunQueue 1 Load Average CPU use since boot time |
| ContextSwitch 1776.8 1 mins 0.88 Uptime Days= 0 Hours= 4 Mins=13 |
| Forks 3.0 5 mins 0.71 Idle Days= 1 Hours= 9 Mins=21 |
| Interrupts 1050.6 15 mins 0.62 Average CPU use=-687.96% |
I thought that maybe the interrupts/context switches are what's causing the high load? I don't know what the numbers should be, but in screenshots from other people they weren't this high on an idle system.
I looked at the interrupts and saw this:
Average: INTR intr/s
Average: 0 0.00
Average: 1 0.00
Average: 2 0.00
Average: 3 0.00
Average: 4 0.00
Average: 5 0.00
Average: 6 0.00
Average: 7 0.00
Average: 8 0.00
Average: 9 0.00
Average: 10 0.00
Average: 11 0.00
Average: 12 0.00
Average: 13 0.00
Average: 14 0.00
Average: 15 0.00
Average: 16 183.40
Average: 17 1.70
Average: 18 47.70
Average: 19 0.00
Average: 20 0.00
Average: 21 0.00
Average: 22 179.05
Average: 23 1.85
Average: 24 43.25
Average: 25 0.00
Average: 26 0.00
Average: 27 0.40
Average: 28 180.35
Average: 29 1.45
Average: 30 44.70
Average: 31 0.00
Average: 32 0.00
Average: 33 1.35
Average: 34 178.05
Average: 35 1.80
Average: 36 42.55
Average: 37 0.00
Average: 38 0.00
Average: 39 1.05
Average: 40 0.00
Average: 41 0.00
Average: 42 7.35
Average: 43 0.00
Average: 44 1.30
Average: 45 0.00
Average: 46 0.00
Average: 47 0.00
root@nuvini:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
16: 2972148 0 0 0 xen-percpu-virq timer0
17: 21870 0 0 0 xen-percpu-ipi spinlock0
18: 750503 0 0 0 xen-percpu-ipi resched0
19: 146 0 0 0 xen-percpu-ipi callfunc0
20: 0 0 0 0 xen-percpu-virq debug0
21: 8028 0 0 0 xen-percpu-ipi callfuncsingle0
22: 0 2915213 0 0 xen-percpu-virq timer1
23: 0 22061 0 0 xen-percpu-ipi spinlock1
24: 0 695840 0 0 xen-percpu-ipi resched1
25: 0 92 0 0 xen-percpu-ipi callfunc1
26: 0 0 0 0 xen-percpu-virq debug1
27: 0 8167 0 0 xen-percpu-ipi callfuncsingle1
28: 0 0 2824373 0 xen-percpu-virq timer2
29: 0 0 21485 0 xen-percpu-ipi spinlock2
30: 0 0 692198 0 xen-percpu-ipi resched2
31: 0 0 225 0 xen-percpu-ipi callfunc2
32: 0 0 0 0 xen-percpu-virq debug2
33: 0 0 8251 0 xen-percpu-ipi callfuncsingle2
34: 0 0 0 2895170 xen-percpu-virq timer3
35: 0 0 0 20237 xen-percpu-ipi spinlock3
36: 0 0 0 678003 xen-percpu-ipi resched3
37: 0 0 0 202 xen-percpu-ipi callfunc3
38: 0 0 0 0 xen-percpu-virq debug3
39: 0 0 0 8396 xen-percpu-ipi callfuncsingle3
40: 433 0 0 0 xen-dyn-event xenbus
41: 405 0 0 0 xen-dyn-event hvc_console
42: 66505 0 0 0 xen-dyn-event blkif
43: 67 0 0 0 xen-dyn-event blkif
44: 90115 0 0 0 xen-dyn-event eth0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 0 0 0 0 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 750503 695840 692198 678003 Rescheduling interrupts
CAL: 8174 8259 8476 8598 Function call interrupts
TLB: 0 0 0 0 TLB shootdowns
ERR: 0
MIS: 0
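The counters in /proc/interrupts are cumulative since boot, so the interesting number is the change over an interval rather than the raw totals. A rough sketch to print per-second deltas (it assumes four per-CPU columns, as in the dump above):

```shell
#!/bin/sh
# Take two snapshots of /proc/interrupts one second apart and
# print, per IRQ line, the summed per-CPU delta (interrupts/s).
cat /proc/interrupts > /tmp/irq.1
sleep 1
cat /proc/interrupts > /tmp/irq.2
awk 'NR == FNR { prev[$1] = $2 + $3 + $4 + $5; next }
     ($1 in prev) {
         d = $2 + $3 + $4 + $5 - prev[$1]
         if (d > 0) printf "%-6s %6d/s\n", $1, d
     }' /tmp/irq.1 /tmp/irq.2
```

The timer and resched lines dominating here matches the sar averages above (roughly 180/s per timer IRQ and 45/s per resched IRQ).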
As for context switches, I have no idea how to look up what's causing the high number, or even if these things can be the cause of the high load at all.
EDIT: I tried the following, but it doesn't seem to show the culprit; nothing in this list is very high.
root@nuvini:~# pidstat -w
Linux 3.4.2-linode44 (nuvini) 07/28/2012 _i686_ (4 CPU)
04:17:56 PM PID cswch/s nvcswch/s Command
04:17:56 PM 1 0.33 0.00 init
04:17:56 PM 2 0.01 0.00 kthreadd
04:17:56 PM 3 0.37 0.00 ksoftirqd/0
04:17:56 PM 4 1.82 0.00 kworker/0:0
04:17:56 PM 5 0.11 0.00 kworker/u:0
04:17:56 PM 6 0.43 0.00 migration/0
04:17:56 PM 7 0.44 0.00 migration/1
04:17:56 PM 8 0.00 0.00 kworker/1:0
04:17:56 PM 9 0.34 0.00 ksoftirqd/1
04:17:56 PM 10 0.45 0.00 migration/2
04:17:56 PM 11 0.00 0.00 kworker/2:0
04:17:56 PM 12 0.34 0.00 ksoftirqd/2
04:17:56 PM 13 0.45 0.00 migration/3
04:17:56 PM 14 0.00 0.00 kworker/3:0
04:17:56 PM 15 0.38 0.00 ksoftirqd/3
04:17:56 PM 16 0.00 0.00 cpuset
04:17:56 PM 17 0.00 0.00 khelper
04:17:56 PM 18 0.04 0.00 kdevtmpfs
04:17:56 PM 19 0.00 0.00 kworker/u:1
04:17:56 PM 21 0.01 0.00 xenwatch
04:17:56 PM 22 0.03 0.00 xenbus
04:17:56 PM 155 0.17 0.00 sync_supers
04:17:56 PM 157 0.00 0.00 bdi-default
04:17:56 PM 159 0.00 0.00 kblockd
04:17:56 PM 168 0.96 0.00 kworker/1:1
04:17:56 PM 171 0.00 0.00 md
04:17:56 PM 255 0.00 0.00 rpciod
04:17:56 PM 257 1.40 0.00 kworker/0:1
04:17:56 PM 268 0.02 0.00 kswapd0
04:17:56 PM 269 0.00 0.00 ksmd
04:17:56 PM 270 0.00 0.00 fsnotify_mark
04:17:56 PM 274 0.00 0.00 ecryptfs-kthrea
04:17:56 PM 276 0.00 0.00 nfsiod
04:17:56 PM 277 0.00 0.00 cifsiod
04:17:56 PM 280 0.00 0.00 jfsIO
04:17:56 PM 281 0.00 0.00 jfsCommit
04:17:56 PM 282 0.00 0.00 jfsCommit
04:17:56 PM 283 0.00 0.00 jfsCommit
04:17:56 PM 284 0.00 0.00 jfsCommit
04:17:56 PM 285 0.00 0.00 jfsSync
04:17:56 PM 286 0.00 0.00 xfsalloc
04:17:56 PM 287 0.00 0.00 xfs_mru_cache
04:17:56 PM 288 0.00 0.00 xfslogd
04:17:56 PM 289 0.00 0.00 glock_workqueue
04:17:56 PM 290 0.00 0.00 delete_workqueu
04:17:56 PM 291 0.00 0.00 gfs_recovery
04:17:56 PM 292 0.00 0.00 crypto
04:17:56 PM 540 0.20 0.00 php5-fpm
04:17:56 PM 541 0.01 0.00 php5-fpm
04:17:56 PM 542 0.01 0.00 php5-fpm
04:17:56 PM 543 0.02 0.00 php5-fpm
04:17:56 PM 544 0.00 0.00 php5-fpm
04:17:56 PM 545 0.00 0.00 php5-fpm
04:17:56 PM 546 0.00 0.00 php5-fpm
04:17:56 PM 547 0.00 0.00 php5-fpm
04:17:56 PM 855 0.00 0.00 khvcd
04:17:56 PM 971 0.00 0.00 kpsmoused
04:17:56 PM 972 0.87 0.00 kworker/2:1
04:17:56 PM 1009 0.00 0.00 deferwq
04:17:56 PM 1011 0.89 0.00 kworker/3:1
04:17:56 PM 1013 0.37 0.00 kjournald
04:17:56 PM 1048 0.00 0.01 udevd
04:17:56 PM 1142 0.00 0.00 udevd
04:17:56 PM 1143 0.00 0.00 udevd
04:17:56 PM 1381 0.00 0.00 portmap
04:17:56 PM 1393 0.00 0.00 rpc.statd
04:17:56 PM 1618 0.00 0.00 rsyslogd
04:17:56 PM 1627 0.16 0.01 monit
04:17:56 PM 1687 0.01 0.00 mysqld_safe
04:17:56 PM 1831 0.11 0.00 mysqld
04:17:56 PM 1832 0.00 0.00 logger
04:17:56 PM 1887 1.00 0.01 apache2
04:17:56 PM 1891 0.00 0.00 apache2
04:17:56 PM 1892 0.00 0.00 apache2
04:17:56 PM 1898 0.33 0.00 apache2
04:17:56 PM 1900 0.00 0.00 atd
04:17:56 PM 1939 0.02 0.00 cron
04:17:56 PM 1948 0.20 0.00 flush-202:0
04:17:56 PM 1949 0.00 0.00 apache2
04:17:56 PM 1950 0.00 0.00 apache2
04:17:56 PM 2102 0.00 0.00 dhclient
04:17:56 PM 2158 0.03 0.00 sshd
04:17:56 PM 2166 0.09 0.00 master
04:17:56 PM 2169 0.01 0.00 qmgr
04:17:56 PM 2193 0.00 0.00 tlsmgr
04:17:56 PM 2550 1.02 0.00 ntpd
04:17:56 PM 2553 1.00 0.00 memcached
04:17:56 PM 2727 0.05 0.00 fail2ban-server
04:17:56 PM 2821 0.22 0.04 lfd
04:17:56 PM 2842 0.00 0.00 lfd
04:17:56 PM 3498 0.00 0.00 mingetty
04:17:56 PM 3655 0.00 0.00 sshd
04:17:56 PM 3657 0.42 0.01 sshd
04:17:56 PM 3660 0.01 0.00 bash
04:17:56 PM 3703 0.00 0.00 su
04:17:56 PM 3704 0.05 0.01 bash
04:17:56 PM 5250 0.00 0.00 sshd
04:17:56 PM 5252 0.01 0.00 sshd
04:17:56 PM 5253 0.00 0.00 sshd
04:17:56 PM 5710 0.00 0.00 teamspeak
04:17:56 PM 5711 1.47 0.14 ts3server_start
04:17:56 PM 5731 0.01 0.00 ts3server_linux
04:17:56 PM 11700 0.00 0.00 sleep
04:17:56 PM 11701 0.00 0.00 pidstat
04:17:56 PM 28375 0.01 0.00 smtpd
04:17:56 PM 29738 0.00 0.00 pickup
root@nuvini:~# sar -w 1 3
Linux 3.4.2-linode44 (nuvini) 07/28/2012 _i686_ (4 CPU)
04:18:19 PM proc/s cswch/s
04:18:20 PM 3.00 1899.00
04:18:21 PM 2.00 2031.00
04:18:22 PM 1.00 1959.00
Average: 2.00 1963.00
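As another data point, /proc/loadavg itself shows the instantaneous number of runnable scheduling entities in its fourth field (runnable/total); on a truly idle box the first number of that field should almost always be 1, i.e. just the shell doing the reading. A quick sampling loop:

```shell
#!/bin/sh
# /proc/loadavg: 1/5/15-minute averages, then runnable/total
# tasks, then the PID of the most recently created process.
for i in 1 2 3 4 5; do
    awk '{ print $1, $2, $3, "runnable/total:", $4 }' /proc/loadavg
    sleep 1
done
```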
The server itself does seem to be running fine, but I would like the reported load to be realistic, and this only started after the upgrade. The machine should be idle now, so it shouldn't show a load like that.
Hopefully someone has ideas on how to get it back to how it should be. Thanks!
6 Replies
James
Linode offered to migrate me to another host. That didn't fix the problem, however, so it's probably something else.
I shut down pretty much every non-essential service and process, but the load average is still off. Nothing very high, but it should be lower.
I also tried to verify whether I have the same problem CPU-graph-wise; however, my CPU graphs top out at 400, not 200.
Is there anything I can do to see where this is coming from?
Here's a process-list:
root@nuvini:~# uptime
00:16:29 up 28 min, 2 users, load average: 0.27, 0.19, 0.15
root@nuvini:~# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jul28 ? 00:00:00 init [2]
root 2 0 0 Jul28 ? 00:00:00 [kthreadd]
root 3 2 0 Jul28 ? 00:00:00 [ksoftirqd/0]
root 4 2 0 Jul28 ? 00:00:00 [kworker/0:0]
root 5 2 0 Jul28 ? 00:00:00 [kworker/u:0]
root 6 2 0 Jul28 ? 00:00:00 [migration/0]
root 7 2 0 Jul28 ? 00:00:00 [migration/1]
root 8 2 0 Jul28 ? 00:00:00 [kworker/1:0]
root 9 2 0 Jul28 ? 00:00:00 [ksoftirqd/1]
root 10 2 0 Jul28 ? 00:00:00 [migration/2]
root 11 2 0 Jul28 ? 00:00:00 [kworker/2:0]
root 12 2 0 Jul28 ? 00:00:00 [ksoftirqd/2]
root 13 2 0 Jul28 ? 00:00:00 [migration/3]
root 14 2 0 Jul28 ? 00:00:00 [kworker/3:0]
root 15 2 0 Jul28 ? 00:00:00 [ksoftirqd/3]
root 16 2 0 Jul28 ? 00:00:00 [cpuset]
root 17 2 0 Jul28 ? 00:00:00 [khelper]
root 18 2 0 Jul28 ? 00:00:00 [kdevtmpfs]
root 19 2 0 Jul28 ? 00:00:00 [kworker/u:1]
root 21 2 0 Jul28 ? 00:00:00 [xenwatch]
root 22 2 0 Jul28 ? 00:00:00 [xenbus]
root 155 2 0 Jul28 ? 00:00:00 [sync_supers]
root 157 2 0 Jul28 ? 00:00:00 [bdi-default]
root 159 2 0 Jul28 ? 00:00:00 [kblockd]
root 168 2 0 Jul28 ? 00:00:00 [kworker/3:1]
root 171 2 0 Jul28 ? 00:00:00 [md]
root 255 2 0 Jul28 ? 00:00:00 [rpciod]
root 257 2 0 Jul28 ? 00:00:00 [kworker/0:1]
root 268 2 0 Jul28 ? 00:00:00 [kswapd0]
root 269 2 0 Jul28 ? 00:00:00 [ksmd]
root 270 2 0 Jul28 ? 00:00:00 [fsnotify_mark]
root 274 2 0 Jul28 ? 00:00:00 [ecryptfs-kthrea]
root 276 2 0 Jul28 ? 00:00:00 [nfsiod]
root 277 2 0 Jul28 ? 00:00:00 [cifsiod]
root 280 2 0 Jul28 ? 00:00:00 [jfsIO]
root 281 2 0 Jul28 ? 00:00:00 [jfsCommit]
root 282 2 0 Jul28 ? 00:00:00 [jfsCommit]
root 283 2 0 Jul28 ? 00:00:00 [jfsCommit]
root 284 2 0 Jul28 ? 00:00:00 [jfsCommit]
root 285 2 0 Jul28 ? 00:00:00 [jfsSync]
root 286 2 0 Jul28 ? 00:00:00 [xfsalloc]
root 287 2 0 Jul28 ? 00:00:00 [xfs_mru_cache]
root 288 2 0 Jul28 ? 00:00:00 [xfslogd]
root 289 2 0 Jul28 ? 00:00:00 [glock_workqueue]
root 290 2 0 Jul28 ? 00:00:00 [delete_workqueu]
root 291 2 0 Jul28 ? 00:00:00 [gfs_recovery]
root 292 2 0 Jul28 ? 00:00:00 [crypto]
root 855 2 0 Jul28 ? 00:00:00 [khvcd]
root 969 2 0 Jul28 ? 00:00:00 [kpsmoused]
root 1006 2 0 Jul28 ? 00:00:00 [kworker/1:1]
root 1009 2 0 Jul28 ? 00:00:00 [deferwq]
root 1012 2 0 Jul28 ? 00:00:00 [kjournald]
root 1047 1 0 Jul28 ? 00:00:00 udevd --daemon
root 1091 2 0 Jul28 ? 00:00:00 [kworker/2:1]
root 1113 1047 0 Jul28 ? 00:00:00 udevd --daemon
root 1114 1047 0 Jul28 ? 00:00:00 udevd --daemon
root 1647 1 0 Jul28 ? 00:00:00 dhclient -v -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root 1688 1 0 Jul28 ? 00:00:00 /usr/sbin/sshd
root 1926 2 0 Jul28 ? 00:00:00 [flush-202:0]
root 1971 1 0 Jul28 ? 00:00:00 /usr/sbin/cron
root 3545 1 0 Jul28 hvc0 00:00:00 /sbin/mingetty hvc0
root 3655 1688 0 Jul28 ? 00:00:00 sshd: niels [priv]
niels 3657 3655 0 Jul28 ? 00:00:00 sshd: niels@pts/0
niels 3658 3657 0 Jul28 pts/0 00:00:00 -bash
root 3687 1688 0 Jul28 ? 00:00:00 sshd: niels [priv]
niels 3689 3687 0 Jul28 ? 00:00:00 sshd: niels@pts/1
niels 3690 3689 0 Jul28 pts/1 00:00:00 -bash
root 3719 3690 0 Jul28 pts/1 00:00:00 su
root 3720 3719 0 Jul28 pts/1 00:00:00 bash
niels 3984 3658 0 Jul28 pts/0 00:00:04 htop
root 4250 3720 0 00:16 pts/1 00:00:00 ps -ef
@Azathoth:
I've seen the same thing since the Jul 1st reboot (to fix the leap-second problems). Average load is around 1, where previously it was less than 0.5. I also see "steal" in the Munin CPU graphs, which I didn't before. I attribute that to the new kernel: 3.4.2 (linode25); before that it was 3.0.18 (linode24). CentOS 6.3 (currently, but it was the same with 6.2).
Thank you! It seems the upgrade pushed me to the latest kernel; I was on an older 3.x kernel before. It looks like a bug in 3.4.2, because when I went back to 3.0.18 the load looked normal again. Awesome.
I'll switch back to the latest kernel once an update fixes the load average calculation.
Thanks again.