High Load (possibly IO wait)
Is there something else I should be looking for on my server? I'm at a loss for what else to try. Is anyone else having similar problems on atlanta36?
$ vmstat 5
and see whether it is I/O or swap activity that is loading you down.
To be honest, if you are actively swapping you probably need to cut down your memory usage or upgrade to a bigger linode.
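A rough Python sketch of how the two cases can be told apart in vmstat output: sustained nonzero si/so (swap in/out) points at swapping, while a high wa (CPU time spent waiting for I/O) with no swap traffic points at disk I/O. The thresholds and function names are illustrative assumptions, not recommended values. The first data row is skipped because vmstat reports it as an average since boot.

```python
from statistics import mean

# Illustrative thresholds only; these are assumptions, not tuned values.
SWAP_KB_PER_S = 10  # average si + so per sample
IOWAIT_PCT = 20     # average wa per sample


def parse_vmstat(output):
    """Return one dict per live vmstat sample, keyed by column name."""
    rows = [line.split() for line in output.strip().splitlines()]
    header = next((r for r in rows if {"si", "so", "wa"} <= set(r)), None)
    if header is None:
        return []
    samples = [
        dict(zip(header, map(int, r)))
        for r in rows
        if len(r) == len(header) and all(t.isdigit() for t in r)
    ]
    # vmstat's first data row is an average since boot, not a live reading.
    return samples[1:]


def diagnose(output):
    """Classify vmstat output as 'swap', 'io', 'ok' or 'not enough samples'."""
    samples = parse_vmstat(output)
    if not samples:
        return "not enough samples"
    swap_traffic = mean(s["si"] + s["so"] for s in samples)
    iowait = mean(s["wa"] for s in samples)
    # Check swap first: heavy swapping shows up as I/O wait as well.
    if swap_traffic > SWAP_KB_PER_S:
        return "swap"
    if iowait > IOWAIT_PCT:
        return "io"
    return "ok"
```

On a live system the text could come from subprocess.run(["vmstat", "5", "6"], capture_output=True, text=True).stdout, which gives one since-boot row plus five 5-second samples.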
The Linode dashboard graph shows it as flat, but munin caught it.
Edit: The load average is now below 1.00. Whatever the problem was, it seems to be going away. But I don't like things like this, anyway!
Can somebody among the staff look into the issue, please? Now you have a list of hosts (newark62, atlanta12, atlanta36, fremont59) to look into, and a corresponding list of times at which the I/O problems seem to have occurred. What was going on on these hosts?
Operations that require even a moderate amount of CPU and I/O are feeling very sluggish. All MySQL queries are slower than usual, even though my DB is sitting idle at this time. Public-key SSH logins take several seconds longer than usual. Apt-get update takes forever. Rsync takes ~100 times longer than usual to generate a file list.
My linode usually has a load average of 0.1 or lower. Right now my load average is 0.8, and a reboot doesn't do anything to solve the problem. I waited a few minutes after the reboot to check my load average, so that all the daemons starting up after the reboot don't affect my short-term load average. No noticeable drop. Oh, and it took several times longer than usual for my daemons to start up after the reboot.
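Waiting a few minutes after a reboot matters because Linux load averages are exponentially damped moving averages of the number of tasks that are runnable or in uninterruptible (usually I/O) sleep, recomputed roughly every 5 seconds. The sketch below models that update in floating point; the kernel itself uses fixed-point arithmetic, so this is a simplified illustration and the function names are hypothetical.

```python
import math

SAMPLE_INTERVAL_S = 5  # the kernel recomputes the averages roughly every 5 seconds


def decay(window_s):
    """Per-sample decay factor for a 60-, 300- or 900-second window."""
    return math.exp(-SAMPLE_INTERVAL_S / window_s)


def update(load, active_tasks, window_s):
    """One step: blend the previous average with the current task count."""
    e = decay(window_s)
    return load * e + active_tasks * (1 - e)


def simulate(task_counts, window_s=60, start=0.0):
    """Feed per-sample task counts (runnable + I/O-blocked) and return the average."""
    load = start
    for n in task_counts:
        load = update(load, n, window_s)
    return load


# One task stuck waiting on I/O for a full minute, starting from an idle box:
print(round(simulate([1] * 12), 3))  # 0.632
```

Because blocked tasks count as "active", a single process stuck on slow disk can hold the 1-minute figure near 1 even while the CPU is almost idle.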
I've been monitoring this using top, iotop, vmstat, and munin. I'm not using any more CPU or IO than usual – CPU average 4%, IO average 200 according to the Dashboard. I'm not doing anything unusual with my linode, nor does anyone else seem to be doing something nasty with my account. Nonetheless, many of my processes are waiting for IO to complete. As if I were trying to do some work while a degraded RAID array was being rebuilt!
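Processes waiting on I/O sit in uninterruptible sleep, shown as state D in ps and top and counted in vmstat's b column. A minimal, Linux-only Python sketch that lists them by reading /proc/<pid>/stat; the function names are hypothetical. Since D states are usually brief, it only becomes informative when sampled repeatedly.

```python
import os


def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The format is "pid (comm) state ...", and comm itself may contain spaces
    # or ')', so split on the last ')' rather than on whitespace.
    return stat_line.rpartition(")")[2].split()[0]


def blocked_tasks(proc_root="/proc"):
    """List (pid, command) pairs currently in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if task_state(line) == "D":
            command = line[line.index("(") + 1 : line.rindex(")")]
            blocked.append((int(entry), command))
    return blocked
```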
Which makes me suspect…
Is somebody fsck'ing a disk on fremont59 (and the other nodes mentioned above)?
Doesn't sound like the problem's on your linode. I'd open a ticket about it.
Already did, three hours ago. I'll post in this thread if I get an answer that might be helpful to others as well. Or better yet, a staff member should let us know what's going on.
Might be kernel related. I was running the new 2.6.27.4 kernel when I came across the anomalies described above. I switched back to "Latest 2.6 series" twelve hours ago, and the problem has disappeared since then. (I've been monitoring my load averages at 10 minute intervals throughout the night.)
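A minimal sketch of that kind of periodic check, reading /proc/loadavg on Linux, whose first three fields are the 1-, 5- and 15-minute load averages (Python's os.getloadavg() returns the same three numbers). The 10-minute interval matches the post; the function names and log format are assumptions.

```python
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def monitor(interval_s=600, path="/proc/loadavg"):
    """Print a timestamped load-average line every interval_s seconds (runs forever)."""
    while True:
        with open(path) as f:
            one, five, fifteen = parse_loadavg(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {one:.2f} {five:.2f} {fifteen:.2f}", flush=True)
        time.sleep(interval_s)
```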
Caker also says that Linux kernels newer than 2.6.20 may have obscure I/O issues, so I for one am willing to attribute my problems to my premature adoption of the bleeding-edge kernel.
But what about others? @patrickpkt, oliver, astrashe3 : Which kernel version have you been running?
I'm running 2.6.18.8-linode16. I didn't run iotop because it complained that my kernel was too old.
I reinstalled my linode's OS yesterday with 32-bit Ubuntu 8.10; I had the problem both before and after the reinstall.
I don't know about everyone else, but mine is running pretty well now. I don't know what was going on, but I haven't changed anything to fix it. It got better on its own.
@richard.scott:
I'm on newark61, and according to the graphs in my control panel, my I/O has averaged 10K over the last 30 days.
Three weeks ago this was less than 1K, and I'm not doing anything different on my server; it just seems to use up swap space more than normal!
CPU usage is currently at 10% (5% average for the last 30 days).
If you're regularly using swap, you have a problem. It also hurts the performance of whatever you have on the server, so find a convenient time to reboot your server and see if you can control your swap usage.
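One way to check how much swap is in use is /proc/meminfo. Keep in mind that pages which were swapped out can stay in swap until they are needed again, so a single snapshot can overstate the problem; ongoing swap activity (vmstat's si/so columns) says more about whether you are "regularly" swapping. A minimal Python sketch; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_used_pct(info):
    """Percentage of swap currently occupied, or 0.0 if there is no swap."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info["SwapFree"]) / total


def current_swap_used_pct(path="/proc/meminfo"):
    with open(path) as f:
        return swap_used_pct(parse_meminfo(f.read()))
```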
@hybinet:
If you're regularly using swap, you have a problem.
I am using swap space, but I don't know why!
I used to run my email server on a co-located Mini-ITX VIA PD10000 1GHz motherboard with only 512MB of RAM, and that never ever touched swap space.
I would have thought that a Linode was quicker than that! But since I moved everything to a Linode 540, it's done nothing but swap!
…but, and here's the really annoying part…
I've only been swapping like crazy for the last 3 weeks, and I've been using the server as my main email server since the start of December!
I don't understand why the performance has degraded so much in the past 3 weeks.
Rich
A linode 540 obviously has more RAM than your old colo, and it's also many times more powerful. Also, it was fine until three weeks ago, right? Then try to find out what happened when the problems began. Did you change some configurations? Did you update your programs from your distro's repository? Did you suddenly get a lot of visitors to your web site or a lot of spam to your mail server? Perhaps someone hacked your server without your knowledge?
There are many diagnostic tools that can tell you which program is using the most CPU and RAM. top is the simplest of all; htop looks prettier; munin is a bit more sophisticated. You can get halfway to a solution if you can pinpoint the bad guy.
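For a scriptable alternative to eyeballing top or htop, here is a minimal, Linux-only Python sketch that ranks processes by resident memory (the VmRSS line in /proc/<pid>/status). The function names are hypothetical; RSS is only one measure of memory use and counts shared pages in every process that maps them.

```python
import os


def field(status_text, key):
    """Return the value of a 'Key:<tab>value' line, or None if it is missing."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == key:
            return value.strip()
    return None


def rss_kb(status_text):
    """Resident memory (VmRSS) in kB; kernel threads have no VmRSS line."""
    value = field(status_text, "VmRSS")  # e.g. "318464 kB"
    return int(value.split()[0]) if value else 0


def top_memory(n=5, proc_root="/proc"):
    """Return the n processes with the largest resident memory as (kB, name, pid)."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        usage.append((rss_kb(text), field(text, "Name") or "?", int(entry)))
    return sorted(usage, key=lambda item: item[0], reverse=True)[:n]
```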
I've been poking about and found that my grey-listing daemon on my mail server is configured to keep connections open to mysql.
It doesn't seem to reuse connections; it just keeps opening new ones up to the limit of 100 set in gld.conf.
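A leak like this can be spotted by counting established MySQL connections over time and checking whether the number only ever climbs toward the limit. From inside MySQL, SHOW STATUS LIKE 'Threads_connected' gives the server's own count. The Linux-only Python sketch below instead reads /proc/net/tcp; it assumes gld reaches MySQL over TCP on the default port 3306 (connections over the local Unix socket would not appear), and the names are hypothetical.

```python
MYSQL_PORT = 3306   # assumption: MySQL listens on TCP at its default port
ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def count_mysql_connections(tcp_table, port=MYSQL_PORT):
    """Count established connections whose local (server-side) port is `port`."""
    count = 0
    for line in tcp_table.strip().splitlines()[1:]:  # first line is a header
        fields = line.split()
        local_address, state = fields[1], fields[3]
        local_port = int(local_address.rsplit(":", 1)[1], 16)  # port is hex
        if local_port == port and state == ESTABLISHED:
            count += 1
    return count


def current_count(path="/proc/net/tcp"):
    """Read the live IPv4 table; IPv6 connections are in /proc/net/tcp6."""
    with open(path) as f:
        return count_mysql_connections(f.read())
```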
Hopefully I've fixed it and my Linode can go back to being awesome!
Using htop, I've found that clamd is using 311MB of RAM! WTF
No wonder nothing else has room to work nicely without swapping all the time.
If I change my "vm.swappiness" value from the default of 60 to 95, it seems to help?
I run this at boot time:
sysctl -q -w vm.swappiness=95
However, it totally fills up my swap space:
top - 12:34:33 up 20 min, 1 user, load average: 0.24, 0.14, 0.10
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 1.0%sy, 0.1%ni, 95.9%id, 1.9%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: 553176k total, 537280k used, 15896k free, 2480k buffers
Swap: 262136k total, 218204k used, 43932k free, 297812k cached
But it seems to have reduced my disk I/O.
I'll keep an eye on it.
Rich
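A closer look at the top output above: in this older version of top, the "cached" figure printed on the Swap line is the page cache (file data kept in RAM), not swap. A small Python sketch that parses the two summary lines and does the arithmetic; the function name is hypothetical and it assumes exactly this older top layout.

```python
import re


def parse_top_summary(mem_line, swap_line):
    """Return a few kB figures from an older top's 'Mem:' and 'Swap:' lines."""
    def numbers(line):
        # "553176k total, 537280k used, ..." -> {"total": 553176, "used": 537280, ...}
        return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}

    mem, swap = numbers(mem_line), numbers(swap_line)
    return {
        "mem_used": mem["used"],
        # free + buffers + page cache: a rough figure for RAM not held by programs
        "mem_available_approx": mem["free"] + mem["buffers"] + swap["cached"],
        "swap_used_pct": 100.0 * swap["used"] / swap["total"],
    }


summary = parse_top_summary(
    "Mem: 553176k total, 537280k used, 15896k free, 2480k buffers",
    "Swap: 262136k total, 218204k used, 43932k free, 297812k cached",
)
print(summary["mem_available_approx"], round(summary["swap_used_pct"], 1))  # 316188 83.2
```

With these numbers, roughly 300,000k of the 537,280k counted as "used" is buffers and page cache, while about 83% of swap is occupied. That is consistent with what a high vm.swappiness asks for: the kernel becomes more willing to swap out program memory than to shrink the page cache.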
If none of the programs you're running are using abnormal amounts of memory, I would suggest just getting a bigger linode.
@btmorex:
Using up most of the swap space during normal system operation is probably not a great idea. If your load increases even slightly for whatever reason, your machine is going to quickly slow to a crawl.
I appreciate that it's not a good position to be in, but it's better than it was with regard to performance.
@btmorex:
If none of the programs you're running are using abnormal amounts of memory, I would suggest just getting a bigger linode.
Is there anything specific in the configuration of the Xen host that would make this use swap space more often?
I've changed my vm.swappiness value to 5 to test both ends of the scale. My swap space isn't used, but my load average goes up to around 3!
Reverting vm.swappiness back to 95 results in a load average of 0.40!
Rich
Do not edit your vm.swappiness value unless you have a good reason to do so.
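Before changing it, it helps to check the current value. sysctl keys map onto files under /proc/sys, with the dots becoming slashes, so vm.swappiness is /proc/sys/vm/swappiness. A value set with sysctl -w lasts only until the next reboot; on most distributions, an entry in /etc/sysctl.conf is applied at boot. A minimal read-only Python sketch; the function names are hypothetical, and the default of 60 is the one mentioned earlier in the thread.

```python
DEFAULT_SWAPPINESS = "60"  # the default mentioned earlier in the thread


def sysctl_path(key):
    """Translate a dotted sysctl key into its /proc/sys path.

    Assumes no path component itself contains a dot.
    """
    return "/proc/sys/" + key.replace(".", "/")


def read_sysctl(key):
    """Read the current value of a sysctl key as a string."""
    with open(sysctl_path(key)) as f:
        return f.read().strip()


def swappiness_changed():
    """True if vm.swappiness differs from the default of 60."""
    return read_sysctl("vm.swappiness") != DEFAULT_SWAPPINESS
```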