Linode getting stuck on iowait
The system becomes unresponsive, but one of the services on it still answers connection attempts with a message saying the connection is refused due to high load (33 this time).
I was on the box during one of the occurrences and ran top: nothing was using any CPU, it was all in iowait. I left a user logged in on the console, and when it happened again just now, I couldn't even get w to run to see the current load.
I have Apache on this box, but it's very lightly used. It wasn't tuned before, but I went ahead and tuned it just in case.
I'm at a loss trying to troubleshoot this one. Any ideas for what I could set up to help trace this?
@glg:
I was on the box during one of the occurrences and ran top: nothing was using any CPU, it was all in iowait. I left a user logged in on the console, and when it happened again just now, I couldn't even get w to run to see the current load.
Sounds like you could have been heavily swapping - did you save the top output? If not, the next time it occurs I'd look more towards memory usage than CPU.
An untuned Apache configuration could certainly in theory cause this - even if the box is usually unloaded, a brief spike in traffic that was enough to push your box into swapping due to Apache processes might take a while to clear.
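A single top screen shows how much swap is in use, not whether the box is actively paging. One way to check the latter is to watch the kernel's cumulative swap counters. Below is a minimal Python 3 sketch, assuming a Linux /proc/vmstat that exposes pswpin and pswpout (pages swapped in and out since boot): it samples the counters twice and reports the per-second rate. The script's SECONDS argument is just an illustrative interface; vmstat 5 shows the same activity in its si/so columns.

```python
import sys
import time

def parse_vmstat(text):
    """Turn /proc/vmstat-style "name value" lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())

def swap_rates(before, after, seconds):
    """Pages swapped in/out per second between two vmstat snapshots."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / seconds
        for key in ("pswpin", "pswpout")
    }

if __name__ == "__main__":
    if len(sys.argv) == 2:
        seconds = float(sys.argv[1])
        first = read_vmstat()
        time.sleep(seconds)
        rates = swap_rates(first, read_vmstat(), seconds)
        print("swapped in: %.1f pages/s, swapped out: %.1f pages/s"
              % (rates["pswpin"], rates["pswpout"]))
    else:
        print("usage: %s SECONDS" % sys.argv[0], file=sys.stderr)
```

Sustained non-zero swap-out rates during a stall would point at memory pressure; rates near zero would argue against it.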
I suppose alternatively it could be that other guests on your host are getting into periods of heavy disk use which in turn is blocking your Linode, but that shouldn't have too much impact if your Linode is lightly loaded unless you're still trying to do a decent amount of I/O yourself.
– David
@db3l:
Sounds like you could have been heavily swapping - did you save the top output? If not, the next time it occurs I'd look more towards memory usage than CPU.
Yeah, possibly an OOM situation.
@db3l:
An untuned Apache configuration could certainly in theory cause this - even if the box is usually unloaded, a brief spike in traffic that was enough to push your box into swapping due to Apache processes might take a while to clear.
That can be ruled out, though: I tuned Apache on Monday and it has happened again since.
@db3l:
I suppose alternatively it could be that other guests on your host are getting into periods of heavy disk use which in turn is blocking your Linode, but that shouldn't have too much impact if your Linode is lightly loaded unless you're still trying to do a decent amount of I/O yourself.
It could be users. I guess I'm looking for suggestions for something I can look at now, or something to install that would capture some information later.
I installed Munin, but it's not showing anything abnormal other than a gap right when it happened.
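Munin only samples every five minutes by default, and it stops reporting once the box hangs, hence the gap. A small logger that writes a line every few seconds and forces it to disk can capture what was happening just before. On Linux the load average also counts tasks in uninterruptible sleep (state D, usually waiting on I/O), so a load of 33 with idle CPUs suggests many processes blocked on disk. Below is a minimal Python 3 sketch under those assumptions; the log path argument and the 10-second interval are arbitrary choices, not anything from this thread.

```python
import os
import sys
import time

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may contain spaces or
    ')', so split on the last ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]

def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state 'D'.

    Only thread-group leaders are checked; threads live under
    /proc/<pid>/task/.
    """
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited between listdir() and open()
        if state == "D":
            tasks.append((pid, comm))
    return tasks

def snapshot(proc="/proc"):
    """One log line: timestamp, 1/5/15-minute load, and blocked tasks."""
    with open(os.path.join(proc, "loadavg")) as f:
        load = " ".join(f.read().split()[:3])
    blocked = blocked_tasks(proc)
    names = ",".join("%s[%d]" % (comm, pid) for pid, comm in blocked)
    return "%s load=%s D=%d %s" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), load, len(blocked), names)

def main(logfile, interval=10):
    while True:
        with open(logfile, "a") as f:
            f.write(snapshot() + "\n")
            f.flush()
            os.fsync(f.fileno())  # get the line onto disk before a stall
        time.sleep(interval)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: %s LOGFILE" % sys.argv[0], file=sys.stderr)
```

If the log lives on the disk that is stalling, the fsync will block as well, so the last line written roughly marks when the stall began. If magic SysRq is enabled, its 'w' command also dumps the blocked tasks to the kernel log.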
Sorry, I forgot to mention one thing: I upgraded this server from Ubuntu 9.10 to 10.04 on 10/22. The first occurrence of this lockup was 10/30.
Try running iotop (apt-get install iotop)
Also what kernel are you running? (uname -a)
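If iotop can't be left running at the moment it happens, the kernel also exposes cumulative per-process I/O counters in /proc/<pid>/io (read_bytes and write_bytes, when task I/O accounting is enabled). Below is a minimal Python 3 sketch that samples those counters twice and ranks processes by bytes moved in between. It is a rough stand-in, not how iotop itself works. Reading other users' processes may require root.

```python
import os
import sys
import time

def parse_io(text):
    """Parse /proc/<pid>/io ("key: value" per line) into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return fields

def sample(proc="/proc"):
    """Map pid -> (read_bytes, write_bytes) for every readable process."""
    result = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "io")) as f:
                io = parse_io(f.read())
        except (OSError, ValueError):
            continue  # exited, or not permitted to read it
        result[int(entry)] = (io.get("read_bytes", 0), io.get("write_bytes", 0))
    return result

def top_io(before, after, n=10):
    """Rank pids present in both samples by bytes read + written."""
    deltas = []
    for pid, (rd, wr) in after.items():
        if pid in before:
            d_rd = rd - before[pid][0]
            d_wr = wr - before[pid][1]
            deltas.append((d_rd + d_wr, pid, d_rd, d_wr))
    deltas.sort(reverse=True)
    return deltas[:n]

if __name__ == "__main__":
    if len(sys.argv) == 2:
        seconds = float(sys.argv[1])
        first = sample()
        time.sleep(seconds)
        for _total, pid, d_rd, d_wr in top_io(first, sample()):
            print("pid %6d  read %10d B  write %10d B" % (pid, d_rd, d_wr))
    else:
        print("usage: %s SECONDS" % sys.argv[0], file=sys.stderr)
```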
@obs:
Can you tell us what else is on the box, e.g. databases, WordPress, etc.?
inn2 is the big thing, plus user shell accounts.
@obs:
Try running iotop (apt-get install iotop)
Also what kernel are you running? (uname -a)
Installing iotop now, thanks.
I'm running the latest 64-bit paravirt kernel:
Linux ftupet 2.6.35.4-x86_64-linode16 #1 SMP Mon Sep 20 16:03:34 UTC 2010 x86_64 GNU/Linux
Just happened again. Here's the upper part of top:
top - 13:07:37 up 3:23, 1 user, load average: 60.59, 57.33, 49.50
Tasks: 247 total, 1 running, 245 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.1%us, 0.0%sy, 0.0%ni, 0.0%id, 99.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 504916k total, 459008k used, 45908k free, 27772k buffers
Swap: 524284k total, 3772k used, 520512k free, 128632k cached
Doesn't look like it's swapping much, if at all.
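The top output above can be checked with a little arithmetic. In this version of top, the "cached" figure printed on the Swap line is the page cache in RAM, not swap. So memory actually held by processes is roughly used minus buffers minus cached, and what the kernel could free up is roughly free plus buffers plus cached. Below is a short Python 3 sketch that parses the two lines pasted above; the regex assumes top's "NNNk label" layout.

```python
import re

MEM_LINE = "Mem: 504916k total, 459008k used, 45908k free, 27772k buffers"
SWAP_LINE = "Swap: 524284k total, 3772k used, 520512k free, 128632k cached"

def parse_top_kb(line):
    """Map each 'NNNk label' pair in a top summary line to {label: NNN}."""
    return {label: int(num) for num, label in re.findall(r"(\d+)k (\w+)", line)}

mem = parse_top_kb(MEM_LINE)
swap = parse_top_kb(SWAP_LINE)

# "cached" is printed on the Swap line but counts RAM page cache.
process_kb = mem["used"] - mem["buffers"] - swap["cached"]
reclaimable_kb = mem["free"] + mem["buffers"] + swap["cached"]
swap_pct = 100.0 * swap["used"] / swap["total"]

print("used by processes: %dk" % process_kb)
print("free or reclaimable (approx.): %dk" % reclaimable_kb)
print("swap in use: %.2f%%" % swap_pct)
```

That comes to about 302604k held by processes, about 202312k free or reclaimable, and about 0.72% of swap in use. This agrees that the box was not short of memory at that moment, though it is a snapshot of levels and says nothing about paging activity during the stall.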
Here's iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.09    0.15    0.19   29.39    0.03   70.16

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvda              0.67        17.01         1.52     207842      18624
xvdb              0.01         0.06         0.63        768       7680
xvdc              0.16         2.92         0.83      35696      10096
xvdd              4.41        66.78        31.58     816104     385944
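A single iostat invocation like this one prints averages since boot. The 29.39% iowait and the per-second rates are therefore diluted over the whole uptime. Running something like iostat -x 5 during a stall gives per-interval figures, including await and %util for each device. The totals are still useful for seeing where the I/O goes: on 2.6 kernels iostat's Blk_ columns count 512-byte blocks (sectors). Below is a Python 3 sketch that converts the rows above to MiB and ranks devices by transfers per second.

```python
# Rows copied from the iostat output above: device, tps, Blk_read/s,
# Blk_wrtn/s, Blk_read, Blk_wrtn.
IOSTAT_ROWS = """\
xvda 0.67 17.01 1.52 207842 18624
xvdb 0.01 0.06 0.63 768 7680
xvdc 0.16 2.92 0.83 35696 10096
xvdd 4.41 66.78 31.58 816104 385944
"""

BLOCK_BYTES = 512  # iostat's Blk_* units are 512-byte sectors on 2.6 kernels

def parse_rows(text):
    """Return {device: (tps, blocks_read, blocks_written)}."""
    devices = {}
    for line in text.splitlines():
        name, tps, _read_rate, _write_rate, blk_read, blk_wrtn = line.split()
        devices[name] = (float(tps), int(blk_read), int(blk_wrtn))
    return devices

def blocks_to_mib(blocks):
    return blocks * BLOCK_BYTES / (1024 * 1024)

devices = parse_rows(IOSTAT_ROWS)
by_tps = sorted(devices.items(), key=lambda item: item[1][0], reverse=True)
for name, (tps, blk_read, blk_wrtn) in by_tps:
    print("%s  %5.2f tps  read %6.1f MiB  written %6.1f MiB"
          % (name, tps, blocks_to_mib(blk_read), blocks_to_mib(blk_wrtn)))
```

Since boot, xvdd carries most of the traffic: roughly 398 MiB read and 188 MiB written, against about 101 MiB read from xvda. So whatever filesystem lives on xvdd is probably the first place to look.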