Linode getting stuck on iowait

In the last week or so, one of my linodes has been repeatedly getting stuck in some kind of iowait loop.

The system is unresponsive, but one of the services on it still answers connections with a message saying the connection is refused due to high load (33 this time).

I was on the box during one of the occurrences. I ran top, and nothing was using any CPU; it was all in iowait. I left a user logged in on the console, and when it happened again just now, I couldn't even get w to run to see the current load.

I have Apache on this box, but it's very lightly used. It wasn't tuned before, but I went ahead and tuned it just in case.

I'm at a loss trying to troubleshoot this one. Any ideas for what I should put in place to help trace this?

6 Replies

@glg:

I was on the box during one of the occurrences. I ran top, and nothing was using any CPU; it was all in iowait. I left a user logged in on the console, and when it happened again just now, I couldn't even get w to run to see the current load.

Sounds like you could have been heavily swapping. Did you save the top output? If not, the next time it occurs I'd look more at memory usage than CPU.

An untuned Apache configuration could certainly in theory cause this - even if the box is usually unloaded, a brief spike in traffic that was enough to push your box into swapping due to Apache processes might take a while to clear.

I suppose alternatively it could be that other guests on your host are getting into periods of heavy disk use which in turn is blocking your Linode, but that shouldn't have too much impact if your Linode is lightly loaded unless you're still trying to do a decent amount of I/O yourself.

– David
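
One way to check for active swapping, rather than just swap space in use, is to compare the `pswpin` and `pswpout` counters in `/proc/vmstat` a few seconds apart. The following is a minimal sketch of that comparison; the function names and the sample numbers are made up for illustration and are not from this thread.

```python
def read_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counters parsed from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rates(before_text, after_text, seconds):
    """Pages swapped in and out per second between two /proc/vmstat samples."""
    in0, out0 = read_swap_counters(before_text)
    in1, out1 = read_swap_counters(after_text)
    return (in1 - in0) / float(seconds), (out1 - out0) / float(seconds)


# Made-up sample values, only to show the arithmetic.
before = "nr_free_pages 11477\npswpin 100\npswpout 943\n"
after = "nr_free_pages 11012\npswpin 150\npswpout 1043\n"
print(swap_rates(before, after, 10))
```

On the box itself, the two text samples would come from reading `/proc/vmstat` twice, some seconds apart. Rates that stay near zero during a stall would point away from swapping as the cause.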

@db3l:

Sounds like you could have been heavily swapping - did you save the top output? If not, the next time it occurs I'd look more towards memory usage than cpu.

Yeah, possibly an OOM situation.

@db3l:

An untuned Apache configuration could certainly in theory cause this - even if the box is usually unloaded, a brief spike in traffic that was enough to push your box into swapping due to Apache processes might take a while to clear.

That can be ruled out, though: I tuned Apache on Monday and it has happened again since.

@db3l:

I suppose alternatively it could be that other guests on your host are getting into periods of heavy disk use which in turn is blocking your Linode, but that shouldn't have too much impact if your Linode is lightly loaded unless you're still trying to do a decent amount of I/O yourself.

It could be users. I guess I'm looking for suggestions of something I can look at now, or something I can install that would capture some information for later.

I installed munin, but it's not showing anything abnormal other than a gap right when it happened.
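
For capturing information at a finer grain than a graphing tool's polling interval, a small logger that appends load and memory figures every few seconds is one option. This is only a sketch: the function names, the log path, and the 10-second interval are assumptions. Note that during a full I/O stall the write to disk may itself block, so the last lines before a gap are usually the interesting ones.

```python
import time


def meminfo_kb(meminfo_text):
    """Parse /proc/meminfo text into a dict of field name -> numeric value."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def format_snapshot(timestamp, loadavg_text, meminfo_text):
    """Build one log line: time, 1-minute load, free memory, swap in use."""
    load1 = loadavg_text.split()[0]
    mem = meminfo_kb(meminfo_text)
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    return "%s load1=%s memfree_kb=%d swapused_kb=%d" % (
        timestamp, load1, mem.get("MemFree", 0), swap_used)


def log_forever(path="/var/tmp/stall.log", interval=10):
    """Hypothetical loop: append a snapshot every `interval` seconds."""
    while True:
        with open("/proc/loadavg") as f:
            loadavg = f.read()
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(path, "a") as out:
            out.write(format_snapshot(stamp, loadavg, meminfo) + "\n")
        time.sleep(interval)
```

Calling `log_forever()` from a screen session or an init script would leave a trail leading up to each lockup.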

Sorry, I did forget to mention one thing. I upgraded this server from Ubuntu 9.10 to 10.04 on 10/22. The first occurrence of this lockup was 10/30.

Can you tell us what else is on the box? E.g. databases, WordPress, etc.

Try running iotop (apt-get install iotop)

Also what kernel are you running? (uname -a)

@obs:

Can you tell us what else is on the box? i.e. databases? wordpress?etc.

inn2 is the big thing and user shell accounts.

@obs:

Try running iotop (apt-get install iotop)

Also what kernel are you running? (uname -a)

Installing iotop now, thanks.

The latest 64-bit paravirt:

Linux ftupet 2.6.35.4-x86_64-linode16 #1 SMP Mon Sep 20 16:03:34 UTC 2010 x86_64 GNU/Linux

Just happened again. Here's the upper part of top:

top - 13:07:37 up 3:23, 1 user, load average: 60.59, 57.33, 49.50
Tasks: 247 total, 1 running, 245 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.1%us, 0.0%sy, 0.0%ni, 0.0%id, 99.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 504916k total, 459008k used, 45908k free, 27772k buffers
Swap: 524284k total, 3772k used, 520512k free, 128632k cached

Doesn't look like it's swapping much, if at all.
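
On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so a load of 60 with only one running task suggests many processes blocked on disk. A rough sketch for listing them from `/proc` follows; the function names are illustrative, and it only looks at each process's main thread.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat.

    The command is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, command, state = parse_stat(f.read())
        except (IOError, OSError):
            continue  # the process exited while we were looking
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

Running `blocked_processes()` during a stall, if the shell still responds, would show which commands are stuck waiting.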

here's iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.09    0.15    0.19   29.39    0.03   70.16

Device:     tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvda       0.67        17.01         1.52     207842      18624
xvdb       0.01         0.06         0.63        768       7680
xvdc       0.16         2.92         0.83      35696      10096
xvdd       4.41        66.78        31.58     816104     385944
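
If the iostat above was run without an interval argument, its figures are averages since boot rather than current rates, which can hide a short burst. A minimal sketch for measuring over a chosen interval by diffing two reads of `/proc/diskstats` is below. The function names and sample numbers are made up, and the field positions assume the usual layout of at least 11 statistics after the device name.

```python
def parse_diskstats(text):
    """Map device name -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip short or malformed lines
        stats[parts[2]] = (int(parts[3]), int(parts[5]),
                           int(parts[7]), int(parts[9]))
    return stats


def disk_rates(before_text, after_text, seconds):
    """Per-device (tps, sectors read/s, sectors written/s) over an interval."""
    before = parse_diskstats(before_text)
    after = parse_diskstats(after_text)
    rates = {}
    for name, now in after.items():
        if name not in before:
            continue
        r0, sr0, w0, sw0 = before[name]
        r1, sr1, w1, sw1 = now
        rates[name] = (((r1 - r0) + (w1 - w0)) / float(seconds),
                       (sr1 - sr0) / float(seconds),
                       (sw1 - sw0) / float(seconds))
    return rates


# Made-up counters for one device, sampled 10 seconds apart.
sample_before = "202 48 xvdd 1000 0 8000 500 200 0 1600 300 0 700 800\n"
sample_after = "202 48 xvdd 1100 0 8800 550 250 0 2000 350 0 770 900\n"
print(disk_rates(sample_before, sample_after, 10))
```

The three numbers per device correspond roughly to iostat's tps, Blk_read/s and Blk_wrtn/s columns, but for the sampled window only.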

That's not very high. I'd say it's probably trouble ticket time.

@hoopycat:

That's not very high. I'd say it's probably trouble ticket time.

I opened one, but they looked and said OOM. I think I'll open another.
