My Linode 360 is not responding and all websites are down

The system seems to be very busy and does not respond to anything, neither SSH nor web. It takes about 10 minutes just to reboot it by clicking the 'Reboot' button in my Linode control panel: the progress sits at 0% for 5 minutes, then at 25% for another 5 minutes, and so forth.

It has happened several times, so apparently something is wrong with my system or with something else.

After a reboot everything is back to normal and the websites are fast and responsive. But approximately 24 hours later it suddenly runs into the same trouble again: very busy, not responding to anything, and taking dozens of minutes just to reboot.

I have no idea what the trigger may be. Misconfiguration? My PHP code? But all my websites were fine on the previous host.

My distro is Debian 5.0. Things I've installed:

1. apache, mysql, php

2. rsnapshot (some cron jobs)

3. postfix

4. vsftpd (though automatically stopped)

5. chkrootkit

6. fail2ban

Anyone have any clue? I can provide logs for analysis. Thanks a lot! It's really annoying. :|

P.S. Which logs do I need to look at to find out which PHP script may have caused the problem?

9 Replies

Sounds like an OOM, as described here: http://www.linode.com/forums/viewtopic.php?t=4460
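If it helps to confirm the OOM theory after the fact, here is a minimal Python sketch that scans a kernel log for lines that look like OOM-killer activity. The log path `/var/log/kern.log` and the two matched phrases are assumptions (the exact wording differs between kernel versions), and `find_oom_lines` and `scan_log` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer fires.
# Assumption: the exact wording varies by kernel version, so adjust as needed.
OOM_PATTERN = re.compile(r"invoked oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/kern.log"):
    """Scan one log file; the default path is an assumed Debian location."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)


if __name__ == "__main__":
    try:
        for hit in scan_log():
            print(hit)
    except OSError as err:
        # The file may not exist or may need root to read.
        print(f"could not read log: {err}")
```

Any matching lines carry a timestamp, which can then be compared against the Apache access log to see which requests were in flight around that time.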

Run 'top', hit 'M', and then leave that open in an SSH terminal. When your box fails, the processes using the most memory at the time of failure will be visible. If you really are running out of memory, this will give you a snapshot of the state of things at the point of failure.

This helped me track down a similar problem that I was having, which turned out to be an incompatibility between two specific versions of Apache and PHP (that was known to the respective developers).
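If keeping a terminal open is not practical, the same idea can be approximated by logging the biggest memory users at regular intervals. The following is only a sketch under assumptions: it shells out to `ps -eo rss=,pid=,comm=` (procps syntax), and the function names `parse_ps`, `top_by_memory`, and `snapshot` are made up for illustration.

```python
import subprocess
import time


def parse_ps(output):
    """Parse 'rss pid command' rows into (rss_kb, pid, command) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def top_by_memory(rows, count=5):
    """Return the `count` rows with the largest resident memory."""
    return sorted(rows, key=lambda row: row[0], reverse=True)[:count]


def snapshot(count=5):
    """Ask ps for every process and format the biggest memory users."""
    output = subprocess.run(
        ["ps", "-eo", "rss=,pid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"{stamp} rss_kb={rss} pid={pid} cmd={cmd}"
        for rss, pid, cmd in top_by_memory(parse_ps(output), count)
    ]


if __name__ == "__main__":
    try:
        # Print one snapshot; a cron entry could append this to a log file.
        print("\n".join(snapshot()))
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"ps failed: {err}")
```

Run from cron every minute or so with the output appended to a file, the last few entries before a hang show roughly what `top` sorted by memory would have shown.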

@mwalling:

> Sounds like an OOM, as described here: http://www.linode.com/forums/viewtopic.php?t=4460

Thanks mwalling. It sounds like it. I've made the changes according to the thread. Also I've modified the swappiness to 25:

http://www.linode.com/wiki/index.php/Swappiness

Do you think it's a good move?

Fingers crossed. But in the control panel performance graphs I see no spikes in the I/O rate before the server became unresponsive. So I'm still not sure whether disk I/O is what slowed things down, though I think it's very likely.
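One thing worth checking with swappiness: a value written to `/proc/sys/vm/swappiness` does not survive a reboot unless it is also set as `vm.swappiness` in `/etc/sysctl.conf`, and this box gets rebooted a lot. A rough Python sketch to compare the two follows; `configured_swappiness` and `live_swappiness` are hypothetical helper names.

```python
def configured_swappiness(conf_text):
    """Return the last vm.swappiness value set in sysctl.conf text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, raw = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            value = int(raw.strip())
    return value


def live_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the value the running kernel is currently using."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        with open("/etc/sysctl.conf") as handle:
            print("configured:", configured_swappiness(handle.read()))
        print("live:", live_swappiness())
    except OSError as err:
        print(f"could not read setting: {err}")
```

If the configured value prints as `None` while the live value is 25, the setting will most likely fall back to the kernel default at the next reboot.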

You should really run munin or something that regularly logs key resource consumption and performance metrics so you can go back and see what's getting driven over the edge. guspaz's suggestion to leave top running is good too, but if you have munin running all the time then you already have useful information the next time something like this happens.

Are you sure that the box is becoming unresponsive to everyone? I wondered if you might somehow be tripping fail2ban and end up having it lock you out for a while. Perhaps a cron job on your local machine that is trying to rsync or otherwise ssh into your linode?

@eas:

> You should really run munin or something that regularly logs key resource consumption and performance metrics so you can go back and see what's getting driven over the edge. guspaz's suggestion to leave top running is good too, but if you have munin running all the time then you already have useful information the next time something like this happens.
>
> Are you sure that the box is becoming unresponsive to everyone? I wondered if you might somehow be tripping fail2ban and end up having it lock you out for a while. Perhaps a cron job on your local machine that is trying to rsync or otherwise ssh into your linode?

The problem is that I can't keep an SSH window (I use PuTTY) open all day long, and the trouble occurs every 1 or 2 days, unexpectedly. :( Does munin record all the performance variables even when I'm not logged in over SSH?

Yes.

http://you.dontlike.us/munin/dontlike.us/you.dontlike.us.html

@mwalling:

> Yes.
>
> http://you.dontlike.us/munin/dontlike.us/you.dontlike.us.html

Thank you, it helped a lot. My server has been working happily for the last week. It seems it was indeed OOM that was causing the trouble.

So Munin. Will it automatically start recording and graphing the performance data of my machine after I install it? I don't know how to configure it.

> So Munin. Will it automatically start recording and graphing the performance data of my machine after I install it? I don't know how to configure it.

No, you'll need to edit /etc/munin/munin*.conf first, create a virtual host (apache in your case) profile for it, and most likely select a few plugins to load.

See any one of the hundreds of Debian/Munin tutorials on the web, and post back if you're still having problems. :)

@mjrich:

> > So Munin. Will it automatically start recording and graphing the performance data of my machine after I install it? I don't know how to configure it.
>
> No, you'll need to edit /etc/munin/munin*.conf first, create a virtual host (apache in your case) profile for it, and most likely select a few plugins to load.
>
> See any one of the hundreds of Debian/Munin tutorials on the web, and post back if you're still having problems. :)

Thank you too, mjrich, I'll give it a try and let you guys know. :)

Linode is a great place for learning to become a server admin!
