CPU hits 400% on graphs and cannot access my Linode

My Linode runs CentOS 5 and hosts my website (LAMP). Over the past few months it has been going down due to very high CPU activity. On the graphs (Linode dashboard) I see the CPU at 400%, but I can't log in via SSH, and the AJAX (Lish) console shows the following screen, which I cannot exit. The only fix is to reboot via the Linode dashboard; everything then comes up as normal for a few days before it happens again.

![](http://i812.photobucket.com/albums/zz42/l9nux/Linode-console-cpu-high.jpg)

Has anyone seen this screen before, or have any idea what could be causing this?

13 Replies

It looks like you are running low on memory and hitting swap. I know you rebooted, so the information may not be of much help, but what do the following commands show?

free -m
vmstat 1 20

Do you have any server monitoring software installed to track your memory usage over time (Munin or Cacti)?

  • http://library.linode.com/server-monitoring

The graphs from either may show you the cause.

-Tim

No I don't have any server monitoring as it's not really critical, and the Linode dashboard graphs have been good enough for me. I might look at Cacti at some point.

When I run free -m now I get the following:

             total       used       free     shared    buffers     cached
Mem:           512        430         81          0         33        170
-/+ buffers/cache:        226        285
Swap:          511          0        511

So it's not an issue right now, after the reboot. Incidentally, I'm running kernel 2.6.18.8-linode22; is it worth updating to 3.0.0?
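
A note for anyone reading the `free -m` output above: the `-/+ buffers/cache` row is the one that matters for applications, because buffers and page cache are handed back when programs need the memory. Here is a minimal Python sketch of that arithmetic using the figures from the post. The function name is made up for illustration, and the results can differ from free's own row by 1 MB, since free rounds each column from kilobytes separately.

```python
def app_memory(used_mb, free_mb, buffers_mb, cached_mb):
    """Return (used_by_apps, available_to_apps) in MB.

    Mirrors how the '-/+ buffers/cache' row of older `free` output
    relates to the 'Mem:' row: buffers and cache count as reclaimable.
    """
    reclaimable = buffers_mb + cached_mb
    used_by_apps = used_mb - reclaimable
    available_to_apps = free_mb + reclaimable
    return used_by_apps, available_to_apps


# Figures taken from the 'Mem:' row posted above.
used, available = app_memory(used_mb=430, free_mb=81, buffers_mb=33, cached_mb=170)
print(used, available)  # 227 284 (free itself printed 226 and 285)
```

So right after the reboot, applications are using roughly 227 MB of the 512 MB and swap is untouched. The interesting reading would be the same numbers just before a hang.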

Assuming you're using Apache and mod_php, what is your MaxClients set to in your Apache config (httpd.conf on CentOS, rather than apache2.conf)?

Post what's in your php.ini file here!

@l9nux:

So it's not an issue right now, after the reboot. Incidentally, I'm running kernel 2.6.18.8-linode22; is it worth updating to 3.0.0?

Yikes, that's pretty ancient. You're running the old non-paravirt kernel… 2.6.18.8 came out in February 2007, so you're running a 4-year-old kernel. Even if Linode backported security fixes, you really should be running something modern.

Change Apache's MaxClients setting to 15. This will probably help curb your memory usage.

This happened to me a few days ago: CPU usage at 400%, load average of 10 to 15.

The reason was a php script accessing GD to generate images on the fly.

php.ini: memory_limit = 128M

It's a Linode 1024. My question is: how can I limit PHP or Apache to a fixed amount of CPU or RAM, so that the server does not hang?

Thanks

Richard

@richardvc:

The reason was a php script accessing GD to generate images on the fly.

@richardvc:

My question is: how can I limit PHP or Apache to a fixed amount of CPU or RAM?

I can think of three options. All of them involve compromises of one sort or another.

1) Reduce the memory_limit in php.ini to something more reasonable, like 32M. But this will cause your image processing script to fail.

2) Reduce MaxClients (the maximum number of requests Apache handles at the same time and so, with mod_php, the maximum number of PHP scripts running at once) to a very low value, so that you don't run out of RAM even if some scripts use a lot of RAM. But this can cause slowdowns if a lot of people access your site at the same time, because they'll queue up.

3) Get rid of that on-the-fly image processing script. Use a command-line script, cron job, background process, or some other non-web-accessible mechanism to process your images. But this requires some programming and sysadmin knowledge, may be difficult to integrate into an existing program, and may cause image generation to be delayed by a few seconds to a few minutes.

Image processing in PHP is extremely RAM-intensive. Ever since StackScripts more or less solved the problem of default MaxClients being too high, most of the "I ran out of RAM!!!" threads in this forum have involved image processing in one way or another. You just can't keep doing it in real-time if you want to keep your precious RAM. You've gotta drop it, slow it down, or separate it into its own background job.
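
To put rough numbers on options 1 and 2, here is a back-of-the-envelope sketch in Python. It treats memory_limit as the worst case per running script. The 256 MB reserved for the OS, MySQL and so on is purely an assumed figure, the function names are made up, and Apache's own per-process overhead is ignored unless you pass `base_process_mb`.

```python
def worst_case_mb(max_clients, memory_limit_mb, base_process_mb=0):
    """Upper bound on RAM if every Apache child runs a script at its limit."""
    return max_clients * (memory_limit_mb + base_process_mb)


def max_clients_for(ram_mb, reserved_mb, memory_limit_mb, base_process_mb=0):
    """Largest MaxClients whose worst case still fits in the usable RAM."""
    usable_mb = ram_mb - reserved_mb
    per_child_mb = memory_limit_mb + base_process_mb
    return max(1, usable_mb // per_child_mb)


# A Linode 1024 with memory_limit = 128M, as in richardvc's post.
print(worst_case_mb(15, 128))           # 1920 MB, well over 1024 MB
print(max_clients_for(1024, 256, 128))  # 6
# Same machine with memory_limit lowered to 32M (option 1).
print(max_clients_for(1024, 256, 32))   # 24
```

Real scripts usually stay well below their memory_limit, so this is a pessimistic bound, not a prediction. It does show why a high memory_limit and a high MaxClients cannot both be safe on a small Linode.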

Thanks for the responses. Yes, my kernel is old... a good reflection of how busy my life is (kids, study, work!)

My website has an image processing script that converts text headings to images. It must be that, so I'll ditch it. It looks good, but not if the site goes down!

I'll have to update the kernel. Is it as simple as changing it to the latest paravirt kernel in the Linode boot settings?

Thanks

Ray

@l9nux:

My website has an image processing script that converts text headings to images. It must be that, so I'll ditch it. It looks good, but not if the site goes down!

Just don't do it on the fly. Set up a background process that runs via cron every now and then to do the work.
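
A minimal sketch of what such a cron-run job could look like, written in Python 3 rather than PHP to keep it short. Everything here is an assumption for illustration: the function names, the idea of naming each image after a hash of its heading text, and the `render` callable, which stands in for whatever actually draws the PNG (GD, ImageMagick, etc.). The point is only that each image is generated once, outside the web request, and the site then serves the cached file.

```python
import hashlib
from pathlib import Path


def image_path(heading, out_dir):
    """Map a heading's text to a stable file name inside out_dir."""
    digest = hashlib.sha1(heading.encode("utf-8")).hexdigest()
    return Path(out_dir) / f"{digest}.png"


def build_missing(headings, out_dir, render):
    """Render only the headings that have no cached image yet.

    `render` is a hypothetical callable: heading text in, PNG bytes out.
    Returns the list of files created on this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for heading in headings:
        target = image_path(heading, out)
        if target.exists():
            continue  # already generated on an earlier run
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(render(heading))
        tmp.replace(target)  # rename so the web server never sees a partial file
        created.append(target)
    return created
```

Run from cron every few minutes, this does the expensive work one image at a time no matter how many visitors hit the site. The trade-off is that a new heading has no image until the next run.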

@l9nux:

I'll have to update the kernel. Is it as simple as changing it to the latest paravirt kernel in the Linode boot settings?

Yes

Hi all,

This just happened to me. I'm running CentOS 5.4 and was on the Latest 2.6 kernel (2.6.39.1-x86_64-linode19). On Sunday at 6pm it went south very rapidly: no response to SSH, pings or the Lish console.

System graphs in Munin were completely normal, 795% idle, until the host dropped off the face of the earth. I really doubt it was a problem with the application software, but if it was, then it happened so quickly that we couldn't monitor it. Usually even a machine under extremely heavy load will respond to pings, as that happens entirely within the kernel.

I've switched to a 3.4 kernel to see if that helps. In the meantime, I thought this data point might be useful: l9nux is not the only one seeing 400% CPU hangs apparently caused by the kernel.

There was nothing on the Lish console before you rebooted? Could have just been something that caused the OS to become unresponsive. Even so, getting off that kernel was a very wise decision.

l9nux's issue is that they were running out of free RAM (as per the Free swap line in their Lish console output). If you saw nothing on your Lish console, it's unlikely that your problem was a similar one. But hard to say.

-Tim

I don't remember seeing anything at all on the Lish console. Both lish and ssh were completely unresponsive and I had to reboot the machine.

Cheers, Chris.
