OOM assistance for a Linux beginner.

Hi All,

Hope you can help. My server technician is away until October, and I noticed today that our server had gone down. I raised a ticket after rebooting the VPS and got a detailed response with a Lish console snippet; unfortunately, it's alien to me.

Essentially, I was told that the server froze because it was OOMing. I do understand this means something had been eating the virtual memory, but I have no idea how to go about diagnosing the cause. I'd be much obliged if someone could provide some step-by-step help working out the cause and, hopefully, preventing it from happening again.

Thanks in advance.

David

Hi David,

It's difficult to help diagnose this without more information, although it doesn't seem like you're able to supply any. Assuming you have no Linux knowledge (and now seems like a bad time to learn), I'd suggest upgrading your Linode temporarily or adding more memory. The latter would be cheaper for you, and it can be pro-rated (you can downgrade when he gets back and have the remainder of the month credited back). Diagnosing an OOM problem can be a lengthy and difficult procedure.

Thanks for your quick response. I'll post the Lish snippet I was sent by Linode technical support; perhaps that's a good starting point? I'd be surprised if I needed to upgrade, as I have very few services running on the VPS at very low bandwidth (maybe 5 GB in/out per month). Nothing has changed with the Linux install either, so I'm struggling to understand how it can suddenly OOM, and I'm obviously happy to learn more.

Here is the information provided to me by support:

Thank you for contacting Linode support. It appears that your Linode was OOMing, meaning something inside your node is consuming all of the available virtual memory. Typically, you can see this for yourself by logging into Lish and viewing the console:

http://library.linode.com/linode-manager/using-lish-the-linode-shell

Console Snippet

Out of memory: Kill process 3165 (clamd) score 155 or sacrifice child
Killed process 3165 (clamd) total-vm:220732kB, anon-rss:87628kB, file-rss:64kB
------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2527!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/removable
Modules linked in:
Pid: 19561, comm: apache2 Not tainted 2.6.39.1-linode34 #1
EIP: 0061:[] EFLAGS: 00010246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57bad84 EBX: ed25b680 ECX: f57ba000 EDX: 00000000
ESI: ed3d8240 EDI: 00000080 EBP: 00000d84 ESP: c2619e3c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 19561, ti=c2618000 task=ebe9d400 task.ti=c2618000)
Stack:
 eb9a3e40 0000cd84 00000040 00000000 c01a9601 0000cd84 ed332d60 00000000
 00000000 c01a9868 c261cec0 c019b1aa 00000000 8000001b 0000001b ece2da2c
 ed3d0760 b7666fc0 cb5e3c58 00000000 00000000 e5d14025 d2054f80 bb18b00c
Call Trace:
 [] ? swap_entry_free+0xf1/0x120
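The line that matters most for diagnosis is the "Killed process" entry, which names the process the kernel killed and how much memory it held. If you have more log to trawl through (typically the output of dmesg, or /var/log/kern.log on Ubuntu), a small Python sketch like this can pull those entries out. It assumes the 2.6.39-era format shown above; newer kernels append extra fields, which the pattern should tolerate because it only anchors on the leading ones. parse_oom_kills and OomKill are hypothetical helper names.

```python
import re
from typing import List, NamedTuple

# Matches OOM-killer lines such as (format assumed from the snippet above):
# "Killed process 3165 (clamd) total-vm:220732kB, anon-rss:87628kB, file-rss:64kB"
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)\s+"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int
    file_rss_kb: int


def parse_oom_kills(log_text: str) -> List[OomKill]:
    """Return one OomKill per 'Killed process' line found in a kernel log."""
    kills = []
    for match in KILLED_RE.finditer(log_text):
        kills.append(OomKill(
            pid=int(match["pid"]),
            name=match["name"],
            total_vm_kb=int(match["total_vm"]),
            anon_rss_kb=int(match["anon_rss"]),
            file_rss_kb=int(match["file_rss"]),
        ))
    return kills


if __name__ == "__main__":
    sample = "Killed process 3165 (clamd) total-vm:220732kB, anon-rss:87628kB, file-rss:64kB"
    for kill in parse_oom_kills(sample):
        # anon-rss is the process's private resident memory, in kB
        print(f"{kill.name} (pid {kill.pid}): {kill.anon_rss_kb // 1024} MB resident")
        # prints "clamd (pid 3165): 85 MB resident"
```

If the same process name shows up again and again, that's your first suspect; bear in mind that if the machine froze outright, the final messages may never have reached the disk.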

Please let me know what other information you need and I'll do my best to provide it.

Absent any details of your OS or what applications (other than Apache and clamd) you're running, the best I can suggest is to reduce MaxClients in your Apache configuration. The default value is usually way too large for most Linodes.

Log in as root and run apache2ctl -M | grep mpm. If the output contains "prefork", open the file /etc/apache2/httpd.conf in an editor and find the section that looks like this:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          256
    MaxRequestsPerChild   0
</IfModule>

Change the value of MaxClients from 256 to 15 or so and save the file. Then restart Apache (or simply reboot the server). See other forum threads relating to MaxClients for more background.
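Where does a number like 15 come from? A common rule of thumb (an assumption here, not something stated in this thread) is to divide the memory you can spare for Apache by the resident size of one apache2 child process. The sketch below does that arithmetic; the 512 MB reserved for MySQL, clamd and the OS, and the 30 MB per PHP-enabled child, are hypothetical figures, so substitute what ps reports on your own server.

```python
def estimate_max_clients(total_ram_mb: float, reserved_mb: float, per_child_mb: float) -> int:
    """Rough prefork MaxClients ceiling: memory left for Apache / memory per child.

    reserved_mb is a guess at what MySQL, clamd, the OS, etc. need.
    per_child_mb should come from observing real apache2 processes (RSS).
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    if available <= 0:
        return 0  # nothing left over for Apache at all
    return int(available // per_child_mb)


if __name__ == "__main__":
    # Hypothetical figures for a Linode 1024.
    print(estimate_max_clients(1024, 512, 30))  # prints 17
```

RSS counts shared pages in every child, so the result errs on the conservative side. The same arithmetic shows why 150 hurts: 150 children at 30 MB each would want about 4.5 GB on a 1 GB Linode.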

Of course, if Apache isn't the problem this won't help.

Thanks for the suggestion, I'll give that a go. Is there any way I can check what effect it has? Since rebooting the VPS, all my sites have been running okay, so it appears there are no further memory leaks. However, I'm not sure how they manifest themselves, so without a way of checking, I guess it could recur.

With regard to the OS, I'm using Ubuntu 10.04 on a Linode 1024. I don't believe anything other than LAMP has been installed, but if someone can outline how to get a server log that would help diagnose the problem, I'd be very grateful.

@Vance:

Log in as root and run apache2ctl -M | grep mpm. If the output contains "prefork", open the file /etc/apache2/httpd.conf in an editor and find the section that looks like this:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          256
    MaxRequestsPerChild   0
</IfModule>

I followed the above steps: I ran the command in Terminal and received the output mpm_prefork_module (static). So I logged into the server's root directory via FTP and located the httpd.conf file, but it is 0 bytes and contains no data.

Edit: I have an apache2.conf in the same directory which does have a similar snippet to the one above, but MaxClients is 150. Should I change that?

@Sienco:

Edit: I have an apache2.conf in the same directory which does have a similar snippet to the one above, but MaxClients is 150. Should I change that?

Yes, you should, and then restart Apache. Then yell at your server admin; they should have done that when they set it up.

Some distributions use different names for the configuration files - as you discovered, it's apache2.conf on your system, not httpd.conf. You would lower the value of MaxClients in that file and restart Apache.
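If you'd rather script the change than hand-edit the file, here is a hedged Python sketch that rewrites MaxClients only inside the prefork block, leaving the worker MPM's block alone. set_prefork_max_clients is a hypothetical helper; it assumes the prefork block isn't nested inside another IfModule, and you should keep a backup of apache2.conf before writing anything back.

```python
import re

# The prefork section, e.g. "<IfModule mpm_prefork_module> ... </IfModule>".
# Non-greedy match: assumes no nested <IfModule> inside this block.
PREFORK_BLOCK_RE = re.compile(
    r"(<IfModule\s+mpm_prefork_module>)(.*?)(</IfModule>)",
    re.IGNORECASE | re.DOTALL,
)
# A "MaxClients <number>" directive at the start of a line.
MAXCLIENTS_RE = re.compile(r"^(\s*MaxClients\s+)\d+", re.MULTILINE)


def set_prefork_max_clients(conf_text: str, new_value: int) -> str:
    """Return conf_text with MaxClients changed inside the prefork block only."""
    def fix_block(block: "re.Match[str]") -> str:
        body = MAXCLIENTS_RE.sub(lambda m: f"{m.group(1)}{new_value}", block.group(2))
        return block.group(1) + body + block.group(3)

    new_text, count = PREFORK_BLOCK_RE.subn(fix_block, conf_text)
    if count == 0:
        raise ValueError("no <IfModule mpm_prefork_module> block found")
    return new_text
```

After writing the result back to /etc/apache2/apache2.conf, running apache2ctl configtest before restarting Apache catches any syntax mistakes.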

I don't know if it's yet time to yell at the admin - we're not even sure if Apache was the problem. It's possible that 150 was chosen after careful consideration of the web application resource needs (though admittedly this seems unlikely).

What you can do to monitor the situation is to log in to the server, run vmstat 10 and monitor the output. Every 10 seconds it will print a line of statistics. You want to look at the "si" and "so" values in the swap column. If these are both zero the vast majority of the time, then you aren't experiencing a shortage of memory. You can end the vmstat program by pressing Ctrl-C.
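To check a longer capture without eyeballing every line (for example, vmstat 10 360 > vmstat.txt records an hour of samples), a sketch like this pulls out the si and so columns. It assumes the standard two-line header that vmstat prints; swap_activity and is_swapping are hypothetical names. Note that vmstat's first data row reports averages since boot rather than a live sample, so it is skipped by default.

```python
from typing import List, Tuple


def swap_activity(vmstat_output: str, skip_first: bool = True) -> List[Tuple[int, int]]:
    """Return (si, so) pairs for each data row of vmstat output."""
    lines = vmstat_output.strip().splitlines()
    # The second header line names the columns: r b swpd free buff cache si so ...
    header = lines[1].split()
    si_col, so_col = header.index("si"), header.index("so")
    rows = []
    for line in lines[2:]:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # vmstat reprints its headers periodically; skip them
        rows.append((int(fields[si_col]), int(fields[so_col])))
    # The first data row is an average since boot, not a live sample.
    return rows[1:] if skip_first else rows


def is_swapping(vmstat_output: str) -> bool:
    """True if any sampled row shows pages moving into (si) or out of (so) swap."""
    return any(si or so for si, so in swap_activity(vmstat_output))
```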

Just to note: 150 is the default on Ubuntu/Debian, so I'd say it hasn't been touched.

@obs:

Just to note: 150 is the default on Ubuntu/Debian, so I'd say it hasn't been touched.

Exactly. 150 is the default, and it's way too high for your average Linode 1024 running PHP.

@glg:

Exactly. 150 is the default, and it's way too high for your average Linode 1024 running PHP.

What if you got Slashdotted or HN'ed? Just curious how many would be needed for a massive spike like that.

If you have KeepAlive turned off and your pages don't take multiple seconds to load, a MaxClients setting of 10-15 will go further than you think.
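The reason 10-15 workers go further than expected is plain arithmetic. If each prefork worker handles one request at a time, Little's law caps throughput at MaxClients divided by the average response time. The 0.25 s response time below is a hypothetical figure, not a measurement from this thread.

```python
def max_requests_per_second(max_clients: int, avg_response_seconds: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time.

    Little's law: concurrency = throughput x response time, so
    throughput <= max_clients / response time.
    """
    if avg_response_seconds <= 0:
        raise ValueError("avg_response_seconds must be positive")
    return max_clients / avg_response_seconds


if __name__ == "__main__":
    # Hypothetical: 15 workers, each page takes 0.25 s to generate.
    rps = max_requests_per_second(15, 0.25)
    print(f"{rps:.0f} requests/s, about {rps * 3600:.0f} requests/hour")
    # prints "60 requests/s, about 216000 requests/hour"
```

With KeepAlive on, each worker can also sit idle for up to KeepAliveTimeout seconds after serving a request, which inflates the effective response time and shrinks this figure accordingly.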

I'd recommend dropping the MaxClients down to something like 15. Then watch. Keep tabs on memory usage, and use some sort of monitoring to get an idea of request response times on your site. You'd then want to adjust as needed.

@jebblue:

What if you got Slashdotted or HN'ed? Just curious how many would be needed for a massive spike like that.

Having it too high (and 150 is usually too high) will just cause an OOM when traffic spikes, not help.

@Vance:

What you can do to monitor the situation is to log in to the server, run vmstat 10 and monitor the output. Every 10 seconds it will print a line of statistics. You want to look at the "si" and "so" values in the swap column. If these are both zero the vast majority of the time, then you aren't experiencing a shortage of memory.

Thanks for this suggestion. I logged into the server via Terminal on my Mac and ran vmstat 10 as directed. Below is what I received:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 117252 188720  32328 379172    0    0     1     2    2    5  0  0 99  0
 0  0 117252 188652  32336 379172    0    0     0     3  103   63  0  0 100  0
 0  0 117252 188652  32344 379172    0    0     0     2  104   64  0  0 100  0
 1  0 117252 163676  32440 375016    0    0     0  1656 1021  272  3  0 93  0
 1  0 117252 127824  32496 379176    0    0     0  3145 1152  185  4  1 91  0
 0  0 117252  65560  32512 379176    0    0     0    30 1265  165  3  2 93  0
 0  0 117252 163904  32548 379180    0    0     1    38  529  143  1  0 98  0
 1  0 117252 150200  32588 379172    0    0     1    45  417  131  1  0 98  0
 0  0 117252 153972  32596 379188    0    0     0     5  116   65  0  0 100  0
 0  0 117252 154012  32596 379188    0    0     0    25  114   66  0  0 100  0
 0  0 117252 154012  32604 379188    0    0     0    13  106   64  0  0 100  0
 0  0 117252 154012  32612 379188    0    0     0     3  112   66  0  0 100  0
 0  0 117252 154020  32620 379188    0    0     0     4  104   65  0  0 100  0
 0  0 117252 154136  32628 379188    0    0     0     2  122   63  0  0 100  0
 0  0 117252 154144  32636 379188    0    0     0     8  102   64  0  0 100  0
 0  0 117252 154268  32636 379188    0    0     0     2  111   66  0  0 100  0

Both si & so were 0 the whole time. So does this look like Apache was the cause and selecting fewer MaxClients has resolved it, or is it too difficult to be sure with such a small sample of stats?

As glg hinted, the number of Apache processes only grows when you get lots of web traffic. So if you run vmstat during a quiet period, you won't see anything notable even if Apache is poorly configured.

You can try simulating how your web site will perform under load using a tool like ab. There are also commercial load testing services that will generate traffic if you're uncomfortable doing this on your own; I have no experience with these.
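ab (ApacheBench, in the apache2-utils package on Ubuntu) is the easiest option. For anyone curious what such a tool does, here is a minimal Python sketch of the same idea: a fixed number of GET requests with a fixed number in flight at once, recording each latency. load_test is a hypothetical helper, not a replacement for ab. Only point it at a site you run, ideally from another machine so the load generator isn't competing for the Linode's memory.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_seconds(url: str, timeout: float = 30.0) -> float:
    """GET url once, read the whole body, and return the elapsed wall-clock time."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def load_test(url: str, total_requests: int, concurrency: int) -> List[float]:
    """Send total_requests GETs with at most `concurrency` in flight; return latencies.

    Any failed request (connection error, HTTP error status) raises.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: fetch_seconds(url), range(total_requests)))


# Example (hypothetical URL; roughly what `ab -n 200 -c 20 <url>` does):
#   latencies = sorted(load_test("http://your-site.example/", 200, 20))
#   print("median", latencies[len(latencies) // 2], "worst", latencies[-1])
```

Run vmstat 10 in a second terminal on the server while the test is going and watch the si and so columns.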

If you see zero swap activity under load, then you're likely safe from Apache causing an out-of-memory problem.

Thanks for your support. I'll keep an eye on it, monitoring at different intervals, and see how it goes. Can I assume that I can still run vmstat and receive stats if the Apache server is unresponsive (during an OOM, for example), or will the server fail to respond?

There's no guarantee of that; the OOM killer can target any process, including your ssh login. It tries to target the processes causing the trouble, but that can be surprisingly difficult for a machine to decide.
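You can at least see how the kernel currently ranks your processes: on Linux, /proc/<pid>/oom_score holds each process's badness score (higher means more likely to be picked), the same kind of score as the "score 155" in the console snippet. The sketch below lists the top candidates. oom_candidates is a hypothetical helper, and reading /proc/<pid>/comm assumes a reasonably recent kernel (2.6.33 or later; the Linode kernel above is 2.6.39).

```python
from pathlib import Path
from typing import List, Tuple


def oom_candidates(proc_root: str = "/proc", top: int = 5) -> List[Tuple[int, int, str]]:
    """Return (oom_score, pid, command) for the processes the kernel rates highest."""
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only the numeric directories under /proc are processes
        try:
            score = int((entry / "oom_score").read_text().strip())
            command = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        results.append((score, int(entry.name), command))
    results.sort(reverse=True)  # highest score (most likely victim) first
    return results[:top]


# Example, run on the server itself:
#   for score, pid, command in oom_candidates():
#       print(score, pid, command)
```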

Then again, if your machine is OOMing then you don't need vmstat to tell you that you have a problem. At that point, ps auwx is probably more useful to show what processes are consuming memory.
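Reading ps auwx by eye works, but a few lines of Python can rank processes by resident memory (the RSS column, which ps reports in kilobytes). top_memory_users is a hypothetical helper that assumes the standard ps aux header (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```python
from typing import List, Tuple


def top_memory_users(ps_output: str, top: int = 5) -> List[Tuple[str, int, float]]:
    """Parse `ps auwx` output and return (command, pid, RSS in MB), largest first."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col, rss_col = header.index("PID"), header.index("RSS")
    command_col = header.index("COMMAND")
    procs = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces,
        # so split at most command_col times to keep it whole.
        fields = line.split(None, command_col)
        if len(fields) <= command_col:
            continue  # malformed or truncated line
        rss_mb = int(fields[rss_col]) / 1024  # RSS is reported in kilobytes
        procs.append((fields[command_col], int(fields[pid_col]), rss_mb))
    procs.sort(key=lambda p: p[2], reverse=True)
    return procs[:top]
```

Capture with ps auwx > ps.txt and pass the file's contents in; the procps version of ps can also sort for you with ps auwx --sort=-rss.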

Great, thanks again for all the advice/assistance.

FWIW, clamd can be a memory hog. When I ran it on a smaller Linode, I had to disable quite a few of the databases it uses to make it fit in memory.

For me, it's currently using ~300 MB of memory:

clamav 1932 0.1 14.6 362836 301104 ? Ssl Aug01 109:19 clamd
