swap of death

I think this image speaks for itself…

![](http://i.imgur.com/kPcRmyu.png)

Basically, this is one of the production servers, running cron jobs, a web server, and MySQL. Sometimes too many cron jobs run at once, or too many Apache processes run at once, or something. The result is an unusable server, as the disks are maxed out moving pages in and out of swap. It stays unusable for half an hour or until the server is rebooted. The server has 1 GB of swap space and 1.5 GB of RAM.

Are there any short-term configuration changes I can make to mitigate this issue?

I have some long-term plans to refactor the application, switch from Apache to nginx, use Gearman to queue cron jobs more evenly, split the load across two servers, etc., but that's going to take me some time.

4 Replies

Make sure you've got MySQL and Apache configured appropriately for the amount of RAM you have. Your MaxClients in Apache, if you're using mpm_prefork (the default, I believe), should not be greater than the RAM available to Apache divided by the maximum RAM per Apache process.
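To make that division concrete, here is a rough sketch in Python. The figures are made-up placeholders, not measurements from this server: it assumes about 600 MB is left for Apache after MySQL, cron jobs, and the OS, and that the largest Apache child uses about 40 MB.

```python
def max_clients(ram_for_apache_mb: int, max_mb_per_process: int) -> int:
    """Upper bound for MaxClients under mpm_prefork.

    ram_for_apache_mb: RAM you are willing to give Apache (an estimate).
    max_mb_per_process: resident size of the largest Apache child (an estimate).
    """
    if max_mb_per_process <= 0:
        raise ValueError("per-process size must be positive")
    # Round down so the worst case still fits in RAM; never go below 1.
    return max(1, ram_for_apache_mb // max_mb_per_process)


# Hypothetical numbers for illustration only.
print(max_clients(600, 40))  # 15
```

Measure the real per-process size on the box before trusting any number that comes out of this.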

Other than that, throw money at the problem and upgrade the RAM of the box until you have time to fix it. It's all pro-rated to the day, and the upgrade takes a few minutes.

Guspaz is correct, but it doesn't look like swap is your major problem; the swap line only spikes twice on the graph.

What are you running every hour that hits the disk so hard? If it's a web logfile analyzer, try deleting older logfiles or moving them out of the way to another location. The quiet periods in between seem to indicate that your normal workload doesn't require gobs of I/O.

You could also put `ionice -c 3` at the front of your cron jobs; this will not reduce their disk usage, but will ensure that disk requests from the web and database servers get priority. Note that it may extend the time the jobs take to run: since your jobs appear to be taking 40 minutes or so, this could push them past an hour, meaning overlapping jobs would be kicked off before the previous ones completed. Try to take care of the massive disk usage first.
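One way to get the `ionice -c 3` prefix without the overlap risk is a small wrapper that skips a run while the previous one is still going. The following is only a sketch: the wrapper, its function names, and the lock path are all hypothetical, and it assumes a Linux box with `ionice` on the PATH.

```python
import fcntl
import subprocess
import sys


def ionice_idle(command):
    """Prefix a command with `ionice -c 3` (the idle I/O class)."""
    return ["ionice", "-c", "3", *command]


def try_lock(path):
    """Return an open, exclusively locked file, or None if already locked."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_job(command, lock_path):
    """Run the job at idle I/O priority unless the previous run holds the lock."""
    lock = try_lock(lock_path)
    if lock is None:
        return None  # previous run still in progress, so skip this one
    try:
        return subprocess.call(ionice_idle(command))
    finally:
        lock.close()  # closing the file releases the lock


if __name__ == "__main__":
    # Hypothetical usage: python3 job_wrapper.py /tmp/report.lock /path/to/job args
    if len(sys.argv) >= 3:
        result = run_job(sys.argv[2:], sys.argv[1])
        sys.exit(0 if result is None else result)
```

Skipping a run is a design choice; if every run must happen, the jobs would need to queue instead.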

I don't know what exactly is running on your server or how it's configured so I can only guess.

If I were you, I'd go through the various crontabs and see what can be removed or tuned. See how many worker processes you have for your various daemons and maybe eliminate a few. Check that PHP-APC, memcached, or a database isn't eating large amounts of RAM.
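If it helps with that last check, here is a rough sketch that totals resident and swapped memory per process name by reading `/proc/<pid>/status`. It assumes a Linux kernel that reports `VmRSS` and `VmSwap` there, and the function names are made up for this example.

```python
import glob
from collections import Counter


def parse_status(text):
    """Extract (name, resident kB, swapped kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key] = value.strip()

    def kb(key):
        parts = fields.get(key, "").split()
        return int(parts[0]) if parts else 0  # kernel threads have no VmRSS

    return fields.get("Name", "?"), kb("VmRSS"), kb("VmSwap")


def memory_by_name(pattern="/proc/[0-9]*/status"):
    """Sum resident and swapped kB per process name."""
    rss, swap = Counter(), Counter()
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                name, rss_kb, swap_kb = parse_status(handle.read())
        except OSError:
            continue  # the process exited while we were reading
        rss[name] += rss_kb
        swap[name] += swap_kb
    return rss, swap


if __name__ == "__main__":
    rss, swap = memory_by_name()
    for name, kb in rss.most_common(10):
        print(f"{name:<16} rss={kb // 1024:>6} MB  swap={swap[name] // 1024:>6} MB")
```

Summed RSS counts shared pages once per process, so the total for something like prefork Apache will overstate its real footprint.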

Throwing memory at the problem is the quickest way to fix the performance if you are spending someone else's money.

Thanks, everyone. The cron jobs are definitely the culprit here. I've rescheduled the jobs a little more evenly and it helped somewhat.

I think the cron jobs may be causing some big full-table scans (the DB is much larger than my available RAM), but I can't quite pick them up in the slow query log. The jobs do need to run, and mostly performance isn't an issue, until suddenly everything falls into the swap cascade and thrashes for an hour.

The cron jobs don't do any disk work outside of DB calls (and maybe queueing email, but email queues are low), so I think if anything is swapping, it might be the database server, and I can't ionice that without affecting interactive users of the web application(?).

Is there a way of penalising the kernel scheduling of tasks undergoing high swap? Or maybe, when I/O load gets high, suspending a set of processes and running them one at a time instead of all at once? Then the jobs would actually proceed, albeit slowly. (I'd write a script to do it, but when the server goes into swap of death, running another process seems counter-intuitive.) Googling for a kernel scheduler that penalises processes with high swap gives me results about an IBM patent.
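For what it's worth, the "suspend them and run one at a time" idea can be sketched with `SIGSTOP` and `SIGCONT`. This is only an illustration, not a tested fix for this server: the function names are made up, it assumes you already know the job PIDs and have permission to signal them, and it would have to be running already (or be very light) before the thrashing starts.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def send(pid, sig):
    """Send a signal, ignoring processes that have already exited."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass


def run_one_at_a_time(pids, poll_seconds=5.0, alive=is_running):
    """Stop every job, then let each one continue alone until it exits."""
    for pid in pids:
        send(pid, signal.SIGSTOP)
    for pid in pids:
        send(pid, signal.SIGCONT)
        while alive(pid):
            time.sleep(poll_seconds)
```

Stopped processes keep their memory, but they stop touching it, so in principle the one active job competes with fewer others for RAM. Whether that is enough to break the cascade here is a guess.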
