swap of death
~~![](http://i.imgur.com/kPcRmyu.png)
Basically, this is one of the production servers, running cron jobs, a web server, and MySQL. Sometimes too many cron jobs run at once, or too many Apache processes run at once, or something. The result is an unusable server, because the disks are maxed out moving pages in and out of swap. It stays unusable for half an hour or until the server is rebooted. The server has 1 GB of swap space and 1.5 GB of RAM.
Are there any short-term configuration changes I can make to mitigate this issue?
I have some long-term plans to refactor the application, switch from Apache to nginx, use Gearman to queue cron jobs more evenly, split the load across two servers, etc., but that's going to take me some time.~~
4 Replies
Other than that, throw money at the problem and upgrade the RAM of the box until you have time to fix it. It's all pro-rated to the day, and the upgrade takes a few minutes.
What are you running every hour that hits the disk so hard? If it's a web logfile analyzer, try deleting older logfiles or moving them out of the way to another location. The quiet periods in between seem to indicate that your normal workload doesn't require gobs of I/O.
You could also put ionice -c 3
If I were you I'd go through the various crontabs and see what can be removed or tuned. See how many worker processes you have for your various daemons and maybe eliminate a few. Check that PHP-APC, memcached, or a database isn't eating large amounts of RAM.
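As a rough sketch of how to size the worker count: add up resident memory per command name from `/proc`, then divide whatever RAM is left after the database and the OS by the size of one web worker. The function names here are made up for illustration, and summing `VmRSS` overcounts pages shared between workers, so treat the result as a conservative estimate rather than an exact figure.

```python
import os

def rss_kb_by_name(proc_root="/proc"):
    """Return {command name: (process count, total VmRSS in kB)}."""
    totals = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were looking
        if "VmRSS" not in fields:
            continue  # kernel threads have no resident memory line
        name = fields["Name"].strip()
        rss_kb = int(fields["VmRSS"].split()[0])
        count, total = totals.get(name, (0, 0))
        totals[name] = (count + 1, total + rss_kb)
    return totals

def max_workers(total_mb, reserved_mb, per_worker_mb):
    """Workers that fit in RAM after reserving memory for everything else."""
    return max(1, int((total_mb - reserved_mb) // per_worker_mb))
```

Calling `rss_kb_by_name()` with no arguments reads the live `/proc`. With the 1.5 GB from the original post, and assuming (purely as an example) 600 MB reserved for MySQL and the OS and 40 MB per Apache worker, `max_workers(1536, 600, 40)` comes out at 23. That would be the sort of number to compare against the configured worker limit.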
Throwing memory at the problem is the quickest way to fix the performance if you are spending someone else's money.
I think the cron jobs may be causing some big full-table scans (the DB is much larger than my available RAM), but I can't quite pick them up in the slowlog. The jobs do need to run, and mostly performance isn't an issue, until suddenly everything falls into the swap cascade and thrashes for an hour.
The cron jobs don't do any disk stuff outside of DB calls (and maybe queueing email, but email queues are low), so I think if anything is swapping, it might be the database server, and I can't ionice that without affecting interactive users of the web application(?).
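One way to check that guess rather than assume it: the kernel reports how much of each process currently sits in swap as the `VmSwap` line in `/proc/<pid>/status` (on kernels recent enough to expose it). A minimal sketch, with a made-up function name, that lists the processes holding swap, largest first:

```python
import os

def swap_kb_by_process(proc_root="/proc"):
    """Return [(swap_kb, pid, name), ...] for processes with pages in swap."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        name, swap_kb = "?", 0
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("VmSwap:"):
                        swap_kb = int(line.split()[1])
        except OSError:
            continue  # the process exited while we were looking
        if swap_kb > 0:
            rows.append((swap_kb, int(entry), name))
    return sorted(rows, reverse=True)  # biggest swap user first
```

Running this during a quiet period and again as a thrash begins should show whether it really is the database that gets pushed out, or the web workers.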
Is there a way of penalising the kernel scheduling of tasks undergoing high swap? Or maybe when IO load gets high, suspend a set of processes and run them one at a time instead of all at once? Then the jobs would actually proceed, albeit slowly. (I'd write a script to do it, but when the server goes into swap of death, running another process seems counter-intuitive.) Googling for a kernel scheduler that penalises processes with high swap gives me results about an IBM patent.
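The "one at a time" part doesn't need a watchdog running during the thrash. A rough sketch of the idea is a wrapper that every cron job is launched through, which takes an exclusive file lock before starting the real command. This does not penalise swapping as such; it only caps how many wrapped jobs can run at once. The lock path and the wrapper itself are hypothetical, not something from the setup described above.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/cron-serial.lock"  # hypothetical path; any writable file works

def run_exclusive(argv, lock_path=LOCK_PATH):
    """Run argv while holding an exclusive lock; return its exit code."""
    with open(lock_path, "w") as lock:
        # Blocks here until every earlier wrapped job has released the lock.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            return subprocess.call(argv)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

if __name__ == "__main__":
    # Only act when a command was actually passed on the command line.
    if len(sys.argv) > 1:
        sys.exit(run_exclusive(sys.argv[1:]))
```

A crontab line would then call the wrapper with the real job as its arguments. Waiting jobs queue up behind the lock, so if a job regularly takes longer than its interval, a non-blocking lock (`fcntl.LOCK_EX | fcntl.LOCK_NB`) that skips the run instead may be the better choice. The `flock` command from util-linux does much the same thing without any Python.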