Heavy I/O and Swap usage on 512

I'm building a Django app on my 512 Linode, and I am running into Swap and I/O problems. Right now, only two of us are using/testing the site. Every 30 minutes, I run a cron job that launches a number of celery tasks that fetch a lot of data online and store the new records in a MySQL database. The site then provides an interface to access that data.

If I don't reboot for a day or two, the site becomes very slow. I noticed that after a reboot, the swap usage slowly climbs after each time the cron job runs (in increments of 3 to 11 percentage points for each cron job), while the memory usage holds steady at about 44%. I also get I/O warning emails occasionally, especially if it's been awhile since a reboot.

It also seems like some tasks are hanging. For instance, I just ran:

ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -r | less

And got the following output:

%MEM %CPU   RSS    VSZ COMMAND
 8.2  1.1 40892 194712 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --l
oglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 7.2  1.1 36376 190448 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --l
oglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 7.1  1.2 35548 190368 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 7.0  1.2 35192 189532 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 7.0  1.1 35364 189356 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 7.0  1.1 34992 189624 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 6.8  1.1 33980 194196 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 6.2  1.1 31292 187828 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
 5.6  0.3 28388 477988 /usr/sbin/apache2 -k start
 5.4  0.3 26988 189016 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
21.3  1.9 106192 1411040 /usr/sbin/mysqld
 1.9  0.1  9512 2140256 /usr/lib/erlang/erts-5.8.5/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@rs -boot /var/lib/rabbitmq/mnesia/rabbit@rs-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@rs.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@rs-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@rs"
 1.4  0.0  7460  27452 -bash
 0.5  0.0  2764 196076 /usr/bin/python /usr/bin/fail2ban-server -b -s /var/run/fail2ban/fail2ban.sock
 0.4  0.0  2116 1821748 /usr/sbin/apache2 -k start
 0.4  0.0  2016 1952708 /usr/sbin/apache2 -k start
 0.2  0.0  1192 330208 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1
 0.2  0.0  1000   9728 ps -eo pmem,pcpu,rss,vsize,args
 0.1  0.0   968  73360 sshd: dave@pts/0    
 0.1  0.0   900  73360 sshd: dave [priv]   
 0.1  0.0   844  21816 sort -k 1 -r
 0.1  0.0   724  90904 /usr/sbin/apache2 -k start
 0.1  0.0   712 185508 whoopsie
 0.1  0.0   612  37696 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 106:113
 0.1  0.0   608  90636 /usr/sbin/apache2 -k start

It seems to me that those celeryd processes shouldn't still be in memory (that snapshot was taken safely between cron jobs). And it looks to me as if it's the process of writing to the logs that is an issue. I see those logs show up when I run iotop too (as well as mysqld). In fact, here is a snap shot of iotop while the cron job is running (items bounc around, so I don't know if this is "representative" or not… before, I had seen many more celery log tasks):

Total DISK READ:      29.78 K/s | Total DISK WRITE:     420.20 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                
 2761 be/4 mysql       3.31 K/s   52.94 K/s  0.00 %  7.77 % mysqld
 3482 be/4 mysql       0.00 B/s   26.47 K/s  0.00 %  5.09 % mysqld
 4051 be/4 mysql       3.31 K/s   36.40 K/s  0.00 %  2.36 % mysqld
 1267 be/4 root        0.00 B/s    6.62 K/s  0.00 %  1.79 % [kjournald]
 3481 be/4 mysql       0.00 B/s   36.40 K/s  0.00 %  1.68 % mysqld
 4390 be/4 mysql       3.31 K/s   46.32 K/s  0.00 %  1.10 % mysqld
 3487 be/4 mysql       0.00 B/s   16.54 K/s  0.00 %  0.30 % mysqld
 4448 be/4 mysql       3.31 K/s   29.78 K/s  0.00 %  0.30 % mysqld
 2538 be/4 mysql       0.00 B/s   29.78 K/s  0.00 %  0.19 % mysqld
 3083 be/4 root        0.00 B/s    3.31 K/s  0.00 %  0.00 % python /srv/www/redacted.com/application/rsp~elery/w1.log --pidfile=/var/run/celery/w1.pid
 2512 be/4 mysql       0.00 B/s   26.47 K/s  0.00 %  0.00 % mysqld
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    4 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0]
    5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u:0]
    7 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u:0H]
    8 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    9 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
   10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0]
   11 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
   12 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
   13 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
   14 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0]
   15 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0H]
   16 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
   17 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
   18 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0]
   19 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0H]
   20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
   21 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuset]
   22 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khelper]
   23 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kdevtmpfs]
   24 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [netns]
   25 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u:1]
  538 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [jfsSync]
   28 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [xenwatch]
   29 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [xenbus]

Addmittedly, I have not changed my MaxClients setting in apache2.conf, but I'm pretty certain that's not the problem since the swap usage increases during each cron job and we only have two sporadic users. I have adjusted the MySQL settings to these:

key_buffer = 16K
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 4
sort_buffer = 64K
net_buffer_length = 2K

Another setting I tinkered with recently was setting:

vm.swappiness=10

in /etc/sysctl.conf, but the problem persists.

I'm pretty new to server admin, so any help at all would be appreciated, and please let me know if you need any other outputs to help diagnose. I'm hoping this can be solved with configs and not increasing the cost of the linode by expanding the available memory. Any ideas?

9 Replies

Try –concurrency=2 instead of --concurrency=8 and see if that improves the situation. Also, mysqltuner.pl is good for adjusting MySQL settings for your situation; the ones in the troubleshooting guide are a little dated (and only adjust MyISAM, the old default storage engine).

Thanks! I have changed the concurrency setting, and it's now gone through two of the cron jobs, and there's nothing in swap so far, so at the very least, it has improved. I'll let it go through more of the cron jobs, and I'll also run the mysql tuner when I haven't rebooted so recently. Thanks for the help!

maybe you should also have a lock to stop two celery (whatever that is) jobs running at the same time.

make sure you set up monitoring, an email will do, say if the job normally takes 1 minute to complete, check with find if the lock becomes older than 15 minutes, or something, run it from cron every 5 minutes, then have it email you so you can determine why the job didn't finish or exit properly.

Since I last posted and reduced my concurrency setting, I no longer get high io emails from linode. However, I still get swap usage building up over time. I'm not sure how dangerous that is. For instance, I rebooted 3 days ago and swap usage is currently at 44% (memory is at 45%). I don't know the highest level it has reached, and I've been rebooting every few days.

Also, to clarify one mistake in my original post: I don't think the celery tasks are "hanging." I think they are just there waiting to be called upon, so that's not a problem. Reducing the concurrency seems to have helped their impact. I guess I can even reduce it to 1, if neccessary.

I have run mysqltuner.pl and changed settings twice based on its suggestions. The original settings I had were the default mysql, then modified according to linode's suggestions (which are listed in my original post). Also, when I tried to run OPTIMIZE TABLE on all my tables, I was told it could not do that ("Table does not support optimize, doing recreate + analyze instead" in response to running mysqlcheck -o). Here are the settings I have modified since my original post:

table_cache             = 16
query_cache_limit       = 4M
query_cache_size        = 32M
innodb_buffer_pool_size = 266M
tmp_table_size = 32M
max_heap_table_size = 32M

It still wants me to modify querycachelimit, tmptablesize, maxheaptablesize, tablecache. But it also states: "MySQL's maximum memory usage is dangerously high" and "Add RAM before increasing MySQL buffer variables."

Here's the relevant part of the mysqltuner output:

-------- Performance Metrics -------------------------------------------------
[--] Up for: 3d 0h 3m 21s (5M q [21.698 qps], 37K conn, TX: 4B, RX: 2B)
[--] Reads / Writes: 86% / 14%
[--] Total buffers: 362.0M global + 704.0K per thread (151 max threads)
[!!] Maximum possible memory usage: 465.8M (95% of installed RAM)
[OK] Slow queries: 0% (0/5M)
[OK] Highest usage of available connections: 2% (4/151)
[!!] Cannot calculate MyISAM index size - re-run script as root user
[!!] Query cache efficiency: 14.8% (632K cached / 4M selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (148 temp sorts / 29K sorts)
[!!] Temporary tables created on disk: 36% (1K on disk / 3K total)
[OK] Thread cache hit rate: 99% (4 created / 37K connections)
[!!] Table cache hit rate: 2% (16 open / 654 opened)
[OK] Open file limit used: 0% (0/1K)
[OK] Table locks acquired immediately: 100% (4M immediate / 4M locks)
[OK] InnoDB data size / buffer pool: 144.6M/266.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    Reduce your overall MySQL memory footprint for system stability
    Enable the slow query log to troubleshoot bad queries
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
  *** MySQL's maximum memory usage is dangerously high ***
  *** Add RAM before increasing MySQL buffer variables ***
    query_cache_limit (> 4M, or use smaller result sets)
    tmp_table_size (> 32M)
    max_heap_table_size (> 32M)
    table_cache (> 16)

So my questions are:

1. Is this kind of swap usage something I need to worry about?

2. Any recommendations on what to do with the mysql settings?

3. Or should I really pinpoint what causes the swap usage to grow first? (and how can I do this?)

4. Is there any other info I need to provide?

Thanks.

````
[OK] Highest usage of available connections: 2% (4/151)

mysql by default allows 151 clients to connect to it, it's like apache's maxclients setting.

in 3 days, the highest concurrent connections was 4, so you could safely turn

max connections down to 20\. Then you can increase the amount of memory mysql is allowed to

use (max mysql memory usage = n times max connections), and hopefully decrease webpage load

times and therefore decrease overall memory usage, or increase the number of concurrent hits

you can take.

ps aux will tell you which processes are using the most memory,

running top, then pressing shift-m will sort processes by highest memory usage.

table_cache you can set to something like 512 and then ignore mysqltuner.pl messages about it

mysqltuner also always says to optimise tables, I do that via cron every hour.

[!!] Cannot calculate MyISAM index size - re-run script as root user
````
you might want to do that, too.

chesty,

Thanks for the help!

I have reduced max_connections to 20, and made the other adjustments suggested by mysqltuner.

I also re-ran mysqltuner with sudo, and the missing line was:

[OK] Key buffer size / total MyISAM indexes: 16.0M/99.0K

So, I guess that one is fine. I will continue to monitor and tweak the mysql settings. If I continue to run into problems, I'll come back.

Thanks again!

mysqltuner keeps recommending that I increase my querycachelimit, which now stands at 192M. Considering I'm on a 512 linode, this already seems way too high.

I'm also starting to have doubts that this helps at all in my case. The main usage of mysql occurs every 30 minutes when I pull new data and load it into (mostly) one table. I have a lot of reads, because I need to check whether I've previously loaded each piece of data previously or not. For most of each pull, I've already loaded the data previously. But the reads change over time, and the reads that are driven by the website itself will rarely produce the same retrievals twice (and the retrievals it does repeat are relatively small).

The other recommendations from mysqltuner are to expand tmptablesize and maxheaptable_size, which each stand at 168M. This also seems high to me. I still accumulate swap usage (every day, but less than I used to).

I've taken my max_connections all the way down to 6, yet my max possible memory usage is at the limit for my linode after I have made recommended adjustments from mysqltuner over time.

My questions are:

1. Should I disable caching in mysql, given my use cases?

2. Should I continue to increase tmptablesize and maxheaptable_size?

3. Or should I try to pinpoint exactly what is causing the swap usage? Most of the every-30-minute events seem to use very little to no swap, with an occasional spike that does most of the damage. How would I capture what is causing the spikes if they occur when I am not at the terminal?

I don't think tmptablesize and maxheaptable_size increase memory usage, they increase io and disk usage because they limit how much space mysql can use for temporary disk tables for sorts, etc. I'm guessing here, that if they aren't high enough, a large query fails. You can google that to know for sure.

I have no concrete suggestions, if you're running the same queries every 30 minutes, you could cache the results in memcache, and turn off or turn down mysql query cache. I guess that depends on how much data you're collecting. Another option is to use unique keys and just insert the data every time, if the data is already in the db, the insert will fail, if not, it will succeed. I have no idea whether this will help you, but it has helped me in the past. For a particular app and db I was using, it was quicker to use unique keys and try: insert; except: pass everything than it was to do select/insert

Also get familiar with mysql explain to see how a query is being handled, and whether an index might help.

for gathering data, collectd is really good. it's written in C and is quick, it polls every 10 seconds by default. I haven't looked at the collectd package in a while, but last time I used it, it was fairly minimal out of the box, you have no know (or research) what things to monitor.

Hopefully someone else here can chip in, but you're more than capable of tuning your app yourself, it just takes research and testing.

OK, I've pinpointed what is causing my daily spike in swap io: it's the unattended-upgrades, which automatically installs security updates each day right when the spike occurs. Last night, I tried changing a few settings to help reduce this. I uncommented these lines:

Unattended-Upgrade::MinimalSteps "true";

Acquire::http::Dl-Limit "70";

This seems to have helped a little bit (or maybe there just weren't many updates). I suppose I can continue lowering the download speed, but I don't see any other settings that would obviously lower the swap io. Are there any ways to eliminate the swap io caused by this? Or is disabling it the only way?

Reply

Please enter an answer
Tips:

You can mention users to notify them: @username

You can use Markdown to format your question. For more examples see the Markdown Cheatsheet.

> I’m a blockquote.

I’m a blockquote.

[I'm a link] (https://www.google.com)

I'm a link

**I am bold** I am bold

*I am italicized* I am italicized

Community Code of Conduct