LAMP stack taking up swap space, can't figure out why

I'm running a Linode 512 with a web forum on it that gets about 20-25k views per day and averages 1500-2500 posts daily.

I've got a standard LAMP stack with Webmin and Virtualmin on top of it, and since inception I've been slowly working through various issues slowing the system down. The biggest problem I've solved so far was that logrotate's compression would take my site down for 10-15 minutes every Friday when it compressed the Apache access logs.

Right now the site is accessible and rarely has any issues (a few 500s here and there, not sure what's causing them exactly), but I've got swap usage I can't explain and performance issues whenever I do anything disk-related.

free -m output:

             total       used       free     shared    buffers     cached
Mem:           498        476         22          0          4        228
-/+ buffers/cache:        243        255
Swap:          255        122        133

Here's a selection of top sorted by swap usage:

15317 mysql     20   0  123m  25m 2668 S    0  5.2 676:39.96  97m mysqld
13603 www-data  20   0 97640 5792 2480 S    0  1.1   0:00.00  89m apache2
13572 www-data  20   0 98216 6476 2560 S    0  1.3   0:00.04  89m apache2
13468 www-data  20   0 98884 7152 2576 S    0  1.4   0:00.18  89m apache2
13581 www-data  20   0 98220 6500 2556 S    0  1.3   0:00.04  89m apache2
 2118 bind      20   0 74192  756  420 S    0  0.1   0:00.53  71m named
10952 irc       20   0 35784 6708 1116 S    0  1.3  31:43.05  28m python
11282 www-data  20   0 30988 3304  840 S    0  0.6   0:00.41  27m apache2
19595 pss       20   0 31668 4056 2540 S    0  0.8   0:00.03  26m php5-cgi
 8448 test.rae  20   0 31408 5268 3228 S    0  1.0   0:00.04  25m php5-cgi
11852 talizora  20   0 35448  10m 4324 S    0  2.1   0:29.72  24m php5-cgi
11856 talizora  20   0 35448  10m 4324 S    1  2.1   0:32.84  24m php5-cgi
11862 talizora  20   0 39156  14m 4324 S    7  2.9   0:33.27  23m php5-cgi
11841 talizora  20   0 36960  12m 4320 S    0  2.5   0:32.63  23m php5-cgi
11858 talizora  20   0 36756  12m 4320 S    0  2.4   0:36.07  23m php5-cgi
11864 talizora  20   0 37020  12m 4324 S    0  2.5   0:36.61  23m php5-cgi
11866 talizora  20   0 36732  12m 4324 S    0  2.4   0:34.39  23m php5-cgi
11869 talizora  20   0 36632  11m 4324 S    0  2.4   0:26.17  23m php5-cgi
11872 talizora  20   0 37512  12m 4324 S    0  2.6   0:37.67  23m php5-cgi
11854 talizora  20   0 36768  12m 4384 S    0  2.4   0:35.97  23m php5-cgi
11860 talizora  20   0 36704  12m 4404 S    0  2.4   0:32.61  23m php5-cgi

When left alone everything runs fine, but when I do much of anything the entire system goes crazy. I used cp to copy 60 megs of stuff; it took 10 minutes, load average went up to 10-15 during it, and the whole site returned 500 errors on every request while it ran. I did this during a slow period in accesses so it would interfere as little as possible, and load average was 0.02 when I started the copy.

I'm not sure whether my mysql/apache2 confs or some other configs or information would be most helpful to post, and I've run out of places to look to solve it myself.

I have looked through the low memory MySQL/Apache config information here and limited some things (running mpm_worker limited to 25 clients), but I still have these oddball issues I can't explain.

Thanks in advance for any advice,

-Rael


Well, for one thing, you've misconfigured Apache/PHP. You've got way more PHP processes than your amount of traffic needs (and I presume there are more PHP processes than we're seeing). I presume this is an mpm_worker with fastcgi (or fpm) setup?

The stock lighttpd configuration has 2 parent PHP processes with 4 children each, and even that's overkill for most people.

I believe that is all of the PHP procs actually.

It is mpm_worker with fastcgi.

That shot was also taken near peak load time, and almost all traffic is dynamic pages. The entire site is a forum so everything runs through PHP…

It averages 6 requests per second overall and the place is pretty much dead quiet for 12 hours a day. My traffic is pretty peaky.

Last night I rebooted the server and it's already got 50MB of swap used… I have more free memory (not cache or buffers, truly free memory) than I have used swap space… I don't understand why it's swapped anything at all.

Relevant Apache config (as far as I know; if more is needed please let me know):

<IfModule mpm_worker_module>
    StartServers          1
    ServerLimit          10
    MinSpareThreads       5
    MaxSpareThreads      20
    #ThreadLimit         25
    ThreadsPerChild       5
    MaxClients           25
    MaxRequestsPerChild 1000
</IfModule>

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
</IfModule>

I'm not only trying to handle the regular traffic but also deal with some rather sharp peaks of traffic I get on occasion. Before I calmed the settings down, I had a bunch of users show up and the whole place OOMed out on me. To give you an idea of what is "normal" for unexpected traffic: I've had a single user show up and make 800 posts in one day; they brought the entire day's average up by 3 requests per second. I've also had bursts where 20-40 new users will just show up and start posting 2-3 times per minute each on top of their browsing. That's why I have my MaxClients up as high as I do.

Again, I'm really at a loss to explain the behavior of the software; I just don't know enough about what I'm working with. If anyone can give me some idea of what I'm doing wrong, I'd appreciate it.

I'm not sure how to lower the number of PHP processes; where do I need to go in the configs to handle that sort of thing?

What forum software are you using? What sort of caching have you implemented for it?

You should install something like munin, it's really good for showing you what's a resource hog.

As for swap, you will always use some swap.

Swap is not bad, moving stuff in and out of swap all the time IS bad.

Swap is basically used for 1) storing stuff that isn't used very often so the memory can be freed up for something else, and 2) acting as extra ram.

The 2nd one on a server is the bad one. The first one is a good thing.
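
As for munin, on Ubuntu it's just the stock packages; something like this should do it (package names are the standard ones, assuming they're current for your release):

sudo apt-get install munin munin-node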

@obs:

You should install something like munin, it's really good for showing you what's a resource hog.

As for swap, you will always use some swap.

Swap is not bad, moving stuff in and out of swap all the time IS bad.

Swap is basically used for 1) storing stuff that isn't used very often so the memory can be freed up for something else, and 2) acting as extra ram.

The 2nd one on a server is the bad one. The first one is a good thing.

I'll second that. Swapping out unused memory pages is a good thing. It means that freed RAM can be used for things that need it. If you are interested, read more here: http://kerneltrap.org/node/3000.

Also, you can look into moving to nginx+php-fpm+mariadb/percona :)
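
If the swap usage itself bugs you, you can also make the kernel less eager to swap by lowering vm.swappiness. That's a standard sysctl, though the right value is a matter of taste; a rough sketch:

# see the current value (the default is usually 60)
cat /proc/sys/vm/swappiness
# lower it for this boot; add "vm.swappiness = 10" to /etc/sysctl.conf to persist
sudo sysctl vm.swappiness=10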

@rdaruszka:

<IfModule mpm_worker_module>
    StartServers          1
    ServerLimit          10
    MinSpareThreads       5
    MaxSpareThreads      20
    #ThreadLimit         25
    ThreadsPerChild       5
    MaxClients           25
    MaxRequestsPerChild 1000
</IfModule>

This looks reasonable, although MaxClients might be a bit low for mpm_worker. But I'll defer to others here since I'm not an apache expert.

@rdaruszka:

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
</IfModule>

Caveat for this section of my post: I'm a lighttpd user, so I'm thinking in terms of Lighttpd's fastcgi module.

Back to the age-old topic, "Apache's default settings are insanely stupid": FcgidMaxProcessesPerClass defaults to 100 processes. At PHP's max memory load, that'd be about 13 gigabytes of RAM. A sane limit for this setting is 8. The default of 100 means you can handle 100 simultaneous requests, but since you've only got access to four processor cores, that's pointless. 4 is theoretically enough to max out your CPU resources, but because stuff can be waiting on IO or other things, 8 gives you a 2:1 ratio to help keep those processors busy. Anything beyond that is starting to get pretty pointless on a quadcore processor unless you've got really long-running scripts (which a forum tends not to have). You've already got 13 processes that I can see in that 'top', so you're already getting more than you need.

With lighttpd, at least, if there is a request that comes in while all 8 processes are busy, then it queues. That's the best policy, because it's better to queue up the requests than trying to serve them all at the same time if you've got an insane load spike.

@rdaruszka:

I'm not only trying to handle the regular traffic but also deal with some rather sharp peaks of traffic I get on occasion. Before I calmed the settings down, I had a bunch of users show up and the whole place OOMed out on me. To give you an idea of what is "normal" for unexpected traffic: I've had a single user show up and make 800 posts in one day; they brought the entire day's average up by 3 requests per second. I've also had bursts where 20-40 new users will just show up and start posting 2-3 times per minute each on top of their browsing. That's why I have my MaxClients up as high as I do.

OOMs only happen because the webserver is misconfigured. No amount of load should cause the server to OOM, even if it's enough to make it slow to a crawl. Excess requests should be queued, so an OOM is indicative that you've configured your server to try to handle too many simultaneous requests.

40 users posting 3 times per minute is only 2 posts per second. That's pretty low load, not a huge load spike. Besides, for all this stuff, your database is probably the bottleneck if anything. Also, remember that 40 users posting 3 times per minute are not consuming 40 clients worth of resources. If your forum software is very slow and it takes half a second to process a request to make a post, then that user consumes one client worth of resources for half a second, and then is not connected to your server for the next 19.5 seconds… although keepalives might change this.

@rdaruszka:

Again, I'm really at a loss to explain the behavior of the software; I just don't know enough about what I'm working with. If anyone can give me some idea of what I'm doing wrong, I'd appreciate it.

@rdaruszka:

I'm not sure how to lower the number of PHP processes; where do I need to go in the configs to handle that sort of thing?

I could tell you with more authority for lighttpd… If these settings are equivalent to what I'm used to, I would set FcgidMaxProcessesPerClass to 8, and FcgidMinProcessesPerClass to 4 (although the default of 3 is probably fine)… I'd wait for more apache-oriented people to comment.
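
If I'm reading the mod_fcgid docs right, that would map to something like this in your fcgid section (directive names from mod_fcgid; values illustrative, not tested on your setup):

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
    FcgidMaxProcessesPerClass 8
    FcgidMinProcessesPerClass 4
</IfModule>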

Well, setting the number of PHP processes to the number of cores assumes they can handle stuff as fast as possible and free themselves for the next use. As they're actually blocked for relatively long periods sending data over the network, accessing the DB, and such, I would say that a few more might be useful.

Also, 5 threads per worker subprocess is crazy low, IMO…

This is all based on guesses, empirical evidence, and possibly wrong conclusions. All I can say is that I use 25 threads per worker, and 16 PHP engines, and

a) don't suffer from swapping, actually have more than half of RAM available for cache

b) Can easily handle ~500 static requests/s AND ~50 PHP requests/s

c) never had more than 75 apache threads active or more than 25/400% CPU under real world load, and even these happened only a few times.

Btw, doesn't fcgid spawn each fastcgi subprocess separately, so they can't share an APC cache if installed? (I use mod_fastcgi).

EDIT/PS. Not claiming the values quoted are awesome in general, just saying they fulfill my particular needs very well.

Thought I replied to a couple of these last night! Failed to hit post… :roll:

Emufarmers: SMF 1.1.11, looking to upgrade to 2.0. Had APC running at one point, but it was getting more 500s with it than without, so I pulled it; better a poorly working site than a non-working site, and I didn't have a solution at the time. I don't believe APC was actually causing any problems and it could probably go back in cleanly right now without issue.

obs: Had munin installed but couldn't figure out what the problems were from it; it was causing some issues during its update cycles I couldn't explain, and since it wasn't helping me I pulled it.

Net-burst: Been considering something like that, but I'm not sure; this is "production". I can pull it down, but I like to do so for as short a time as possible. Experimentation isn't a good thing.

Guspaz: Thanks for the call on the MaxClients. I think I had it as high as 40 at one point and dropped it to test if that was causing problems… It wasn't; I just haven't reverted. Would love to hear from others on this as well.

I'm reading this as "set FcgidMaxProcessesPerClass to 8 in this section"; is that the correct interpretation?

The OOM was a misconfig; everything was on full defaults back then: MPM prefork, something like 300 MaxClients. It was a disaster. Totally not a current issue.

True, 40 users * 3 per minute is only two posts per second, but they are all F5-happy so there is a lot of read load happening.

I think the keepalive is at like 5 seconds or something; I'd have to check.

Thanks for the hint on Fcgid settings, I'll look into it.

rsk: Thanks for the information I'll look at tweaking.

To clarify a few things: I am running with about 50% of RAM free for cache. All of the data for the site fits into about 260MB (DB and scripts/images/etc.). Under normal circumstances I do not have a problem. The biggest concern I had was when I copied a 60MB folder: load ran to 10-15 and the whole site returned 500 errors because PHP was taking longer than 30 seconds per request, for about 15 minutes. I cannot for the life of me understand what could cause this.

Yeah, I need to tighten up my web server so it runs better, and the consensus sounds like "up MaxClients, up threads per worker, set fcgid to 8ish procs, get APC installed". But it does serve the pages fine right now, just so long as I don't get anywhere near the disk while it's doing it…

Also, I do have some Linux experience; "some" means "can install Gentoo on RAID 5 without looking up docs". I have almost no experience running it as a web server, but it's my primary desktop/laptop OS.

I made a couple of changes and ran some tests…

I set the FcgidMaxProcessesPerClass discussed above to 8. I altered mpm_worker to 10 threads per child (from 5) and 30 MaxClients (from 25).
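
So the worker section now reads roughly like this (same block as before, just those two values changed):

<IfModule mpm_worker_module>
    StartServers          1
    ServerLimit          10
    MinSpareThreads       5
    MaxSpareThreads      20
    ThreadsPerChild      10
    MaxClients           30
    MaxRequestsPerChild 1000
</IfModule>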

I ran apachebench on a -short- static page and got results of around 2500-2800 requests per second. This test was run for 50,000 requests total with concurrency from 10 up to 100, with very little change in the requests/second. Almost no load on the CPU; caching kept the few bytes of content from hitting the disk. This seemed reasonable, so I moved on to testing dynamic content.
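
For reference, the static runs were invocations along these lines (hostname stands in for my actual site):

ab -n 50000 -c 10 http://example.com/static.html
ab -n 50000 -c 100 http://example.com/static.html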

I tested against the board view of the forum and got the following:

Concurrency Level:      10
Time taken for tests:   21.107 seconds
Complete requests:      500
Failed requests:        54
   (Connect: 0, Receive: 0, Length: 54, Exceptions: 0)
Write errors:           0
Total transferred:      21930420 bytes
HTML transferred:       21731920 bytes
Requests per second:    23.69 [#/sec] (mean)
Time per request:       422.142 [ms] (mean)
Time per request:       42.214 [ms] (mean, across all concurrent requests)
Transfer rate:          1014.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   167  418 1132.6    329   20371
Waiting:      167  418 1132.7    329   20371
Total:        167  419 1132.6    329   20371

Percentage of the requests served within a certain time (ms)
  50%    329
  66%    334
  75%    338
  80%    343
  90%    368
  95%    404
  98%    478
  99%    566
 100%  20371 (longest request)

During this test CPU load hit 100%, though I didn't notice any IO load. Also, a concurrency of 1 results in about 5.5 requests per second being handled. The rates seem about right considering there is no caching involved: 0.167 seconds per request to process is ~6 per second, and with 4 cores that's just under 4 times as many. The giant variance in times is what bothers me, though: anywhere from 0.167 seconds up to 20 seconds? That seems about 10 seconds too long. Shouldn't things be FIFO? Seems to me that would give a max of about 2 seconds…

I'm going to see if I can't get APC going again and bench that as well, but I might not be able to…

I was able to enable APC and got some promising results out of it.

110 requests per second from the same page as above. This was pretty much the rate whether I used anywhere from 10 up to 1000 concurrent requests (not sure how 10,000 concurrent would go, but I'd assume it's not what would be called "pretty").

So I'm satisfied with general performance. I am concerned by the large variance in my last post. When you're attempting 1000 connections at once and the server will only handle 30 at a time, you are of course going to have issues, but a mere 10? That massive length on the long end just feels about 10 times longer than it should be. Am I off by a factor of 10 in my conversion? 20371ms is 20.371 seconds, right?

And none of this really explains why a simple file copy would cause so much load… That is the factor that really bothers me. I'd like to be able to copy the site's files to a test virtual server without worrying about the server exploding on me while I do.

Hmm.

Watch 'vmstat 1' (or 'dstat -cdg') for a while, and compare the CPU time in 'wa'it state with the amount of disk IO (bi/bo and si/so for block in/out and swap in/out, in the case of vmstat).

If the former is very high but the latter are small, it may be some other Linode on your physical host hogging the disks like crazy and making yours (and everyone else's) block for a looong time each time it tries to read or write something.

@rsk:

Hmm.

Watch 'vmstat 1' (or 'dstat -cdg') for a while, and compare the CPU time in 'wa'it state with the amount of disk IO (bi/bo and si/so for block in/out and swap in/out, in the case of vmstat).

If the former is very high but the latter are small, it may be some other Linode on your physical host hogging the disks like crazy and making yours (and everyone else's) block for a looong time each time it tries to read or write something.

I'm about to do some stuff that pissed the server off royally last time I tried it so I'll be sure to do this while I try it :D

Thanks for all of your suggestions. I do suspect that tuning MySQL is my next target in tuning this thing up so if I start posting a bunch of stuff about that later today you know why.

Alright, well, I was able to capture some stats. Hopefully they'll give you guys some idea.

before I started doing things:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0     74     38     15    321    0    0     0     0  274  197  0  0 100  0
 0  0     74     38     15    321    0    0     0     4  277  213  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  236  196  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  278  214  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  241  204  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  236  194  0  0 100  0
 0  0     74     38     15    321    0    0     0   128  276  207  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  298  203  0  0 100  0
 0  0     74     38     15    321    0    0     0    12  262  212  0  0 98  2
 0  0     74     38     15    321    0    0     0     0  253  200  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  245  200  0  0 100  0
 0  0     74     38     15    321    0    0     0    88  276  219  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  273  202  0  0 100  0
 0  0     74     38     15    321    0    0     0     0  233  198  0  0 100  0
 1  0     74     38     15    321    0    0     0     0  941  399  3  0 97  0
 0  0     74     38     15    321    0    0     0     0  685  242  2  0 98  0
 0  0     74     38     15    321    0    0     0     0  826  430  0  0 100  0
 0  0     74     37     15    321    0    0     0     0  813  446  1  0 99  0
 0  0     74     38     15    321    0    0     0     4  686  272  2  0 98  0

At some point it was getting a little busier but there wasn't any visible effect:

 0  0     77     17      3    327    0    0     0     0  284  199  0  0 100  0
 0  4     77     16      3    327    0    0   368  2380 1033  572  0  1 55 44
 0  1     77     14      3    328    0    0   300     0  976  497  0  1 67 32
 0  1     77     11      3    328    0    0    92     0  747  322  1  0 80 18
 0  2     77     10      3    328    0    0   528    60 1119  545  1  0 78 21
 1  2     77      9      3    328    0    0   552     4 1529  798  1  0 66 33
 0  4     77      6      3    327    0    0  1804   296 3273  820 10  6 44 40
 0  5     78      7      3    325    0    0  1592   276 1315  643  1  1 20 79
 0  2     78      6      3    325    0    0   612    36 1959  593  6  1 46 47
 1  2     78      5      2    323    0    0  2760   108 2355 1067  2  1 50 46
 0  2     78      6      2    321    0    0  2060     4 1984  791  2  1 53 44
 0  1     78     22      2    322    0    0   860     8 1917  618  3  2 73 22
 0  1     78     22      2    322    0    0   340     0 1869  378  4  5 86  5

Then I refreshed a page on a test copy of my forum and it did this:

 0  3     88     13      1    325    0    0  1168     0 3535 1412 12  1 62 25
 2  0     88      6      1    331    0    0  6400     0 2138 1237  1  1 53 45
 0  2     88      5      1    331    0    0 39632     4 6142 3782  1  2 59 39
 0  2     88      6      1    331    0    0 33600   108 4010 2777  0  1 60 38
 0  3     88      6      1    331    0    0 17480     4 2484 1641  1  1 43 55
 0  2     88      5      1    332    0    0 37456     0 8287 5244  5  2 45 49
 0  6     88      7      1    329    0    0 28508    28 7055 4424  2  3 54 41
 0  6     88      5      1    331    0    0  2792     0 1833  990  1  0 15 84
 0  4     88      5      1    331    0    0  3120     4 1507 1025  0  0  6 94
 0  4     88      5      1    328    0    0  2924     4 1317  865  0  1  7 92
 0  7     88      6      1    327    0    0  1744     8  979  654  0  0 17 82
 1  7     88      5      1    327    0    0  2336    88 2119  772  8  0  2 90
 0  8     88     11      1    319    0    0  5944     8 2468 1507  2  1  7 90
 0  3     88      6      1    306    0    0  4428    12 3176 1786  5  2 14 80
 0  4     88      5      1    305    0    0  3692     0 1853 1301  1  1 38 61
 0  7     88      5      1    287    0    0  4728     0 3078 1637  7  1  9 83
 0  7     88      6      1    289    0    0  4204   116 3136 1545 11  2  9 78
 1  8     88      5      1    288    0    0  2984   264 1672 1164  1  1  6 92
 0  9     88      5      1    289    0    0  1868     0  952  808  1  1  6 93

Those are all output from vmstat -SM 1

Edit:

I just checked with iotop and it is -all- mysql.

That seems more than ridiculous…

All four cores stuck in iowaits while reading just a few MB in?

And that all with… three hundred megabytes of cache RAM available?!

Good news is that you're not swapping. Bad news is that it seems your disk access times are super slow. You may want to open a support ticket asking "is there a disk hog on my host, or is it me, after all?"

EDIT: Well, I assume that the "blocks in" count is either in sectors, or some small units like 1kb/4kb. If it's in megabytes, it sure is your own fault.

EDIT2: It's 1kb for me. So, yeah, the worst spike is ~ 30MB of a database table, and your usual is below a megabyte/s.

Tuning MySQL (especially the tablecache/resultcache) may help a lot, but still, the disks' response should be faster.

Yes, there's an I/O hog on the host: it's this Linode! :-) Pulling that much from the disk is going to take a long time and should not happen with a properly-tuned database system.

http://www.mysqltuner.pl/ <-- run this script, see what it says. Odds are good you've got either bad queries or a bad schema, or your MySQL config is out of whack.
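
If I remember right, wget'ing that URL hands you the Perl script directly, so running it is just:

wget http://www.mysqltuner.pl/ -O mysqltuner.pl
perl mysqltuner.pl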

Here is ~15 queries/second on a MySQL server for a handful of vBulletin and Wordpress sites (only about ~2.5 pageviews/second right now, since it's a weekend):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0    100 274060 250136 1092832    0    0     0     0  174   25  0  0 100  0
 0  0    100 274060 250136 1092832    0    0     0     0   96   24  0  0 100  0
 0  0    100 274060 250136 1092832    0    0     0     0  102   26  0  0 100  0
 0  0    100 274060 250136 1092832    0    0     0     0   99   22  0  0 100  0
 0  0    100 274060 250136 1092832    0    0     0     0  634  291  0  0 100  0
 0  0    100 273756 250136 1092832    0    0     0     0 1099  552  0  0 100  0
 0  0    100 273908 250136 1092832    0    0     0     0  453  164  0  0 100  0
 0  0    100 273908 250136 1092832    0    0     0     0  167   83  0  0 100  0
 0  0    100 273908 250136 1092832    0    0     0     0  227  165  0  0 100  0
 0  0    100 273908 250136 1092836    0    0     0     0   86   22  0  0 100  0

… ok, so it's a 2 GB instance, but still. The web servers' disks are typically busier, just from logging.

I agree with rsk - you should open a ticket for this. Either the host itself is having trouble, or it's just an ugly set of neighbors to be with. If it's not something that Linode can directly correct, in my experience they've been willing to offer a migration to a new host. While that won't really guarantee anything, I can't really think how you could be any worse off.

– David

I'll query linode about it but until I hear back from them I'm going to assume it's my fault and keep tuning.

I am -sure- my MySQL conf is whacked, but I think there are further issues as well. A 60MB folder copy using cp -a ~otheruser/folder ./folder took 10 minutes and thrashed the system into 500 errors the entire time… 60MB. That should take seconds in my book.
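
Next time I have to move that much data I may try deprioritizing the copy's IO with something like this (assumes the CFQ scheduler, which I haven't actually verified on this kernel):

# idle IO class + lowest CPU priority, so the copy only uses otherwise-idle disk time
ionice -c3 nice -n 19 cp -a ~otheruser/folder ./folder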

I grabbed the mysqltuner and got this:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.1.41-3ubuntu12.6
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 409M (Tables: 444)
[--] Data in InnoDB tables: 16K (Tables: 1)
[!!] Total fragmented tables: 26

-------- Performance Metrics -------------------------------------------------
[--] Up for: 2d 17h 28m 45s (4M q [17.377 qps], 6K conn, TX: 3B, RX: 1B)
[--] Reads / Writes: 50% / 50%
[--] Total buffers: 42.0M global + 2.6M per thread (151 max threads)
[!!] Maximum possible memory usage: 438.4M (87% of installed RAM)
[OK] Slow queries: 0% (40/4M)
[OK] Highest usage of available connections: 15% (24/151)
[!!] Key buffer size / total MyISAM indexes: 16.0K/125.6M
[!!] Key buffer hit rate: 89.4% (371M cached / 39M reads)
[OK] Query cache efficiency: 61.7% (1M cached / 2M selects)
[!!] Query cache prunes per day: 5982
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 19K sorts)
[!!] Joins performed without indexes: 4292
[!!] Temporary tables created on disk: 44% (11K on disk / 26K total)
[OK] Thread cache hit rate: 98% (121 created / 6K connections)
[!!] Table cache hit rate: 0% (64 open / 6K opened)
[OK] Open file limit used: 9% (100/1K)
[OK] Table locks acquired immediately: 99% (3M immediate / 3M locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    Reduce your overall MySQL memory footprint for system stability
    Enable the slow query log to troubleshoot bad queries
    Adjust your join queries to always utilize indexes
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    key_buffer_size (> 125.6M)
    query_cache_size (> 16M)
    join_buffer_size (> 128.0K, or always use indexes with joins)
    tmp_table_size (> 16M)
    max_heap_table_size (> 16M)
    table_cache (> 64)

I futzed with a couple of MySQL settings, restarted, and ran OPTIMIZE TABLE on the tables for my forum and a test copy of the board. It thrashed for about 45 seconds each while doing this, and then I ran the tuner again to get:

 >>  MySQLTuner 1.0.1 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.1.41-3ubuntu12.6
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 408M (Tables: 444)
[--] Data in InnoDB tables: 16K (Tables: 1)
[!!] Total fragmented tables: 16

-------- Performance Metrics -------------------------------------------------
[--] Up for: 8m 59s (5K q [9.929 qps], 527 conn, TX: 6M, RX: 1M)
[--] Reads / Writes: 68% / 32%
[--] Total buffers: 106.0M global + 2.6M per thread (50 max threads)
[OK] Maximum possible memory usage: 237.2M (47% of installed RAM)
[OK] Slow queries: 0% (2/5K)
[OK] Highest usage of available connections: 28% (14/50)
[!!] Key buffer size / total MyISAM indexes: 64.0M/118.9M
[!!] Key buffer hit rate: 78.7% (585K cached / 124K reads)
[OK] Query cache efficiency: 37.4% (1K cached / 3K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 29 sorts)
[OK] Temporary tables created on disk: 16% (57 on disk / 349 total)
[OK] Thread cache hit rate: 97% (14 created / 527 connections)
[!!] Table cache hit rate: 1% (64 open / 5K opened)
[OK] Open file limit used: 8% (91/1K)
[OK] Table locks acquired immediately: 98% (3K immediate / 3K locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    MySQL started within last 24 hours - recommendations may be inaccurate
    Enable the slow query log to troubleshoot bad queries
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    key_buffer_size (> 118.9M)
    table_cache (> 64)
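
For reference, the OPTIMIZE pass was just a statement like this per fragmented table (table names illustrative; SMF prefixes its tables with smf_):

OPTIMIZE TABLE smf_messages, smf_topics;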

The settings I had in my.cnf were something I had found around the Linode forums or library, I can't remember which.

Here is current my.cnf:

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
skip-external-locking

bind-address            = 127.0.0.1

max_connections = 50

key_buffer              = 64M
max_allowed_packet      = 1M
thread_stack            = 64K
thread_cache_size       = 8

myisam-recover         = BACKUP

query_cache_limit       = 1M
query_cache_size        = 16M

log_error                = /var/log/mysql/error.log

expire_logs_days        = 10
max_binlog_size         = 100M

#innodb_file_per_table = 1

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]

[isamchk]
key_buffer              = 16M

!includedir /etc/mysql/conf.d/
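
Based on the tuner output I'm considering adding something like this to the [mysqld] section (values are guesses on my part, not tested yet):

table_cache             = 512
tmp_table_size          = 32M
max_heap_table_size     = 32M
key_buffer              = 128M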

Thanks again for all of your help, I really appreciate it. Hope I can pay it forward at some point!

@rdaruszka:

I'll query linode about it but until I hear back from them I'm going to assume it's my fault and keep tuning.
No harm in that (other than diminishing returns past a certain point), but I suspect what's been happening, at least most recently, is that your tuning is helping to prevent you from having to hit the disks much at all, basically avoiding the I/O contention or minimizing its impact.

But if copying 10s of MB (like your 60MB example) is thrashing things for 10-15 minutes, there's not a lot of tuning you'll be able to do to address that. Eventually you'll need to be able to do a reasonable amount of I/O every so often.

A quick test on one of my Linode 512s for a 60MB copy took about 2 seconds, and I managed to catch an I/O wait status of 10% in one of those seconds.

It might be interesting to check an "hdparm -t" test on your Linode. It's a read test, but I'm pretty sure it will still suffer similarly under contention. Numbers can vary widely, but I think even in poor circumstances I've seen decent double-digit MB/s on my Linodes, and most often it's 100+.
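
Something like this, for example (device name is an assumption; check what your root filesystem is mounted from):

sudo hdparm -t /dev/xvda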

– David

Just heard back from Linode; they tweaked some settings and I reran the disk copy. It completed in under a second. I fiddled with a couple of MySQL things and they completed reasonably quickly as well.

If there is anything in my MySQL conf you see that should be tweaked, I'd love to hear about it though. No reason not to fix things just because they haven't caused a problem yet.

I'm probably going to spend some time intentionally trying to make it thrash but I'm hopeful at this point.

Okay there was definitely something up on that node.

I just did a full backup copy and the update to 2.0 in 15 minutes. It thrashed a bit and did some serious disk IO (it is 250+MB of data and I was changing column types, migrating data, adding indexes…), but I was able to browse the primary forum during it. It did slow down at times and I did have a couple of 500 errors, but it worked.

Last time I tried this I had the entire place in 500 errors for an hour before I gave up.

@rdaruszka:

Okay there was definitely something up on that node.
You mean your (Li)node, or the physical "host"?

The physical host, I guess; they said they "made some changes". I don't know what that means exactly, but I'm not having nearly the problems I was :D

nods!

Thanks for clarification!

@rdaruszka:

Just heard back from Linode; they tweaked some settings and I reran the disk copy. It completed in under a second. I fiddled with a couple of MySQL things and they completed reasonably quickly as well.
Excellent. That sounds like more typical performance.

> If there is anything in my MySQL conf you see that should be tweaked, I'd love to hear about it though. No reason not to fix things just because they haven't caused a problem yet.
Not sure if the "you" here was aimed at my post or just more in general, but just in case it's the former and lest you feel ignored, I'm a PostgreSQL guy, so my mysql experience is too limited to be of much help.

– David

It was general, but I appreciate the concern.

Right now I'm feeling good about my server once again, but I plan to keep on this MySQL thing until I'm sure I've tuned it as well as I can.
