LAMP stack taking up swap space, can't figure out why
I've got a standard LAMP stack with Webmin and Virtualmin on top of it, and since inception I've been slowly working through various issues slowing the system down. The biggest problem I've solved so far was that logrotate's compression would take my site down for 10-15 minutes every Friday when it compressed the Apache access logs.
Right now the site is accessible and rarely has any issues (a few 500s here and there, not sure what's causing them exactly), but I've got swap usage I can't explain and performance issues whenever I do anything disk-related.
free -m output:
             total       used       free     shared    buffers     cached
Mem:           498        476         22          0          4        228
-/+ buffers/cache:        243        255
Swap:          255        122        133
Here's a selection of top sorted by swap usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ SWAP COMMAND
15317 mysql 20 0 123m 25m 2668 S 0 5.2 676:39.96 97m mysqld
13603 www-data 20 0 97640 5792 2480 S 0 1.1 0:00.00 89m apache2
13572 www-data 20 0 98216 6476 2560 S 0 1.3 0:00.04 89m apache2
13468 www-data 20 0 98884 7152 2576 S 0 1.4 0:00.18 89m apache2
13581 www-data 20 0 98220 6500 2556 S 0 1.3 0:00.04 89m apache2
2118 bind 20 0 74192 756 420 S 0 0.1 0:00.53 71m named
10952 irc 20 0 35784 6708 1116 S 0 1.3 31:43.05 28m python
11282 www-data 20 0 30988 3304 840 S 0 0.6 0:00.41 27m apache2
19595 pss 20 0 31668 4056 2540 S 0 0.8 0:00.03 26m php5-cgi
8448 test.rae 20 0 31408 5268 3228 S 0 1.0 0:00.04 25m php5-cgi
11852 talizora 20 0 35448 10m 4324 S 0 2.1 0:29.72 24m php5-cgi
11856 talizora 20 0 35448 10m 4324 S 1 2.1 0:32.84 24m php5-cgi
11862 talizora 20 0 39156 14m 4324 S 7 2.9 0:33.27 23m php5-cgi
11841 talizora 20 0 36960 12m 4320 S 0 2.5 0:32.63 23m php5-cgi
11858 talizora 20 0 36756 12m 4320 S 0 2.4 0:36.07 23m php5-cgi
11864 talizora 20 0 37020 12m 4324 S 0 2.5 0:36.61 23m php5-cgi
11866 talizora 20 0 36732 12m 4324 S 0 2.4 0:34.39 23m php5-cgi
11869 talizora 20 0 36632 11m 4324 S 0 2.4 0:26.17 23m php5-cgi
11872 talizora 20 0 37512 12m 4324 S 0 2.6 0:37.67 23m php5-cgi
11854 talizora 20 0 36768 12m 4384 S 0 2.4 0:35.97 23m php5-cgi
11860 talizora 20 0 36704 12m 4404 S 0 2.4 0:32.61 23m php5-cgi
When left alone everything runs fine, but when I go do anything much at all the entire system goes crazy. I used cp to copy 60 megs of stuff; it took 10 minutes, load average went up to 10-15 during it, and the whole site returned 500 errors on every request while it ran. I did this during a slow period of accesses so it would interfere as little as possible, and load average was 0.02 when I started the copy.
I'm not sure whether my MySQL/Apache2 confs or some other configs or information would be most helpful, and I've run out of places to look to solve it myself.
I have looked through the low-memory MySQL/Apache config information here and limited some things (running mpm_worker limited to 25 clients), but I still have these oddball issues I can't explain.
Thanks in advance for any advice,
-Rael
25 Replies
The stock lighttpd configuration has 2 parent PHP processes with 4 children each, and even that's overkill for most people.
It is mpm_worker with fastcgi.
That shot was also taken near peak load time, and almost all traffic is dynamic pages. The entire site is a forum so everything runs through PHP…
It averages 6 requests per second overall and the place is pretty much dead quiet for 12 hours a day. My traffic is pretty peaky.
Last night I rebooted the server and it's already got 50MB of swap used… I have more free memory (not cache or buffers, truly free memory) than I have used swap space… I don't understand why it's swapped anything at all.
Relevant apache config (as far as I know, if more is needed please let me know):
<IfModule mpm_worker_module>
    StartServers 1
    ServerLimit 10
    MinSpareThreads 5
    MaxSpareThreads 20
    #ThreadLimit 25
    ThreadsPerChild 5
    MaxClients 25
    MaxRequestsPerChild 1000
</IfModule>

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
</IfModule>
I'm not only trying to handle the regular traffic but also deal with some rather sharp peaks of traffic I get on occasion. Before I got the settings calmed down I had a bunch of users show up and the whole place OOMed out on me. To give you an idea of what is "normal" for unexpected traffic: I've had a single user show up and make 800 posts in one day, and they brought the entire day's average up by 3 requests per second. I have also had bursts where 20-40 new users will just show up and start posting 2-3 times per minute each on top of their browsing. That's why I have my max clients up as high as I do.
Again, I'm really at a loss to explain the behavior of the software; I just don't know enough about what I'm working with. If anyone can give me some idea of what I'm doing wrong I'd appreciate it.
I'm not sure how to lower the number of PHP processes; where do I need to go in the configs to handle that sort of thing?
As for swap, you will always use some swap.
Swap is not bad, moving stuff in and out of swap all the time IS bad.
Swap is basically used for 1) storing stuff that isn't used very often so the memory can be freed up for something else, and 2) acting as extra ram.
The 2nd one on a server is the bad one. The first one is a good thing.
@obs:
You should install something like munin, it's really good for showing you what's a resource hog.
As for swap, you will always use some swap.
Swap is not bad, moving stuff in and out of swap all the time IS bad.
Swap is basically used for 1) storing stuff that isn't used very often so the memory can be freed up for something else, and 2) acting as extra ram.
The 2nd one on a server is the bad one. The first one is a good thing.
I'll second that. Swapping out unused memory pages is a good thing. It means that freed RAM can be used for things that need it. If you are interested, read more here:
http://kerneltrap.org/node/3000.
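If you want to see which processes actually have pages sitting in swap, a rough sketch like this works, assuming your kernel exposes a VmSwap line in /proc/<pid>/status (newer kernels do):
# list processes by swap usage, largest first (values are in kB)
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {print name, $2}' "$f"
done | sort -k2 -nr | head -20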
Also, you can look into moving to nginx+php-fpm+mariadb/percona
@rdaruszka:
<IfModule mpm_worker_module>
    StartServers 1
    ServerLimit 10
    MinSpareThreads 5
    MaxSpareThreads 20
    #ThreadLimit 25
    ThreadsPerChild 5
    MaxClients 25
    MaxRequestsPerChild 1000
</IfModule>
This looks reasonable, although MaxClients might be a bit low for mpm_worker. But I'll defer to others here since I'm not an apache expert.
@rdaruszka:
<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
</IfModule>
Caveat for this section of my post: I'm a lighttpd user, so I'm thinking in terms of Lighttpd's fastcgi module.
Back to the age-old topic, "Apache's default settings are insanely stupid", FcgidMaxProcessesPerClass defaults to 100 processes. At PHP's max memory load, that'd be about 13 gigabytes of RAM. A sane limit for this setting is 8. The default of 100 means you can handle 100 simultaneous requests, but since you've only got access to four processor cores, that's pointless. 4 is theoretically enough to max out your CPU resources, but because stuff can be waiting on IO or other things, 8 gives you a 2:1 ratio to help keep those processors busy. Anything beyond that is starting to get pretty pointless on a quadcore processor unless you've got really long-running scripts (which a forum tends not to). You've already got 13 processes that I can see in that 'top', so you're already getting more than you need.
With lighttpd, at least, if there is a request that comes in while all 8 processes are busy, then it queues. That's the best policy, because it's better to queue up the requests than trying to serve them all at the same time if you've got an insane load spike.
@rdaruszka:
I'm not only trying to handle the regular traffic but also deal with some rather sharp peaks of traffic I get on occasion. Before I got the settings calmed down I had a bunch of users show up and the whole place OOMed out on me. To give you an idea of what is "normal" for unexpected traffic: I've had a single user show up and make 800 posts in one day, and they brought the entire day's average up by 3 requests per second. I have also had bursts where 20-40 new users will just show up and start posting 2-3 times per minute each on top of their browsing. That's why I have my max clients up as high as I do.
OOMs only happen because the webserver is misconfigured. No amount of load should cause the server to OOM, even if it's enough to make it slow to a crawl. Excess requests should be queued, so an OOM is indicative that you've configured your server to try to handle too many simultaneous requests.
40 users posting 3 times per minute is only 2 posts per second. That's pretty low load, not a huge load spike. Besides, for all this stuff, your database is probably the bottleneck if anything. Also, remember that 40 users posting 3 times per minute are not consuming 40 clients worth of resources. If your forum software is very slow and it takes half a second to process a request to make a post, then that user consumes one client worth of resources for half a second, and then is not connected to your server for the next 19.5 seconds… although keepalives might change this.
@rdaruszka:
I'm not sure how to lower the number of PHP processes; where do I need to go in the configs to handle that sort of thing?
I could tell you with more authority for lighttpd… If these settings are equivalent to what I'm used to, I would set FcgidMaxProcessesPerClass to 8, and FcgidMinProcessesPerClass to 4 (although the default of 3 is probably fine)… I'd wait for more apache-oriented people to comment.
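If the fcgid directives work the way I think they do (again, I'm going from lighttpd experience, so treat this as a sketch), that section would end up looking roughly like:
<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    FcgidConnectTimeout 20
    # cap each class (i.e. PHP) at 8 processes; extra requests queue instead of spawning more
    FcgidMaxProcessesPerClass 8
    # keep a few spare processes warm so bursts don't pay the spawn cost
    FcgidMinProcessesPerClass 4
</IfModule>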
Also, 5 threads per worker subprocess is crazy low, IMO…
This is all based on guesses, empirical evidence, and possibly wrong conclusions. All I can say is that I use 25 threads per worker and 16 PHP engines, and
a) don't suffer from swapping, actually have more than half of RAM available for cache
b) Can easily handle ~500 static requests/s AND ~50 PHP requests/s
c) never had more than 75 apache threads active or more than 25/400% CPU under real world load, and even these happened only a few times.
Btw, doesn't fcgid spawn each fastcgi subprocess separately, so they can't share an APC cache if installed? (I use mod_fastcgi).
EDIT/PS. Not claiming the values quoted are awesome in general, just saying they are fulfilling my particular needs very well.
Emufarmers: SMF 1.1.11, looking to upgrade to 2.0. Had APC running at one point, but it was getting more 500s with it than without, so I pulled it; better a poorly working site than a non-working site, and I didn't have a solution. I don't believe APC was actually causing any problems and it could probably go back in cleanly right now without issue.
obs: Had munin installed; couldn't figure out what the problems were from it, it was causing some issues during its update cycles I couldn't explain, and it wasn't helping me, so I pulled it.
Net-burst: Been considering something like that, but I'm not sure; this is "production". I can pull it down, but I like to do so for as short a time as possible. Experimentation isn't a good thing.
Guspaz: Thanks for the call on MaxClients. I think I had it as high as 40 at one point and dropped it to test if that was causing problems… it wasn't, I just haven't reverted. Would love to hear from others on this as well.
I'm reading this as "add FcgidMaxProcessesPerClass set to 8 to this section"; is that the correct interpretation?
The OOM was a misconfig; it was on full defaults for everything back then, MPM prefork, something like 300 max clients. It was a disaster. Totally not a current issue.
True, 40 users * 3 posts per minute is only two posts per second. But they are all F5-happy, so there is a lot of reading load happening.
I think the keep alive is at like 5 seconds or something, I'd have to check.
Thanks for the hint on Fcgid settings, I'll look into it.
rsk: Thanks for the information; I'll look at tweaking.
To clarify a few things: I am running with about 50% of RAM free for cache. All of the data for the site fits into about 260MB (DB and scripts/images/etc.). Under normal circumstances I do not have a problem. The biggest concern I had was when I copied a 60MB folder: load ran to 10-15 and the whole site returned 500 errors because PHP was taking longer than 30 seconds per request for about 15 minutes. I cannot for the life of me understand what could cause this.
Yeah, I need to tighten up my web server so it runs better, and the consensus sounds like "up max clients, up threads per worker, set fcgid to 8-ish procs, get APC installed", but it does serve the pages fine right now, just so long as I don't get anywhere near the disk while it's doing it…
I do have some Linux experience as well; "some" means "can install Gentoo on RAID 5 without looking up docs". I have almost no experience running it as a web server, but it's my primary desktop/laptop OS.
I set the FcgidMaxProcessesPerClass discussed above to 8. I altered mpm_worker to 10 threads per child (from 5) and MaxClients 30 (from 25).
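So the worker section should now read roughly like this (paraphrasing from memory rather than pasting the exact file):
<IfModule mpm_worker_module>
    StartServers 1
    ServerLimit 10
    MinSpareThreads 5
    MaxSpareThreads 20
    ThreadsPerChild 10
    MaxClients 30
    MaxRequestsPerChild 1000
</IfModule>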
I ran apachebench on a -short- static page and got results of around 2500-2800 requests per second. The test was run for 50,000 requests total with concurrency from 10 up to 100, with very little change in requests/second. There was almost no load on the CPU, and caching kept the few bytes of content from ever hitting the disk. This seemed reasonable, so I moved on to testing dynamic content.
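(For the record, the static runs were along these lines; the URL is a placeholder:)
# 50,000 requests at concurrency 10; repeated with -c up to 100
ab -n 50000 -c 10 http://www.example.com/static-test.html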
I tested against the board view of the forum and got the following:
Concurrency Level: 10
Time taken for tests: 21.107 seconds
Complete requests: 500
Failed requests: 54
(Connect: 0, Receive: 0, Length: 54, Exceptions: 0)
Write errors: 0
Total transferred: 21930420 bytes
HTML transferred: 21731920 bytes
Requests per second: 23.69 [#/sec] (mean)
Time per request: 422.142 [ms] (mean)
Time per request: 42.214 [ms] (mean, across all concurrent requests)
Transfer rate: 1014.65 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 167 418 1132.6 329 20371
Waiting: 167 418 1132.7 329 20371
Total: 167 419 1132.6 329 20371
Percentage of the requests served within a certain time (ms)
50% 329
66% 334
75% 338
80% 343
90% 368
95% 404
98% 478
99% 566
100% 20371 (longest request)
During this test CPU load hit 100%, though I didn't notice any IO load. The giant variance in times kinda bothers me… anywhere from 0.16 seconds up to 20 seconds? Also, a concurrency of 1 results in about 5.5 requests per second being handled. The rates seem about right considering there is no caching involved: 0.167 seconds per request is ~6 per second, and with 4 cores that's just under 4 times as many. That large variance in time bothers me though; it seems about 10 seconds too long… Shouldn't things be FIFO? Seems to me that would give a max of about 2 seconds…
I'm going to see if I can't get apc going again and bench that as well but I might not be able to…
110 requests per second from the same page below. This was pretty well the rate when I used anywhere from 10 up to 1000 concurrent requests (not sure how well 10,000 concurrent slams things but I'd assume it's not what would be called "pretty")
So I'm satisfied with general performance, but I am concerned by the large variance in my last post. When you're attempting 1000 connections at once and the server will only handle 30 at a time you are of course going to have issues, but a mere 10? That massive length on the long end just feels about 10 times longer than it should be. Am I off by a factor of 10 in my conversion? 20371ms is 20.371 seconds, right?
And none of this really explains why a simple file copy would cause so much load… That is the factor that really bothers me. I'd like to be able to copy the site's files to a test virtual server without worrying about the server exploding on me while I do.
Watch 'vmstat 1' (or 'dstat -cdg') for a while, and compare the CPU time spent in the 'wa'it state with the amount of disk I/O (bi/bo and si/so, i.e. block in/out and swap in/out, in vmstat's case).
If the former is very high but the latter are small, it may be some other Linode on your physical host hogging the disks like crazy, and making yours (and everyone else's) block for looong each time it tries to read or write something.
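In other words, something like this, watching the 'wa' column against bi/bo and si/so:
# refresh once a second
vmstat 1
# or, if dstat is installed, roughly the same view (cpu, disk, paging)
dstat -cdg 1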
@rsk:
Hmm.
Watch 'vmstat 1' (or 'dstat -cdg') for a while, and compare the CPU time spent in the 'wa'it state with the amount of disk I/O (bi/bo and si/so, i.e. block in/out and swap in/out, in vmstat's case).
If the former is very high but the latter are small, it may be some other Linode on your physical host hogging the disks like crazy, and making yours (and everyone else's) block for looong each time it tries to read or write something.
I'm about to do some stuff that pissed the server off royally last time I tried it, so I'll be sure to do this while I try it.
Thanks for all of your suggestions. I do suspect that MySQL is my next tuning target, so if I start posting a bunch of stuff about that later today, you know why.
Before I started doing things:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 74 38 15 321 0 0 0 0 274 197 0 0 100 0
0 0 74 38 15 321 0 0 0 4 277 213 0 0 100 0
0 0 74 38 15 321 0 0 0 0 236 196 0 0 100 0
0 0 74 38 15 321 0 0 0 0 278 214 0 0 100 0
0 0 74 38 15 321 0 0 0 0 241 204 0 0 100 0
0 0 74 38 15 321 0 0 0 0 236 194 0 0 100 0
0 0 74 38 15 321 0 0 0 128 276 207 0 0 100 0
0 0 74 38 15 321 0 0 0 0 298 203 0 0 100 0
0 0 74 38 15 321 0 0 0 12 262 212 0 0 98 2
0 0 74 38 15 321 0 0 0 0 253 200 0 0 100 0
0 0 74 38 15 321 0 0 0 0 245 200 0 0 100 0
0 0 74 38 15 321 0 0 0 88 276 219 0 0 100 0
0 0 74 38 15 321 0 0 0 0 273 202 0 0 100 0
0 0 74 38 15 321 0 0 0 0 233 198 0 0 100 0
1 0 74 38 15 321 0 0 0 0 941 399 3 0 97 0
0 0 74 38 15 321 0 0 0 0 685 242 2 0 98 0
0 0 74 38 15 321 0 0 0 0 826 430 0 0 100 0
0 0 74 37 15 321 0 0 0 0 813 446 1 0 99 0
0 0 74 38 15 321 0 0 0 4 686 272 2 0 98 0
At some point it was getting a little busier but there wasn't any visible effect:
0 0 77 17 3 327 0 0 0 0 284 199 0 0 100 0
0 4 77 16 3 327 0 0 368 2380 1033 572 0 1 55 44
0 1 77 14 3 328 0 0 300 0 976 497 0 1 67 32
0 1 77 11 3 328 0 0 92 0 747 322 1 0 80 18
0 2 77 10 3 328 0 0 528 60 1119 545 1 0 78 21
1 2 77 9 3 328 0 0 552 4 1529 798 1 0 66 33
0 4 77 6 3 327 0 0 1804 296 3273 820 10 6 44 40
0 5 78 7 3 325 0 0 1592 276 1315 643 1 1 20 79
0 2 78 6 3 325 0 0 612 36 1959 593 6 1 46 47
1 2 78 5 2 323 0 0 2760 108 2355 1067 2 1 50 46
0 2 78 6 2 321 0 0 2060 4 1984 791 2 1 53 44
0 1 78 22 2 322 0 0 860 8 1917 618 3 2 73 22
0 1 78 22 2 322 0 0 340 0 1869 378 4 5 86 5
Then I refreshed a page on a test copy of my forum and it did this:
0 3 88 13 1 325 0 0 1168 0 3535 1412 12 1 62 25
2 0 88 6 1 331 0 0 6400 0 2138 1237 1 1 53 45
0 2 88 5 1 331 0 0 39632 4 6142 3782 1 2 59 39
0 2 88 6 1 331 0 0 33600 108 4010 2777 0 1 60 38
0 3 88 6 1 331 0 0 17480 4 2484 1641 1 1 43 55
0 2 88 5 1 332 0 0 37456 0 8287 5244 5 2 45 49
0 6 88 7 1 329 0 0 28508 28 7055 4424 2 3 54 41
0 6 88 5 1 331 0 0 2792 0 1833 990 1 0 15 84
0 4 88 5 1 331 0 0 3120 4 1507 1025 0 0 6 94
0 4 88 5 1 328 0 0 2924 4 1317 865 0 1 7 92
0 7 88 6 1 327 0 0 1744 8 979 654 0 0 17 82
1 7 88 5 1 327 0 0 2336 88 2119 772 8 0 2 90
0 8 88 11 1 319 0 0 5944 8 2468 1507 2 1 7 90
0 3 88 6 1 306 0 0 4428 12 3176 1786 5 2 14 80
0 4 88 5 1 305 0 0 3692 0 1853 1301 1 1 38 61
0 7 88 5 1 287 0 0 4728 0 3078 1637 7 1 9 83
0 7 88 6 1 289 0 0 4204 116 3136 1545 11 2 9 78
1 8 88 5 1 288 0 0 2984 264 1672 1164 1 1 6 92
0 9 88 5 1 289 0 0 1868 0 952 808 1 1 6 93
Those are all output from 'vmstat -SM 1'.
Edit:
I just checked with iotop and it is -all- mysql.
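(Something along these lines; flags from memory, so double-check them:)
# only show processes actually doing I/O, per process, with accumulated totals
iotop -o -P -a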
All four cores stuck in iowait while reading just a few MB in?
And all that with… three hundred megabytes of cache RAM available?!
Good news is that you're not swapping. Bad news is that it seems your disk access times are super slow. You may want to open a support ticket asking "is there a disk hog on my host, or is it me, after all?"
EDIT: Well, I assume that the "blocks in" count is either in sectors, or some small units like 1kb/4kb. If it's in megabytes, it sure is your own fault.
EDIT2: It's 1kb for me. So, yeah, the worst spike is ~ 30MB of a database table, and your usual is below a megabyte/s.
Tuning MySQL (especially the tablecache/resultcache) may help a lot, but still, the disks' response should be faster.
Here is ~15 queries/second on a MySQL server for a handful of vBulletin and Wordpress sites (only about ~2.5 pageviews/second right now, since it's a weekend):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 100 274060 250136 1092832 0 0 0 0 174 25 0 0 100 0
0 0 100 274060 250136 1092832 0 0 0 0 96 24 0 0 100 0
0 0 100 274060 250136 1092832 0 0 0 0 102 26 0 0 100 0
0 0 100 274060 250136 1092832 0 0 0 0 99 22 0 0 100 0
0 0 100 274060 250136 1092832 0 0 0 0 634 291 0 0 100 0
0 0 100 273756 250136 1092832 0 0 0 0 1099 552 0 0 100 0
0 0 100 273908 250136 1092832 0 0 0 0 453 164 0 0 100 0
0 0 100 273908 250136 1092832 0 0 0 0 167 83 0 0 100 0
0 0 100 273908 250136 1092832 0 0 0 0 227 165 0 0 100 0
0 0 100 273908 250136 1092836 0 0 0 0 86 22 0 0 100 0
… ok, so it's a 2 GB instance, but still. The web servers' disks are typically busier, just from logging.
– David
I am -sure- my MySQL conf is whacked, but I think there are further issues as well. A 60MB folder copy using cp -a ~otheruser/folder ./folder took 10 minutes and thrashed the system into 500 errors the entire time… 60MB. That should take seconds in my book.
I grabbed the mysqltuner and got this:
-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.1.41-3ubuntu12.6
[OK] Operating on 32-bit architecture with less than 2GB RAM
-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 409M (Tables: 444)
[--] Data in InnoDB tables: 16K (Tables: 1)
[!!] Total fragmented tables: 26
-------- Performance Metrics -------------------------------------------------
[--] Up for: 2d 17h 28m 45s (4M q [17.377 qps], 6K conn, TX: 3B, RX: 1B)
[--] Reads / Writes: 50% / 50%
[--] Total buffers: 42.0M global + 2.6M per thread (151 max threads)
[!!] Maximum possible memory usage: 438.4M (87% of installed RAM)
[OK] Slow queries: 0% (40/4M)
[OK] Highest usage of available connections: 15% (24/151)
[!!] Key buffer size / total MyISAM indexes: 16.0K/125.6M
[!!] Key buffer hit rate: 89.4% (371M cached / 39M reads)
[OK] Query cache efficiency: 61.7% (1M cached / 2M selects)
[!!] Query cache prunes per day: 5982
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 19K sorts)
[!!] Joins performed without indexes: 4292
[!!] Temporary tables created on disk: 44% (11K on disk / 26K total)
[OK] Thread cache hit rate: 98% (121 created / 6K connections)
[!!] Table cache hit rate: 0% (64 open / 6K opened)
[OK] Open file limit used: 9% (100/1K)
[OK] Table locks acquired immediately: 99% (3M immediate / 3M locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M
-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
Reduce your overall MySQL memory footprint for system stability
Enable the slow query log to troubleshoot bad queries
Adjust your join queries to always utilize indexes
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries without LIMIT clauses
Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
key_buffer_size (> 125.6M)
query_cache_size (> 16M)
join_buffer_size (> 128.0K, or always use indexes with joins)
tmp_table_size (> 16M)
max_heap_table_size (> 16M)
table_cache (> 64)
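(For the OPTIMIZE TABLE part, rather than going table by table, something like this should hit everything in one pass; a sketch, and note it locks each table while it runs:)
# optimize every table in every database (prompts for the MySQL root password)
mysqlcheck --optimize --all-databases -u root -p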
I futzed with a couple of MySQL settings, restarted, and ran OPTIMIZE TABLE on the tables for my forum and a test copy of the board. It thrashed for about 45 seconds each while doing this, and I ran the tuner again to get:
>> MySQLTuner 1.0.1 - Major Hayden <major@mhtx.net>
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:
-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.1.41-3ubuntu12.6
[OK] Operating on 32-bit architecture with less than 2GB RAM
-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 408M (Tables: 444)
[--] Data in InnoDB tables: 16K (Tables: 1)
[!!] Total fragmented tables: 16
-------- Performance Metrics -------------------------------------------------
[--] Up for: 8m 59s (5K q [9.929 qps], 527 conn, TX: 6M, RX: 1M)
[--] Reads / Writes: 68% / 32%
[--] Total buffers: 106.0M global + 2.6M per thread (50 max threads)
[OK] Maximum possible memory usage: 237.2M (47% of installed RAM)
[OK] Slow queries: 0% (2/5K)
[OK] Highest usage of available connections: 28% (14/50)
[!!] Key buffer size / total MyISAM indexes: 64.0M/118.9M
[!!] Key buffer hit rate: 78.7% (585K cached / 124K reads)
[OK] Query cache efficiency: 37.4% (1K cached / 3K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 29 sorts)
[OK] Temporary tables created on disk: 16% (57 on disk / 349 total)
[OK] Thread cache hit rate: 97% (14 created / 527 connections)
[!!] Table cache hit rate: 1% (64 open / 5K opened)
[OK] Open file limit used: 8% (91/1K)
[OK] Table locks acquired immediately: 98% (3K immediate / 3K locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M
-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
MySQL started within last 24 hours - recommendations may be inaccurate
Enable the slow query log to troubleshoot bad queries
Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
key_buffer_size (> 118.9M)
table_cache (> 64)
The settings I had in my.cnf were something I had found around the Linode forums or library, I can't remember which.
Here is current my.cnf:
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
skip-external-locking
bind-address = 127.0.0.1
max_connections = 50
key_buffer = 64M
max_allowed_packet = 1M
thread_stack = 64K
thread_cache_size = 8
myisam-recover = BACKUP
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
#innodb_file_per_table = 1
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
[isamchk]
key_buffer = 16M
!includedir /etc/mysql/conf.d/
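Based on the tuner output above, I'm guessing the next round of adjustments would look something like this; these values are untested guesses, and I'm wary of how big a key buffer this box can really afford:
# untested guesses at the remaining mysqltuner recommendations
key_buffer = 96M       # tuner wants > 118.9M of MyISAM index cached, but RAM is tight
table_cache = 256      # raise gradually while watching the open file limit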
Thanks again for all of your help, I really appreciate it. Hope I can pay it forward at some point!
@rdaruszka:
I'll query Linode about it, but until I hear back from them I'm going to assume it's my fault and keep tuning.
No harm in that (other than diminishing returns past a certain point), but I suspect what's been happening, at least most recently, is that your tuning is helping you avoid hitting the disks much at all, basically avoiding the I/O contention or minimizing its impact.
But if copying 10s of MB (like your 60MB example) is thrashing things for 10-15 minutes, there's not a lot of tuning you'll be able to do to address that. Eventually you'll need to be able to do a reasonable amount of I/O every so often.
A quick test on one of my Linode 512s for a 60MB copy took about 2 seconds, and I managed to catch an I/O wait status of 10% in one of those seconds.
It might be interesting to run an "hdparm -t" test on your Linode. It's a read test, but I'm pretty sure it will still suffer similarly under contention. Numbers can vary widely, but I think even in poor circumstances I've seen decent double-digit MB/s on my Linodes, and most often it's 100+.
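Something along these lines, for example (paths and the device name are just examples; check df or mount for your actual device):
# a quick 60MB copy test similar to the one described above
dd if=/dev/zero of=/tmp/copytest.src bs=1M count=60
time cp /tmp/copytest.src /tmp/copytest.dst
rm -f /tmp/copytest.src /tmp/copytest.dst
# buffered read timing; needs root, and on a Xen Linode the disk is usually /dev/xvda
hdparm -t /dev/xvda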
– David
If there is anything in my MySQL conf you see that should be tweaked, I'd still love to hear about it. No reason not to fix things just because they haven't caused a problem yet.
I'm probably going to spend some time intentionally trying to make it thrash but I'm hopeful at this point.
I just did a full backup copy and update to 2.0 in 15 minutes. It thrashed a bit and did some serious disk IO (it is 250+MB of data and I was changing column types, migrating data, adding indexes…), but I was able to browse the primary forum during it. It did slow down at times and I did have a couple of 500 errors, but it worked.
Last time I tried this I had the entire place in 500 errors for an hour before I gave up.
@rdaruszka:
Okay there was definitely something up on that node.
You mean your (Li)node, or the physical "host"?
@rdaruszka:
Just heard back from Linode; they tweaked some settings and I reran the disk copy. It completed in under a second. Fiddled with a couple of MySQL things and they completed reasonably quickly as well.
Excellent. That sounds like more typical performance.
> If there are anythings in my mysql conf you see that should be tweaked I'd love to hear about it though. No reason not to fix things just because they haven't caused a problem yet.
Not sure if the "you" here was aimed at my post or just more in general, but just in case it's the former and lest you feel ignored, I'm a PostgreSQL guy, so my mysql experience is too limited to be of much help.
– David
Right now I'm feeling good about my server once again, but I plan to keep on this MySQL thing until I'm sure I've tuned it as well as I can.