Why are my Linode's CPU + Disk IO spiking?
Here are my vm stats…
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 122780 187652 9476 277784 2 2 14 25 182 217 4 1 94 0
0 0 122780 192216 9752 279180 0 0 12 72 741 848 4 1 94 0
0 0 122780 193948 9848 279652 0 0 4 20 589 767 1 1 98 0
0 0 122780 207408 9964 280076 0 0 6 17 489 738 1 0 98 0
1 0 122780 177968 10088 280660 0 0 8 33 464 737 3 1 96 0
0 0 122780 183028 10232 281056 0 0 3 33 501 730 2 1 97 0
0 0 122780 185124 10400 281380 0 0 4 45 560 791 3 1 96 0
0 0 122780 185940 10708 283280 0 0 23 79 479 746 3 1 96 0
0 0 122780 181284 10868 284088 0 0 7 45 481 732 3 1 96 0
0 0 122780 191896 10944 284660 0 0 8 22 405 688 2 1 97 0
1 0 122780 184984 11348 286320 0 0 23 97 707 867 3 1 95 1
3 0 122780 173064 11740 287936 0 0 11 88 721 879 4 1 94 0
1 0 122780 176856 12012 288816 0 0 10 70 818 945 5 2 93 0
10 Replies
root 29 0.0 0.0 0 0 ? S< Aug12 0:00 [kblockd/2]
root 30 0.0 0.0 0 0 ? S< Aug12 0:00 [kblockd/3]
root 32 0.0 0.0 0 0 ? S< Aug12 0:00 [kseriod]
root 116 0.0 0.0 0 0 ? S Aug12 0:00 [pdflush]
root 117 0.0 0.0 0 0 ? S Aug12 0:00 [pdflush]
root 118 0.0 0.0 0 0 ? S< Aug12 0:07 [kswapd0]
root 119 0.0 0.0 0 0 ? S< Aug12 0:00 [aio/0]
root 120 0.0 0.0 0 0 ? S< Aug12 0:00 [aio/1]
root 121 0.0 0.0 0 0 ? S< Aug12 0:00 [aio/2]
root 122 0.0 0.0 0 0 ? S< Aug12 0:00 [aio/3]
root 124 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsIO]
root 125 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsCommit]
root 126 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsCommit]
root 127 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsCommit]
root 128 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsCommit]
root 129 0.0 0.0 0 0 ? S< Aug12 0:00 [jfsSync]
root 130 0.0 0.0 0 0 ? S< Aug12 0:00 [xfslogd/0]
root 131 0.0 0.0 0 0 ? S< Aug12 0:00 [xfslogd/1]
root 132 0.0 0.0 0 0 ? S< Aug12 0:00 [xfslogd/2]
root 133 0.0 0.0 0 0 ? S< Aug12 0:00 [xfslogd/3]
root 134 0.0 0.0 0 0 ? S< Aug12 0:00 [xfsdatad/0]
root 135 0.0 0.0 0 0 ? S< Aug12 0:00 [xfsdatad/1]
root 136 0.0 0.0 0 0 ? S< Aug12 0:00 [xfsdatad/2]
root 137 0.0 0.0 0 0 ? S< Aug12 0:00 [xfsdatad/3]
root 746 0.0 0.0 0 0 ? S< Aug12 0:00 [net_accel/0]
root 747 0.0 0.0 0 0 ? S< Aug12 0:00 [net_accel/1]
root 748 0.0 0.0 0 0 ? S< Aug12 0:00 [net_accel/2]
root 749 0.0 0.0 0 0 ? S< Aug12 0:00 [net_accel/3]
root 757 0.0 0.0 0 0 ? S< Aug12 0:00 [kpsmoused]
root 760 0.0 0.0 0 0 ? S< Aug12 0:00 [kcryptd/0]
root 761 0.0 0.0 0 0 ? S< Aug12 0:00 [kcryptd/1]
root 762 0.0 0.0 0 0 ? S< Aug12 0:00 [kcryptd/2]
root 763 0.0 0.0 0 0 ? S< Aug12 0:00 [kcryptd/3]
root 764 0.0 0.0 0 0 ? S< Aug12 0:00 [kmirrord]
root 774 0.0 0.0 0 0 ? S< Aug12 0:01 [kjournald]
root 1006 0.0 0.0 6288 192 ? Ss Aug12 0:00 dhclient3 -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp3/dhclient.eth0.leases eth0
root 1096 0.0 0.0 121572 1180 ? Sl Aug12 0:00 /usr/sbin/rsyslogd -c3
bind 1114 0.0 0.0 140616 1460 ? Ssl Aug12 0:00 /usr/sbin/named -u bind
root 1131 0.0 0.0 48856 900 ? Ss Aug12 0:00 /usr/sbin/sshd
amavis 1159 0.0 0.0 124060 1392 ? Ss Aug12 0:00 amavisd (master)
amavis 1171 0.0 0.0 125260 912 ? S Aug12 0:00 amavisd (virgin child)
amavis 1172 0.0 0.0 125260 896 ? S Aug12 0:00 amavisd (virgin child)
root 1302 0.0 0.0 73740 432 ? Ss Aug12 0:00 /usr/sbin/citserver -d -x3 -lmail -t/dev/null
citadel 1303 0.2 0.2 224796 4312 ? Sl Aug12 1:21 /usr/sbin/citserver -d -x3 -lmail -t/dev/null
clamav 1514 0.0 8.7 167772 129056 ? Ssl Aug12 0:05 /usr/sbin/clamd
clamav 1612 0.0 0.0 21704 1216 ? Ss Aug12 0:00 /usr/bin/freshclam -d --quiet
clamav 1819 0.0 14.4 277628 213636 ? Ssl Aug12 0:06 /usr/sbin/clamav-milter --max-children=2 -ol --pidfile /var/run/clamav/clamav-milter.pid local:/var/run/clamav/clamav-milter.ctl
root 1837 0.0 0.0 21820 284 ? Ss Aug12 0:00 /usr/sbin/webcit -D/var/run/webcit/webcit.pid.8888 -p8888 127.0.0.1 504 -i0.0.0.0 -f -t/var/log/webcit//access.8888.log
root 1838 0.0 0.0 37256 720 ? Sl Aug12 0:01 /usr/sbin/webcit -D/var/run/webcit/webcit.pid.8888 -p8888 127.0.0.1 504 -i0.0.0.0 -f -t/var/log/webcit//access.8888.log
root 1840 0.0 0.0 21820 284 ? Ss Aug12 0:00 /usr/sbin/webcit -D/var/run/webcit/webcit.pid.4444 -p4444 127.0.0.1 504 -s -i0.0.0.0 -f -t/var/log/webcit//access.4444.log -s
root 1841 0.0 0.0 37256 1044 ? Sl Aug12 0:01 /usr/sbin/webcit -D/var/run/webcit/webcit.pid.4444 -p4444 127.0.0.1 504 -s -i0.0.0.0 -f -t/var/log/webcit//access.4444.log -s
root 1853 0.0 0.0 12552 492 ? Ss Aug12 0:00 ftpd: accepting connections on port 21
root 1873 0.0 0.0 18544 756 ? Ss Aug12 0:00 /usr/sbin/cron
root 1887 0.0 0.3 224792 5096 ? Ss Aug12 0:00 /usr/sbin/apache2 -k start
root 1904 0.0 0.0 3788 460 tty1 Ss+ Aug12 0:00 /sbin/getty 38400 tty1
www-data 4967 0.1 2.9 233812 43036 ? S Aug12 0:34 /usr/sbin/apache2 -k start
www-data 5216 0.0 2.7 232072 41240 ? S Aug12 0:19 /usr/sbin/apache2 -k start
www-data 5524 0.1 3.2 238072 48344 ? S 00:47 0:20 /usr/sbin/apache2 -k start
www-data 5617 0.1 3.1 236784 46888 ? S 01:08 0:14 /usr/sbin/apache2 -k start
www-data 5642 0.1 2.7 229672 40436 ? S 01:08 0:19 /usr/sbin/apache2 -k start
www-data 5659 0.1 3.0 235964 45348 ? S 01:08 0:14 /usr/sbin/apache2 -k start
www-data 5685 0.1 2.9 232960 43900 ? S 01:08 0:19 /usr/sbin/apache2 -k start
www-data 5697 0.1 3.2 239892 48180 ? S 01:08 0:21 /usr/sbin/apache2 -k start
www-data 5704 0.1 2.8 232608 41324 ? S 01:08 0:24 /usr/sbin/apache2 -k start
www-data 5779 0.1 2.9 232280 42868 ? S 01:20 0:17 /usr/sbin/apache2 -k start
www-data 5780 0.1 2.7 234404 40628 ? S 01:21 0:17 /usr/sbin/apache2 -k start
www-data 5800 0.1 2.8 232028 41884 ? S 01:24 0:17 /usr/sbin/apache2 -k start
root 6159 0.0 0.0 8888 1276 ? S 01:53 0:00 /bin/sh /usr/bin/mysqld_safe
mysql 6198 7.8 4.5 326128 67364 ? Sl 01:53 15:29 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
root 6199 0.0 0.0 3776 628 ? S 01:53 0:00 logger -p daemon.err -t mysqld_safe -i -t mysqld
www-data 14909 0.2 2.9 232132 42920 ? S 03:27 0:14 /usr/sbin/apache2 -k start
www-data 18987 0.1 2.9 232276 42912 ? S 03:50 0:06 /usr/sbin/apache2 -k start
www-data 18988 0.0 3.0 237028 44732 ? S 03:50 0:03 /usr/sbin/apache2 -k start
www-data 19128 0.0 2.5 229308 38132 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19129 0.0 2.3 232072 34940 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19134 0.0 2.3 231376 34160 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19135 0.1 2.4 231024 36832 ? S 04:27 0:04 /usr/sbin/apache2 -k start
www-data 19140 0.3 2.9 232064 43264 ? S 04:27 0:08 /usr/sbin/apache2 -k start
www-data 19141 0.0 2.3 232224 34456 ? S 04:27 0:01 /usr/sbin/apache2 -k start
www-data 19145 0.0 2.5 232992 37924 ? S 04:27 0:01 /usr/sbin/apache2 -k start
www-data 19147 0.1 2.7 229748 40160 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19150 0.1 2.6 230224 39800 ? R 04:27 0:04 /usr/sbin/apache2 -k start
www-data 19152 0.2 2.5 230964 37460 ? S 04:27 0:06 /usr/sbin/apache2 -k start
www-data 19156 0.1 2.8 233200 41296 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19159 0.3 2.4 232576 36644 ? S 04:27 0:08 /usr/sbin/apache2 -k start
www-data 19161 0.1 2.5 229724 37428 ? S 04:27 0:04 /usr/sbin/apache2 -k start
www-data 19162 0.0 2.5 233200 38100 ? S 04:27 0:01 /usr/sbin/apache2 -k start
www-data 19164 0.0 2.4 232140 35756 ? S 04:27 0:01 /usr/sbin/apache2 -k start
www-data 19168 0.3 2.9 232584 42852 ? S 04:27 0:08 /usr/sbin/apache2 -k start
www-data 19172 0.1 2.9 232152 42964 ? S 04:27 0:04 /usr/sbin/apache2 -k start
www-data 19183 0.1 2.8 237020 42460 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19184 0.1 2.4 232064 36280 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19185 0.3 2.9 232272 43188 ? S 04:27 0:08 /usr/sbin/apache2 -k start
www-data 19187 0.1 2.3 229748 35236 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19188 0.1 2.4 232140 36340 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19189 0.0 2.8 231876 41956 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19190 0.0 2.7 232808 40504 ? S 04:27 0:01 /usr/sbin/apache2 -k start
www-data 19192 0.2 2.4 232024 36564 ? S 04:27 0:06 /usr/sbin/apache2 -k start
www-data 19196 0.1 2.6 229616 38388 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19201 0.1 2.5 232628 37756 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19204 0.1 2.7 233148 40004 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19205 0.1 2.9 232460 43488 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19207 0.1 2.6 231560 39196 ? S 04:27 0:03 /usr/sbin/apache2 -k start
www-data 19208 0.1 3.2 240396 48628 ? S 04:27 0:04 /usr/sbin/apache2 -k start
www-data 19217 0.0 2.7 232464 40092 ? S 04:27 0:02 /usr/sbin/apache2 -k start
www-data 19219 0.1 2.8 232492 42444 ? S 04:27 0:04 /usr/sbin/apache2 -k start
root 19285 0.0 0.2 66060 3092 ? Ss 04:49 0:00 sshd: root@pts/0
root 19288 0.0 0.1 17524 1788 pts/0 Ss 04:49 0:00 -bash
www-data 19298 0.1 2.0 232804 30840 ? S 04:50 0:01 /usr/sbin/apache2 -k start
www-data 19299 0.1 2.1 231264 31676 ? S 04:50 0:02 /usr/sbin/apache2 -k start
www-data 19301 0.1 2.3 231964 35000 ? S 04:50 0:02 /usr/sbin/apache2 -k start
www-data 19302 0.0 1.7 233900 26508 ? S 04:50 0:00 /usr/sbin/apache2 -k start
www-data 19303 0.0 1.0 225864 15316 ? S 04:50 0:00 /usr/sbin/apache2 -k start
www-data 19331 0.1 1.7 229336 25736 ? S 04:59 0:00 /usr/sbin/apache2 -k start
www-data 19333 0.2 2.1 233380 31264 ? S 05:00 0:01 /usr/sbin/apache2 -k start
www-data 19347 0.1 2.2 235680 32636 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19348 0.0 1.6 229428 24308 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19349 0.1 1.7 229624 26396 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19350 0.1 1.9 232640 28608 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19351 0.5 2.2 231856 33836 ? S 05:02 0:03 /usr/sbin/apache2 -k start
www-data 19352 0.1 1.9 230904 28528 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19353 0.1 2.9 242064 43428 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19354 0.0 1.3 228548 19956 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19355 0.1 2.0 232464 30812 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19356 0.1 1.9 232636 28908 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19358 0.0 1.6 230500 25032 ? S 05:02 0:00 /usr/sbin/apache2 -k start
www-data 19359 0.4 2.3 234456 34552 ? S 05:02 0:02 /usr/sbin/apache2 -k start
root 19418 0.0 0.0 14720 992 pts/0 R+ 05:10 0:00 ps auxwww
Search the forums for how to sanitize your Apache configs and limit the number of processes Apache is allowed to spawn.
Maybe someone should make one of the "how to configure my Apache properly" threads sticky; this comes up about once a week.
@MsMimi:
The spikes get so bad that I have to reboot the server in order for it to work again.
:(
As Oliver posted you need to limit the number of simultaneous processes Apache is running to service requests.
To help explain the output you gathered, the 6th column for each process is the RSS or resident set size (in KB), which is a good indicator of the actual active memory being used by that process.
If you add up that column of all of your "apache2" processes, you'll find it sums to 2507768, or about 2448MB. Given that your Linode only has 1440MB memory available to it, you can see how badly overallocated you are, which is why performance gets so bad. Alternatively, add up the 4th column, which is the percentage of total memory used by that process. You'll find your total is about 166%.
The machine is spending so much time accessing the disk in order to exchange bits of apache2 processes between disk and memory that it never gets much of anything done. A classic case of "thrashing".
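If you'd rather not add the column up by hand, here is a minimal Python sketch of the same sum. It assumes the standard `ps aux` column layout shown above (RSS in KB as the 6th field); the function name `sum_rss_kb` and the sample lines are just for illustration.

```python
def sum_rss_kb(ps_output, needle="apache2"):
    """Return (process_count, total_rss_kb) for ps lines whose command contains needle."""
    count = 0
    total_kb = 0
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[5].isdigit():
            continue  # skip the header row or malformed lines
        if needle in fields[10]:
            count += 1
            total_kb += int(fields[5])
    return count, total_kb


# Three lines copied from the ps output in this thread.
SAMPLE = """\
www-data 4967 0.1 2.9 233812 43036 ? S Aug12 0:34 /usr/sbin/apache2 -k start
www-data 5216 0.0 2.7 232072 41240 ? S Aug12 0:19 /usr/sbin/apache2 -k start
mysql 6198 7.8 4.5 326128 67364 ? Sl 01:53 15:29 /usr/sbin/mysqld --basedir=/usr
"""

count, total_kb = sum_rss_kb(SAMPLE)
print(f"{count} apache2 processes, {total_kb} KB RSS ({total_kb / 1024:.1f} MB)")
```

To run it against a live box you could feed it the text of `ps auxwww` instead of `SAMPLE`. Keep in mind that summing RSS counts pages shared between processes more than once.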
Limiting how many Apache processes can run (the MaxClients config file parameter, I believe) will keep your memory usage reasonable. On a Linode you should avoid swapping, or using more memory than you have, in your steady state. Hitting the limit will cause additional connections to wait momentarily until a free Apache process is available to service them, but the time you gain back by not swapping will easily let them be serviced in fractions of a second, and the additional delay is not something the clients will notice.
– David
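As a rough way to pick a starting value for MaxClients, you can divide the memory you are willing to give Apache by the typical size of one Apache process. The sketch below is only back-of-the-envelope arithmetic: the 400MB reserved for MySQL, ClamAV and the OS and the 40MB per process are assumptions loosely based on the ps output in this thread, not measured values, and `estimate_max_clients` is a made-up helper name.

```python
def estimate_max_clients(total_mb, reserved_mb, per_process_mb):
    """Estimate how many Apache processes fit in RAM without swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    budget_mb = total_mb - reserved_mb  # memory left over for Apache
    # Always allow at least one process, even if the budget is tiny.
    return max(1, int(budget_mb // per_process_mb))


# Assumed inputs: a 1440MB Linode, 400MB kept for everything else,
# and roughly 40MB RSS per apache2 process.
print(estimate_max_clients(total_mb=1440, reserved_mb=400, per_process_mb=40))
```

Because RSS overstates the real cost of each process when pages are shared, a figure derived this way tends to be conservative. Treat it as a starting point and adjust while watching vmstat for swap activity.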
Something like 3 seconds should be enough for the keep-alive and connection timeouts.
For example, db3l went through and added up the resident set size of all your Apache processes and came up with ~2.5GB. That is an impossible number for a Linode 1440: there is simply no way to have 2.5GB resident at once. There are other hints that this number isn't saying what it's being interpreted as saying. Look at the earlier output from vmstat: 176M memory free, 288M for cache, 12M for buffers, only 122M of swap used.
The problem is that it is hard to express how much of a process's resident memory is actually shared memory or shared libraries. In the case of Apache, it is often a lot of the latter.
It could still be that you are getting slammed by too many active apache processes at times, and you should definitely see about cutting down the max number of Apache processes.
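One way to get a feel for how much of that resident memory is shared is the Pss ("proportional set size") field in /proc/<pid>/smaps, which divides each shared page among the processes mapping it. This is a sketch, assuming your kernel exposes Pss (older kernels may not); the function name and the sample text are illustrative, not taken from the server in this thread.

```python
def summarize_smaps(smaps_text):
    """Sum the Rss and Pss fields (in kB) across all mappings in smaps text."""
    totals = {"Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        key, _, rest = line.partition(":")
        if key in totals:
            # Field lines look like "Rss:   800 kB"; take the number.
            totals[key] += int(rest.split()[0])
    return totals


# Made-up sample with two mappings: one mostly shared, one private.
SAMPLE_SMAPS = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /usr/sbin/apache2
Size:               1024 kB
Rss:                 800 kB
Pss:                 100 kB
Shared_Clean:        760 kB
0060b000-0060c000 rw-p 00000000 00:00 0 [heap]
Size:                256 kB
Rss:                 200 kB
Pss:                 200 kB
"""

totals = summarize_smaps(SAMPLE_SMAPS)
print(f"Rss {totals['Rss']} kB, Pss {totals['Pss']} kB")
```

On a real system you would pass in the contents of /proc/<pid>/smaps for each apache2 process (reading it needs root or the process owner). A Pss total well below the Rss total would suggest that much of the resident memory is shared.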
Nginx can also help. A lot of people replace Apache with Nginx. That's an obvious move to make if you are serving static files. If you have dynamic stuff that already runs in Apache, you can still win by using Nginx as a reverse proxy. Set it up to serve static content directly and proxy dynamic content to Apache. This helps in two ways: it keeps Apache out of requests for static content, and it lets Apache move on to the next dynamic request more quickly, because Nginx takes the result quickly and then deals with feeding it out to slower clients. Some example config info here