Random extreme disk I/O spikes and maxed out memory

Some info: Linode 512, Ubuntu 10.04 32bit, Paravirt 2.6.32.16-linode28, LAMP stack, apache & mysql tuned according to the Linode guide.

I've been tracing this problem for over a day now. At first I thought it was caused by some cron jobs, but no: I disabled all cron jobs and the problem still exists.

My latest guess is that rsyslog and the kernel are causing a memory leak. I'm not really sure, but when memory is maxed out, htop shows a fairly normal situation.

Here is the disk IO. Note: CPU is around 25% during these spikes.

![](http://img375.imageshack.us/img375/8794/generategraphsh.png)

Any help is really appreciated.

TIA

matt

11 Replies

Does anything stand out by looking at the top 10 CPU or memory consuming processes?

Top 10 - CPU

ps auxf | sort -nr -k 3 | head -10

Top 10 - Memory

ps auxf | sort -nr -k 4 | head -10
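
If the per-process listing is too long to eyeball, it can help to total resident memory per command name. The following is a minimal sketch, not something from the original posts: it assumes input shaped like the output of `ps -eo rss,comm` (RSS in kB, then the command name), and the function name `sum_rss_by_command` and the sample text are made up for illustration.

```python
from collections import defaultdict


def sum_rss_by_command(ps_text):
    """Total RSS (kB) per command from `ps -eo rss,comm` style output."""
    totals = defaultdict(int)
    for line in ps_text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the "RSS COMMAND" header row.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss_kb, command = int(parts[0]), parts[1].strip()
        totals[command] += rss_kb
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample, shaped like `ps -eo rss,comm` output.
sample = """  RSS COMMAND
12928 apache2
12536 apache2
24992 mysqld
  684 htop
"""

for command, rss_kb in sum_rss_by_command(sample):
    print(f"{command:10s} {rss_kb / 1024:6.1f} MB")
```

Keep in mind that summing RSS counts shared pages once per process, so the total for a forking server like Apache is an upper bound rather than an exact figure.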

Thanks Brian for the info. I'll check when the time comes.

In the meantime, this is part of the big chunk of syslog from the 02:00 spike. (It is supposed to be 01:00 on the graph.)

Aug  5 00:17:01 mattbox CRON[5196]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug  5 00:39:01 mattbox CRON[5539]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Aug  5 01:07:52 mattbox kernel: apache2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 01:07:52 mattbox kernel: Pid: 5950, comm: apache2 Not tainted 2.6.32.16-linode28 #1
Aug  5 01:07:52 mattbox kernel: Call Trace:
Aug  5 01:07:52 mattbox kernel: [<c01778fa>] ? oom_kill_process+0x9a/0x280
Aug  5 01:07:52 mattbox kernel: [<c0177f6c>] ? __out_of_memory+0xfc/0x160
Aug  5 01:07:52 mattbox kernel: [<c0178024>] ? out_of_memory+0x54/0xb0
Aug  5 01:07:52 mattbox kernel: [<c017b251>] ? __alloc_pages_nodemask+0x561/0x580
Aug  5 01:07:52 mattbox kernel: [<c017cb1d>] ? __do_page_cache_readahead+0xdd/0x1f0
Aug  5 01:07:52 mattbox kernel: [<c017cc57>] ? ra_submit+0x27/0x40
Aug  5 01:07:52 mattbox kernel: [<c0175bb7>] ? filemap_fault+0x397/0x3a0
Aug  5 01:07:52 mattbox kernel: [<c01043b0>] ? xen_set_pte_at+0x80/0xf0
Aug  5 01:07:52 mattbox kernel: [<c018870a>] ? __do_fault+0x5a/0x490
Aug  5 01:07:52 mattbox kernel: [<c018a817>] ? handle_mm_fault+0x167/0x990
Aug  5 01:07:52 mattbox kernel: [<c0108907>] ? xen_do_upcall+0x7/0xc
Aug  5 01:07:52 mattbox kernel: [<c018dd37>] ? find_vma+0x17/0x80
Aug  5 01:07:52 mattbox kernel: [<c011b5a4>] ? do_page_fault+0x134/0x330
Aug  5 01:07:52 mattbox kernel: [<c0105b2f>] ? xen_restore_fl_direct_end+0x0/0x1
Aug  5 01:07:52 mattbox kernel: [<c0102be6>] ? xen_clts+0x46/0x50
Aug  5 01:07:52 mattbox kernel: [<c011b470>] ? do_page_fault+0x0/0x330
Aug  5 01:07:52 mattbox kernel: [<c05fb58e>] ? error_code+0x66/0x6c
Aug  5 01:07:52 mattbox kernel: [<c011b470>] ? do_page_fault+0x0/0x330
Aug  5 01:07:52 mattbox kernel: Mem-Info:
Aug  5 01:07:52 mattbox kernel: DMA per-cpu:
Aug  5 01:07:52 mattbox kernel: CPU    0: hi:    0, btch:   1 usd:   0
Aug  5 01:07:52 mattbox kernel: CPU    1: hi:    0, btch:   1 usd:   0
Aug  5 01:07:52 mattbox kernel: CPU    2: hi:    0, btch:   1 usd:   0
Aug  5 01:07:52 mattbox kernel: CPU    3: hi:    0, btch:   1 usd:   0
Aug  5 01:07:52 mattbox kernel: Normal per-cpu:
Aug  5 01:07:52 mattbox kernel: CPU    0: hi:  186, btch:  31 usd: 148
Aug  5 01:07:52 mattbox kernel: CPU    1: hi:  186, btch:  31 usd: 109
Aug  5 01:07:52 mattbox kernel: CPU    2: hi:  186, btch:  31 usd:  92
Aug  5 01:07:52 mattbox kernel: CPU    3: hi:  186, btch:  31 usd: 154
Aug  5 01:07:52 mattbox kernel: active_anon:59811 inactive_anon:59948 isolated_anon:0
Aug  5 01:07:52 mattbox kernel: active_file:64 inactive_file:183 isolated_file:0
Aug  5 01:07:52 mattbox kernel: unevictable:0 dirty:1 writeback:112 unstable:0
Aug  5 01:07:52 mattbox kernel: free:1237 slab_reclaimable:858 slab_unreclaimable:2059
Aug  5 01:07:52 mattbox kernel: mapped:104 shmem:4 pagetables:1162 bounce:0
Aug  5 01:07:52 mattbox kernel: DMA free:2028kB min:84kB low:104kB high:124kB active_anon:2044kB inactive_anon:2424kB active_file:4kB inactive_file:68kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15872kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:164kB kernel_stack:72kB pagetables:496kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? yes
Aug  5 01:07:52 mattbox kernel: lowmem_reserve[]: 0 492 492 492
Aug  5 01:07:52 mattbox kernel: Normal free:2920kB min:2792kB low:3488kB high:4188kB active_anon:237200kB inactive_anon:237368kB active_file:252kB inactive_file:664kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:503936kB mlocked:0kB dirty:4kB writeback:444kB mapped:412kB shmem:16kB slab_reclaimable:3432kB slab_unreclaimable:8072kB kernel_stack:1464kB pagetables:4152kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug  5 01:07:52 mattbox kernel: lowmem_reserve[]: 0 0 0 0
Aug  5 01:07:52 mattbox kernel: DMA: 11*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
Aug  5 01:07:52 mattbox kernel: Normal: 262*4kB 58*8kB 22*16kB 7*32kB 3*64kB 5*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2920kB
Aug  5 01:07:52 mattbox kernel: 9175 total pagecache pages
Aug  5 01:07:52 mattbox kernel: 8887 pages in swap cache
Aug  5 01:07:52 mattbox kernel: Swap cache stats: add 82478, delete 73591, find 6190/7760
Aug  5 01:07:52 mattbox kernel: Free swap  = 0kB
Aug  5 01:07:52 mattbox kernel: Total swap = 262136kB
Aug  5 01:07:52 mattbox kernel: 131072 pages RAM
Aug  5 01:07:52 mattbox kernel: 0 pages HighMem
Aug  5 01:07:52 mattbox kernel: 3409 pages reserved
Aug  5 01:07:52 mattbox kernel: 10031 pages shared
Aug  5 01:07:52 mattbox kernel: 125085 pages non-shared
Aug  5 01:07:52 mattbox kernel: Out of memory: kill process 2139 (apache2) score 17579 or a child
Aug  5 01:07:52 mattbox kernel: Killed process 5865 (apache2)
Aug  5 01:07:55 mattbox kernel: apache2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Aug  5 01:07:55 mattbox kernel: Pid: 5895, comm: apache2 Not tainted 2.6.32.16-linode28 #1
Aug  5 01:07:55 mattbox kernel: Call Trace:
Aug  5 01:07:55 mattbox kernel: [<c01778fa>] ? oom_kill_process+0x9a/0x280
Aug  5 01:07:55 mattbox kernel: [<c0177f6c>] ? __out_of_memory+0xfc/0x160
Aug  5 01:07:55 mattbox kernel: [<c0178024>] ? out_of_memory+0x54/0xb0
Aug  5 01:07:55 mattbox kernel: [<c017b251>] ? __alloc_pages_nodemask+0x561/0x580
Aug  5 01:07:55 mattbox kernel: [<c0196707>] ? read_swap_cache_async+0xc7/0x110
Aug  5 01:07:55 mattbox kernel: [<c01967b8>] ? swapin_readahead+0x68/0x90
Aug  5 01:07:55 mattbox kernel: [<c018af6f>] ? handle_mm_fault+0x8bf/0x990
Aug  5 01:07:55 mattbox kernel: [<c0105357>] ? xen_force_evtchn_callback+0x17/0x30
Aug  5 01:07:55 mattbox kernel: [<c0105b2f>] ? xen_restore_fl_direct_end+0x0/0x1
Aug  5 01:07:55 mattbox kernel: [<c01035f7>] ? xen_mc_flush+0xf7/0x1b0
Aug  5 01:07:55 mattbox kernel: [<c011b5a4>] ? do_page_fault+0x134/0x330
Aug  5 01:07:55 mattbox kernel: [<c0105b2f>] ? xen_restore_fl_direct_end+0x0/0x1
Aug  5 01:07:55 mattbox kernel: [<c0102be6>] ? xen_clts+0x46/0x50
Aug  5 01:07:55 mattbox kernel: [<c011b470>] ? do_page_fault+0x0/0x330
Aug  5 01:07:55 mattbox kernel: [<c05fb58e>] ? error_code+0x66/0x6c
Aug  5 01:07:55 mattbox kernel: [<c011b470>] ? do_page_fault+0x0/0x330

I'm not good at analyzing these logs. Maybe someone can help out.

Thanks!

matt
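
For readers unfamiliar with the Mem-Info dump above, a little arithmetic makes it easier to read. This is only a sketch: the page counts are copied from the 01:07:52 log entry, and the 4 kB page size is taken from the `4kB` units in the buddy lists of the same dump. The variable names are chosen here for illustration.

```python
PAGE_KB = 4  # page size; the buddy lists in the log are in 4kB units

# Values copied from the Mem-Info dump at 01:07:52.
total_ram_pages = 131072
active_anon_pages = 59811
inactive_anon_pages = 59948
file_pages = 64 + 183  # active_file + inactive_file
free_swap_kb = 0
total_swap_kb = 262136


def pages_to_mb(pages):
    """Convert a page count to megabytes."""
    return pages * PAGE_KB / 1024


anon_mb = pages_to_mb(active_anon_pages + inactive_anon_pages)
ram_mb = pages_to_mb(total_ram_pages)
swap_used_mb = (total_swap_kb - free_swap_kb) / 1024

print(f"RAM:        {ram_mb:.0f} MB")
print(f"Anonymous:  {anon_mb:.0f} MB ({anon_mb / ram_mb:.0%} of RAM)")
print(f"File cache: {pages_to_mb(file_pages):.1f} MB")
print(f"Swap used:  {swap_used_mb:.0f} of {total_swap_kb / 1024:.0f} MB")
```

Read this way, the dump says that process (anonymous) memory fills roughly nine tenths of RAM, the file cache has shrunk to about 1 MB, and swap is completely used, which is consistent with heavy paging just before the OOM killer ran.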

Out of memory: kill process 2139 (apache2) score 17579 or a child

I'd look at Apache first. After all, it's the usual suspect. (It's also possible that something else used up all memory and Apache just became an innocent victim, but I'd say Apache is guilty until proven innocent. Software doesn't have constitutional rights.)

What does your web traffic look like when those spikes happen? Check your Apache logs and see how many people/robots visited around that time. How many Apache children do you normally use? What are the values of MaxClients and ServerLimit in your /etc/apache2/apache2.conf?

@hybinet:

What does your web traffic look like when those spikes happen?

Traffic looks fairly normal, no DoS or anything close to that. Incoming is about 30kb/s and outgoing about 300kb/s.

@hybinet:

What are the values of MaxClients and ServerLimit in your /etc/apache2/apache2.conf?

This is my configuration, kindly advise:

<IfModule mpm_prefork_module>
    StartServers          1
    MinSpareServers       1
    MaxSpareServers       5
    ServerLimit          50
    MaxClients           50
    MaxRequestsPerChild   3000
</IfModule>

Fresh from the oven: it spiked again, and here are the htop, top, iotop, and Top 10 CPU & Memory outputs:

htop:

  1  :  0.6% sys:  1.4% low:  0.0%                  Tasks: 131 total, 1 running
  2  :  0.8% sys:  1.2% low:  0.0%                  Load average: 36.27 46.20 46.57 
  3  :  0.2% sys:  0.6% low:  0.0%                  Uptime: 10:33:49
  4  :  0.2% sys:  0.6% low:  0.0%                  Load: 36.27 
  Mem:498M used:483M buffers:0M cache:9M 
  Swp:255M used:262128k

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command                                
 9735 root      20   0  9732  3308  1272 S  0.0  0.6  0:22.93 /usr/bin/python /usr/bin/iotop
 9697 root      20   0  2592   684   496 R  0.0  0.1  0:31.25 htop
10168 www-data  20   0 44432 16228  1784 S  0.0  3.2  0:00.48 /usr/sbin/apache2 -k start             
10158 root      20   0 42312 10744   108 S  0.0  2.1  0:00.44 /usr/sbin/apache2 -k start
 9842 www-data  20   0 44156  9072   112 S  0.0  1.8  0:06.92 /usr/sbin/apache2 -k start
10166 www-data  20   0 43852 11752   100 S  0.0  2.3  0:00.32 /usr/sbin/apache2 -k start
10165 www-data  20   0 43852 11736   100 S  0.0  2.3  0:00.36 /usr/sbin/apache2 -k start
 9893 root      20   0 45948  8268  1328 S  0.0  1.6  0:07.94 /usr/sbin/apache2 -k start
10053 root      20   0 41608  7260   760 D  0.0  1.4  0:03.19 /usr/sbin/apache2 -k start
10128 root      20   0 45412  8588   100 S  0.0  1.7  0:01.39 /usr/sbin/apache2 -k start
 9871 www-data  20   0 45432  7704   100 S  0.0  1.5  0:06.75 /usr/sbin/apache2 -k start
 9854 mysql     20   0  143M 23624  1312 S  0.0  4.6  0:00.77 /usr/sbin/mysqld
 9855 mysql     20   0  143M 23624  1312 S  0.0  4.6  0:00.89 /usr/sbin/mysqld
10045 www-data  20   0 41528  6796   100 S  0.0  1.3  0:03.51 /usr/sbin/apache2 -k start
10152 www-data  20   0 45740 13968   112 S  0.0  2.7  0:00.72 /usr/sbin/apache2 -k start
10037 root      20   0 41912 10288  2264 D  0.0  2.0  0:04.07 /usr/sbin/apache2 -k start
 2089 root      20   0 10372  2900   364 D  0.0  0.6  0:16.26 /usr/sbin/cloudkick-agent --daemon -c /
10153 www-data  20   0 44364 10160   100 S  0.0  2.0  0:00.63 /usr/sbin/apache2 -k start
10151 root      20   0 45132 12268   100 S  0.0  2.4  0:00.65 /usr/sbin/apache2 -k start
10144 www-data  20   0 45372 11016    96 S  0.0  2.2  0:00.85 /usr/sbin/apache2 -k start

top

top - 13:28:00 up 10:37,  3 users,  load average: 54.20, 49.73, 47.84
Tasks: 141 total,   1 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  2.2%sy,  0.0%ni,  0.5%id, 96.7%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:    510652k total,   504152k used,     6500k free,      488k buffers
Swap:   262136k total,   262136k used,        0k free,     8664k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                
 1941 mysql     20   0  143m  24m 2348 S    2  4.9   8:44.19 mysqld                                  
10173 www-data  20   0 37416 7892  672 D    2  1.5   0:00.44 apache2                                 
10185 www-data  20   0 38160 8552  616 D    2  1.7   0:00.29 apache2                                 
10184 www-data  20   0 37928 8340  592 D    1  1.6   0:00.35 apache2                                 
10053 www-data  20   0 41588 7192  512 D    1  1.4   0:03.53 apache2                                 
10188 www-data  20   0 33384 3960  748 D    1  0.8   0:00.14 apache2                                 
 9697 root      20   0  2592  380  188 D    1  0.1   0:32.82 htop                                    
 9735 root      20   0  9732 2280  260 D    1  0.4   0:23.79 iotop                                   
10069 www-data  20   0 42040 7800  396 D    1  1.5   0:03.10 apache2                                 
10118 www-data  20   0 45944 8468  556 D    1  1.7   0:02.03 apache2                                 
10128 www-data  20   0 45680 7868  564 D    1  1.5   0:01.75 apache2                                 
10140 www-data  20   0 45684  10m   80 D    1  2.0   0:01.74 apache2                                 
10170 www-data  20   0 44692  14m  960 D    1  2.9   0:00.56 apache2                                 
10174 www-data  20   0 45636  14m  680 D    1  2.8   0:00.51 apache2                                 
10176 www-data  20   0 45132  15m 1200 D    1  3.0   0:00.52 apache2                                 
10179 www-data  20   0 45116  14m  864 D    1  2.9   0:00.56 apache2                                 
10187 www-data  20   0 32764 3460  656 D    1  0.7   0:00.17 apache2                                 
  183 root      20   0     0    0    0 D    0  0.0   0:56.62 kswapd0                                 
 2089 root      20   0 10372 3248  200 D    0  0.6   0:18.29 cloudkick-agent                         
 9806 www-data  20   0 45952 5892  376 D    0  1.2   0:09.02 apache2                                 
 9838 www-data  20   0 45684 9.8m   80 D    0  2.0   0:07.50 apache2                                 
 9842 www-data  20   0 45692 9016  548 D    0  1.8   0:07.26 apache2         

iotop

Total DISK READ: 4.86 M/s | Total DISK WRITE: 1269.67 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                               
10144 be/4 www-data   31.16 K/s    0.00 B/s  7.00 % 99.99 % apache2 -k start
 9859 be/4 mysql      38.95 K/s    0.00 B/s  0.00 % 99.99 % mysqld
10187 be/4 www-data   74.00 K/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
10166 be/4 www-data  249.26 K/s    0.00 B/s 58.10 % 99.99 % apache2 -k start
10185 be/4 www-data   81.79 K/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
10173 be/4 www-data   38.95 K/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
10179 be/4 www-data   27.26 K/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
10172 be/4 www-data   42.84 K/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
10174 be/4 www-data    0.00 B/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
  183 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [kswapd0]
10018 be/4 www-data    3.89 K/s    0.00 B/s  0.78 % 99.99 % apache2 -k start
 2052 be/4 mysql      11.68 K/s  109.05 K/s  0.00 % 99.99 % mysqld
 9806 be/4 www-data  218.10 K/s    0.00 B/s 64.36 % 99.99 % apache2 -k start
 2139 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % apache2 -k start
 9907 be/4 mysql      19.47 K/s    0.00 B/s  0.00 % 99.99 % mysqld
 9858 be/4 mysql       0.00 B/s    0.00 B/s  0.00 % 99.99 % mysqld   
10150 be/4 www-data  101.26 K/s    0.00 B/s 23.13 % 99.99 % apache2 -k start
10162 be/4 www-data  105.16 K/s    0.00 B/s 32.62 % 99.99 % apache2 -k start
10034 be/4 www-data  136.31 K/s    0.00 B/s 26.51 % 99.99 % apache2 -k start
10147 be/4 www-data   81.79 K/s    0.00 B/s 31.30 % 99.99 % apache2 -k start
10139 be/4 www-data  120.74 K/s    0.00 B/s 52.46 % 99.99 % apache2 -k start
 9896 be/4 mysql     128.52 K/s    3.89 K/s  0.00 % 99.99 % mysqld
10176 be/4 www-data  101.26 K/s    0.00 B/s  0.00 % 51.94 % apache2 -k start
 9888 be/4 mysql     120.74 K/s    3.89 K/s  0.00 % 48.22 % mysqld
  979 be/4 root        7.79 K/s    0.00 B/s  0.00 % 45.45 % [kjournald]
 2031 be/4 mysql      15.58 K/s   35.05 K/s  0.00 % 41.49 % mysqld
10177 be/4 www-data    0.00 B/s    0.00 B/s  0.00 % 41.08 % apache2 -k start

Top 10 CPU

mysql     1941  1.3  4.8 147440 24992 ?        Ssl  02:50   8:45 /usr/sbin/mysqld
root      9697  0.9  0.1   2588   684 pts/0    S+   12:30   0:33  |       \_ htop
root      9735  0.7  0.6   9732  3372 pts/1    S+   12:31   0:24  |       \_ /usr/bin/python /usr/bin/iotop
www-data 10190  0.4  0.8  31960  4340 ?        D    13:27   0:00  \_ /usr/sbin/apache2 -k start
www-data 10189  0.4  2.5  42304 12928 ?        S    13:27   0:00  \_ /usr/sbin/apache2 -k start
www-data 10195  0.3  0.2  31368  1300 ?        D    13:28   0:00  \_ /usr/sbin/apache2 -k start
www-data 10188  0.3  2.4  42324 12536 ?        D    13:27   0:00  \_ /usr/sbin/apache2 -k start
www-data 10185  0.3  2.5  44220 12908 ?        S    13:26   0:00  \_ /usr/sbin/apache2 -k start
www-data 10184  0.3  2.5  43612 12956 ?        D    13:26   0:00  \_ /usr/sbin/apache2 -k start
www-data 10179  0.3  2.4  45372 12440 ?        S    13:25   0:00  \_ /usr/sbin/apache2 -k start

Top 10 Memory

mysql     1941  1.3  4.6 147440 23832 ?        Dsl  02:50   8:46 /usr/sbin/mysqld
www-data 10189  0.3  2.6  45124 13300 ?        S    13:27   0:00  \_ /usr/sbin/apache2 -k start
www-data 10188  0.2  2.5  45372 12936 ?        S    13:27   0:00  \_ /usr/sbin/apache2 -k start
www-data 10187  0.2  2.3  43696 11832 ?        S    13:26   0:00  \_ /usr/sbin/apache2 -k start
www-data 10172  0.2  2.3  46200 11828 ?        D    13:23   0:00  \_ /usr/sbin/apache2 -k start
www-data 10193  0.3  2.2  42312 11540 ?        S    13:28   0:00  \_ /usr/sbin/apache2 -k start
www-data 10184  0.2  2.2  45644 11244 ?        S    13:26   0:00  \_ /usr/sbin/apache2 -k start
www-data 10177  0.4  2.2  45640 11492 ?        D    13:25   0:01  \_ /usr/sbin/apache2 -k start
www-data 10173  0.1  2.2  45284 11240 ?        S    13:23   0:00  \_ /usr/sbin/apache2 -k start
www-data 10169  0.2  2.2  45640 11464 ?        D    13:23   0:01  \_ /usr/sbin/apache2 -k start

One thing I notice is the number of sleeping processes. In a normal situation there are about 40 or so. The disk I/O is just crazy.

Thanks

You seem to have too many Apache processes for a Linode 512. Try lowering MaxClients and ServerLimit to 25. Lower settings cause fewer requests to be processed at a time, but each request will finish faster due to lower system load. Might not help if something other than traffic is the cause, but at least it's worth a try.

Try adding nginx as a front end. I have nginx with 4 workers handling my front end, and Apache with MaxClients set to 10 as the back end. Since the connection between nginx and Apache is very quick, Apache can hand the response off to nginx and let nginx deal with the slow connection to the browser. Apache therefore needs fewer children, since its existing ones are freed up faster.

You want Apache to consume as much memory as possible, without paging. With that in mind, take a look at this article: http://library.linode.com/troubleshooting/memory-networking

The memory_limit value times the MaxClients value should be less than your available RAM (subtract the RAM required for other processes).

(memory_limit * MaxClients) < (Server's RAM - Other RAM Usage)
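
As a rough illustration of that inequality, here is a minimal sketch that solves it for MaxClients. The 498 MB total comes from the htop output earlier in the thread; the other two inputs are assumptions for illustration, not measurements. Note that it plugs in an observed per-child size rather than PHP's `memory_limit`, which is the worst case per request; the function name `max_clients` is made up here.

```python
import math


def max_clients(total_ram_mb, other_usage_mb, per_child_mb):
    """Largest MaxClients for which all Apache children fit in RAM."""
    available_mb = total_ram_mb - other_usage_mb
    if available_mb <= 0 or per_child_mb <= 0:
        return 0
    return math.floor(available_mb / per_child_mb)


# 498 MB total is from the htop output in this thread; the other two
# figures are assumptions for illustration, not measurements.
total_ram_mb = 498
other_usage_mb = 100  # assumed: MySQL, monitoring agent, kernel, disk cache
per_child_mb = 12     # assumed: close to the ~11-13 MB RSS shown by ps

print(max_clients(total_ram_mb, other_usage_mb, per_child_mb))
print(50 * per_child_mb)  # MB needed if all 50 configured children are busy
```

Under these assumed figures, 50 busy children would need more memory than the whole Linode has, so a setting in the 25 to 30 range looks more plausible as a starting point.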

@BrianJM:

You want Apache to consume as much memory as possible, without paging.

Umm, no, you really don't. You want Apache to consume as little memory as possible while still meeting your performance requirements. If memory isn't the bottleneck anyhow, and you're maxed out on something else (CPU, network, disk), then throwing more memory at it is actively counterproductive, since at least in the case of disk bottlenecks, the less RAM Apache uses, the more RAM the disk cache can use.

@Guspaz:

@BrianJM:

You want Apache to consume as much memory as possible, without paging.

Umm, no, you really don't. You want Apache to consume as little memory as possible while still meeting your performance requirements. If memory isn't the bottleneck anyhow, and you're maxed out on something else (CPU, network, disk), then throwing more memory at it is actively counterproductive, since at least in the case of disk bottlenecks, the less RAM Apache uses, the more RAM the disk cache can use.

Based on his MaxClients/ServerLimit configuration, the I/O spikes indicate that Apache may be the reason for paging. In his situation, I believe it is safe to say that "You want Apache to be able to consume as much memory as possible, without paging." Of course, this assumes that his "performance requirements" are to serve as many clients as possible while using all of the resources. If his requirements are to consume only part of the available resources, then my statement is entirely inaccurate.

@ALL

Thanks for all the feedback.

I've lowered MaxClients and ServerLimit to 25 and installed PHP APC. So far so good. Memory consumption is around 20%.

I'll keep a close eye on it.

Thanks again!

matt

@BrianJM:

Based on his MaxClients/ServerLimit configuration, the I/O spikes indicate that Apache may be the reason for paging. In his situation, I believe it is safe to say that "You want Apache to be able to consume as much memory as possible, without paging." Of course, this assumes that his "performance requirements" are to serve as many clients as possible while using all of the resources. If his requirements are to consume only part of the available resources, then my statement is entirely inaccurate.

It's unlikely that you can set a configuration to consume "all of the resources" (in general) because different resources will bottleneck at different points, given what knobs are available. For example, disk can be a bottleneck even if no paging is involved and memory is far from exceeded.

Let's take a ridiculous example where a database operation to generate a page takes a full 1s of solid disk I/O. In that case the ideal response rate is 1/second. In such a case, due to the latency, Apache processes are pretty much guaranteed to pile up, but having more than a handful of Apache processes (to provide for I/O interleaving) will likely drop the system response rate below the ideal due to contention between the processes for the bottleneck disk I/O. So in this case you'd want to configure Apache to be far below the limit of memory, using disk I/O availability as the threshold instead.
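
To put numbers on that example, here is an idealized sketch using Little's law (requests in flight = throughput × time in system). It assumes every Apache process always has a request in flight and that the disk sustains exactly 1 request per second with no extra contention cost, which is more optimistic than what the paragraph above describes; the function name is made up for illustration.

```python
def response_time_s(in_flight_requests, max_throughput_per_s):
    """Average time in system at saturation, from Little's law (L = lambda * W)."""
    return in_flight_requests / max_throughput_per_s


# The post's example: each page needs 1 s of disk I/O, so the disk
# completes at most 1 request per second regardless of process count.
for processes in (1, 5, 25, 50):
    print(processes, "processes ->", response_time_s(processes, 1.0), "s per request")
```

Even in this best case, adding processes only lengthens each request without raising throughput; real contention for the disk would make the picture worse.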

That example is sort of ridiculously extreme, but the threshold still exists, just at a different point that needs to be discovered on a case by case basis while tuning a configuration. While essentially the same thing happens when you get too far into paging on a VPS - the paging contention is such that adding more processes makes the entire system bog down even further - paging isn't the only risk.

Your suggestion is definitely along the right lines; it just assumes a little too quickly that the disk resource will bottleneck strictly because of paging. The better approach is to find the inflection point beyond which adding more of one resource runs into the bottleneck of another. That's not necessarily just memory vs. paging, which is, I believe, what Guspaz was trying to point out.

Of course, having too many Apache processes overloading memory, causing paging, and then contending for that paging so the system dies a slow death is a very common way for a VPS to get into trouble. So dropping MaxClients to get things working again is certainly a good first step. But tuning to maximize performance might take a little work, and in the end yield a configuration where Apache never comes close to using all memory, since any more would adversely impact some other constrained resource (even without paging).

To the OP, one way you can approach this is to dramatically pull back on your Apache configuration (say 1-2 clients, and very few requests per client). Then hit your server with a benchmark tool such as ab, using a common page that involves a typical amount of back-end processing. Then slowly increase your Apache configuration and retest until you hit an inflection point in performance. Then pull back slightly on the configuration to give yourself some headroom.
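
Once you have a table of results from that procedure, picking the setting can be sketched as below. This is only an illustration: the measurements are hypothetical (for example, the "Requests per second" figure that a run like `ab -n 500 -c 10 http://example.com/page.php` reports, with a placeholder URL), and the function name and the 0.8 headroom factor are assumptions, not anything prescribed in this thread.

```python
def pick_max_clients(results, headroom=0.8):
    """Pick a MaxClients value from (max_clients, requests_per_sec) pairs.

    Finds the setting with the highest throughput, then backs off by
    `headroom` to leave a safety margin.
    """
    if not results:
        raise ValueError("no measurements")
    best_clients, _ = max(results, key=lambda r: r[1])
    return max(1, int(best_clients * headroom))


# Hypothetical measurements: throughput rises, peaks, then collapses
# once the server starts contending for a bottleneck resource.
measurements = [(2, 9.0), (5, 21.0), (10, 34.0), (15, 38.0), (20, 31.0), (25, 18.0)]

print(pick_max_clients(measurements))
```

With this made-up data the peak is at 15 clients, so the sketch suggests a slightly lower setting to keep some headroom.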

It's not perfect, as disk and CPU availability can vary on a Linode, but it's relatively easy to do and at least you know what your limits are under heavy load. You'll want to do it when your Linode is idle, so if it's a production system you could clone it to a different Linode (even if only purchased for a few days) and run the tests there.

– David
