strange problem on my server

I found a strange problem on my server: after 10 days of uptime the server stopped processing requests, and CPU load is at 400%!

![](http://spnova.org/error2.png)

How can I solve this problem? Or how can I make my server reboot if CPU load goes above 40%? After a reboot the server works well again.

16 Replies

When that happens, your best bet is to look at the console and figure out what broke:

http://library.linode.com/linode-manager/using-lish-the-linode-shell

My guess is that the kernel locked up. It's rare, but it happens. The "logview" command in lish should still have the information from before the last reboot.

I'm getting this same exact problem.

Running CentOS 5.x

Virtualmin

Apache

MySQL

and running 3 WordPress sites with 830MB of RAM

I can't find anything in the logs until the server is locked up and reports it's out of swap and memory. No large spikes of traffic or other CPU load leading up to this.

 free m
             total       used       free     shared    buffers     cached
Mem:        829644     666756     162888          0      15676     161228
-/+ buffers/cache:     489852     339792
Swap:       262136          0     262136

I then have to reboot the linode in order to solve the problem.

They are not high-traffic sites either. Average CPU usage is < 5% and average network is 5kb/s combined.

You'll want to glance at this: http://library.linode.com/troubleshooting/memory-networking#diagnosingandfixingmemoryissues
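When reading `free` output like the listing posted above, the `-/+ buffers/cache` row is the one to watch for memory pressure, because buffers and page cache can be given back to applications. Here is a minimal Python sketch of that arithmetic using the numbers from the post. The function name `real_usage` is made up for illustration.

```python
def real_usage(used, free, buffers, cached):
    """Return (used_by_apps, available) with buffers and cache counted as free.

    Mirrors the "-/+ buffers/cache" row printed by older versions of free.
    All values must be in the same unit (kB in the listing above).
    """
    used_by_apps = used - buffers - cached
    available = free + buffers + cached
    return used_by_apps, available


# Values from the "Mem:" row of the free output above (kB).
TOTAL_KB = 829644
apps, avail = real_usage(used=666756, free=162888, buffers=15676, cached=161228)

print(apps, avail)  # 489852 339792, the same as the -/+ buffers/cache row
print(f"{100 * apps / TOTAL_KB:.0f}% of RAM held by processes")
```

So at the time of that snapshot roughly 59% of RAM was held by processes and swap was untouched, which suggests the box was healthy when the command was run and the exhaustion happens later.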

My concern is, I have a dedicated Dell 1850 with 2GB RAM and 10k SCSI drives: 100 sites, all dynamic, 300GB of traffic a month. Same setup as this VPS (CentOS, Virtualmin, no tweaks), and it runs without issue. This VPS has 4 sites that barely see any traffic and it randomly croaks. Maybe I'm overestimating what a VPS should be capable of handling?

Here are the stats from my VPS, gathered following that article.

iostat -d -x 2 5   
Linux 2.6.18.8-x86_64-linode10 ()     05/12/2010

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.14     1.82  0.54  1.08    15.97    23.26    24.22     0.02   11.08   3.01   0.49
xvdb              0.00     0.00  0.00  0.00     0.02     0.00    26.22     0.00    9.39   9.11   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.50     0.00     4.00     8.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
httpd -V | grep 'MPM'
Server MPM:     Prefork
 -D APACHE_MPM_DIR="server/mpm/prefork"
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 30

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive Off

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 5

# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# ServerLimit: maximum value for MaxClients for the lifetime of the server
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serve
<IfModule prefork.c>
StartServers       5
MinSpareServers    5
MaxSpareServers   10
ServerLimit      128
MaxClients       128
MaxRequestsPerChild  2000
</IfModule>

# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule worker.c>
StartServers         2
MaxClients           3
MinSpareThreads      3
MaxSpareThreads      7
ThreadsPerChild      3
MaxRequestsPerChild  200
</IfModule>

After an httpd restart:

free -m
             total       used       free     shared    buffers     cached
Mem:           810        361        449          0         21        150
-/+ buffers/cache:        189        620
Swap:          255          0        255
uname -ra
Linux li125-138 2.6.18.8-x86_64-linode10 #1 SMP Tue Nov 10 16:29:17 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/my.cnf
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 16K
max_allowed_packet = 1M
table_cache = 4
sort_buffer_size = 64K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
thread_stack = 64K

# For low memory, Berkeley DB should not be used so keep skip-bdb uncommented unless required
skip-bdb

# For low memory, InnoDB should not be used so keep skip-innodb uncommented unless required
skip-innodb

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /var/lib/mysql/
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /var/lib/mysql/
#innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 8M
sort_buffer_size = 8M

[myisamchk]
key_buffer = 8M
sort_buffer_size = 8M

[mysqlhotcopy]
interactive-timeout

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

these suckers get pretty large - pretty quick

%MEM %CPU   RSS    VSZ COMMAND
 6.6  0.1 55444 326360 /usr/sbin/httpd
 6.5  0.0 54536 325416 /usr/sbin/httpd
 6.5  0.0 54356 325352 /usr/sbin/httpd
 6.5  0.0 54264 325160 /usr/sbin/httpd
 6.5  0.0 54124 325100 /usr/sbin/httpd
 6.4  0.0 53628 324528 /usr/sbin/httpd
 1.8  0.0 15664 282452 /usr/sbin/httpd

@mattm:

these suckers get pretty large - pretty quick

%MEM %CPU   RSS    VSZ COMMAND
 6.6  0.1 55444 326360 /usr/sbin/httpd
 6.5  0.0 54536 325416 /usr/sbin/httpd
 6.5  0.0 54356 325352 /usr/sbin/httpd
 6.5  0.0 54264 325160 /usr/sbin/httpd
 6.5  0.0 54124 325100 /usr/sbin/httpd
 6.4  0.0 53628 324528 /usr/sbin/httpd
 1.8  0.0 15664 282452 /usr/sbin/httpd

Apache is a notorious memory hog. You'll want to either tweak your Apache config or switch to a web server with a sane default setup.
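To put rough numbers on that: with prefork, every client slot is a separate httpd process, so the worst case is about MaxClients times the per-process size. Below is a rough Python sketch that reads the RSS column from the `ps` listing quoted above and estimates the ceiling. The helper names and the 500 MB budget for Apache are assumptions for illustration, not values from the thread. RSS also counts shared pages once per process, so treat the result as an upper bound.

```python
PS_OUTPUT = """\
%MEM %CPU   RSS    VSZ COMMAND
 6.6  0.1 55444 326360 /usr/sbin/httpd
 6.5  0.0 54536 325416 /usr/sbin/httpd
 6.5  0.0 54356 325352 /usr/sbin/httpd
 6.5  0.0 54264 325160 /usr/sbin/httpd
 6.5  0.0 54124 325100 /usr/sbin/httpd
 6.4  0.0 53628 324528 /usr/sbin/httpd
 1.8  0.0 15664 282452 /usr/sbin/httpd
"""


def rss_values_kb(ps_text, command="/usr/sbin/httpd"):
    """Collect the RSS column (kB) for rows whose COMMAND matches."""
    values = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5 and fields[4] == command:
            values.append(int(fields[2]))
    return values


def worst_case_mb(rss_kb, max_clients):
    """Upper bound: every child grows as large as the biggest one seen."""
    return max(rss_kb) * max_clients / 1024


def max_clients_for(budget_mb, rss_kb):
    """Largest MaxClients whose worst case still fits in budget_mb."""
    return int(budget_mb * 1024 // max(rss_kb))


rss = rss_values_kb(PS_OUTPUT)
print(f"{worst_case_mb(rss, 128) / 1024:.1f} GB at MaxClients 128")
print(max_clients_for(500, rss), "children fit in an assumed 500 MB budget")
```

With children at roughly 54 MB each, MaxClients 128 allows several gigabytes of demand on a box with about 800 MB of RAM, which would be consistent with the machine running out of memory and swap during a burst of requests.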

Yes, that's what I read all over the forums. I think it may have been one of your threads where you (or someone) said to "tweak" Apache and use WP-Supercache, and you would have no problem serving 15k hits a day on a 360 node.

I posted my worker process settings up top in this thread. Maybe they're too low? These sites barely push any traffic, and then out of nowhere CPU and IO spike to 400 times the norm and the server locks up.

I've since purchased more memory, going from ~800MB to ~1100MB.

![](http://i44.tinypic.com/2m3605f.png)

You are using prefork, not worker:

> httpd -V | grep 'MPM'
> Server MPM: Prefork
> -D APACHE_MPM_DIR="server/mpm/prefork"

so you need to tweak the prefork values:

> # StartServers: number of server processes to start
> # MinSpareServers: minimum number of server processes which are kept spare
> # MaxSpareServers: maximum number of server processes which are kept spare
> # ServerLimit: maximum value for MaxClients for the lifetime of the server
> # MaxClients: maximum number of server processes allowed to start
> # MaxRequestsPerChild: maximum number of requests a server process serve
> StartServers 5
> MinSpareServers 5
> MaxSpareServers 10
> ServerLimit 128
> MaxClients 128
> MaxRequestsPerChild 2000
>
> # worker MPM
> # StartServers: initial number of server processes to start
> # MaxClients: maximum number of simultaneous client connections
> # MinSpareThreads: minimum number of worker threads which are kept spare
> # MaxSpareThreads: maximum number of worker threads which are kept spare
> # ThreadsPerChild: constant number of worker threads in each server process
> # MaxRequestsPerChild: maximum number of requests a server process serves
> StartServers 2
> MaxClients 3
> MinSpareThreads 3
> MaxSpareThreads 7
> ThreadsPerChild 3
> MaxRequestsPerChild 200

This is mine on a 360:

> StartServers 1
> MinSpareServers 3
> MaxSpareServers 6
> ServerLimit 24
> MaxClients 24
> MaxRequestsPerChild 3000

Seems to be working pretty well. I hit the max clients, but don't get into swap much (though I am still tweaking).

Thanks for adding yours. What kind of traffic are you getting to your 360?

What is the best method to monitor these processes, especially during normal traffic hours?

I have 3 sites on it: 1 static and 2 WordPress. The static site gets around 600-700 "visits" per day, and the WordPress sites each get around 200.

I have had it crash and burn, but now I use monit and Munin. Munin simply reports, but I have monit set to restart Apache if it gets too memory hungry, and have made the changes outlined in http://www.linode.com/wiki/index.php/RebootingonOOM in case the whole thing borks, which it has once.
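For anyone curious what a "restart Apache when memory runs low" check boils down to, here is a standalone Python sketch. It is not monit configuration and not what the poster runs. It parses `/proc/meminfo` text, and the 50 MB threshold, the function names, and the `service httpd restart` command are all assumptions to adapt to your own system. The sample text reuses the figures from the first `free` listing in this thread.

```python
import subprocess


def reclaimable_mb(meminfo_text):
    """Return MemFree + Buffers + Cached in MB from /proc/meminfo contents."""
    wanted = {"MemFree", "Buffers", "Cached"}
    total_kb = 0
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            total_kb += int(rest.split()[0])  # values are reported in kB
    return total_kb / 1024


def needs_restart(meminfo_text, min_free_mb=50):
    """True when free plus reclaimable memory is below the assumed threshold."""
    return reclaimable_mb(meminfo_text) < min_free_mb


def main():
    """Read the live values and restart Apache if needed (assumed command)."""
    with open("/proc/meminfo") as f:
        text = f.read()
    if needs_restart(text):
        subprocess.call(["service", "httpd", "restart"])


# Illustrative sample built from the free figures earlier in the thread.
SAMPLE = """\
MemTotal:       829644 kB
MemFree:        162888 kB
Buffers:         15676 kB
Cached:         161228 kB
SwapCached:          0 kB
"""

print(needs_restart(SAMPLE))                   # False
print(needs_restart(SAMPLE, min_free_mb=400))  # True
```

Calling `main()` from cron every minute or so would give a crude version of the same safety net. A dedicated tool like monit is still the better choice because it also handles restart loops and alerting.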

When I get time I'm going to get another 360 and set it up with worker (or nginx), and move everything over.

I'm far from a guru. Though I've used Linux at home for nearly 10 years now, I'm still a newbie when it comes to server administration :).

Nice - thanks for your input.

 free -m
             total       used       free     shared    buffers     cached
Mem:          1170        816        353          0         33        211
-/+ buffers/cache:        571        598
Swap:          255          0        255

FWIW- right now mine is

linpear:~# free -m
             total       used       free     shared    buffers     cached
Mem:           360        273         86          0          2        137
-/+ buffers/cache:        134        226
Swap:          255         45        210

I'm running Debian Lenny 32 bit.

I ran an Apache Bench test, though I'm not sure how reliable that is for testing load on the server…

Server Port:            80

Document Path:          /
Document Length:        42086 bytes

Concurrency Level:      10
Time taken for tests:   8.801834 seconds
Complete requests:      100
Failed requests:        76
   (Connect: 0, Length: 76, Exceptions: 0)
Write errors:           0
Total transferred:      4242605 bytes
HTML transferred:       4214505 bytes
Requests per second:    11.36 [#/sec] (mean)
Time per request:       880.183 [ms] (mean)
Time per request:       88.018 [ms] (mean, across all concurrent requests)
Transfer rate:          470.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       13   13   0.7     13      16
Processing:   339  852 397.5    770    2514
Waiting:      250  586 342.7    479    2318
Total:        352  865 397.5    783    2527

Percentage of the requests served within a certain time (ms)
  50%    783
  66%    952
  75%   1012
  80%   1031
  90%   1412
  95%   1712
  98%   2424
  99%   2527
 100%   2527 (longest request)
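On reliability: the summary lines in that report are plain arithmetic on the totals, so they can be checked by hand. The 76 "failed" requests are all in the Length column. ab counts a length failure whenever a response's size differs from the first response it received, which is common for dynamic pages such as WordPress, so these do not necessarily mean the server returned errors. Here is a small Python sketch reproducing the derived figures from the numbers above (the function name is illustrative):

```python
def ab_summary(complete, seconds, concurrency):
    """Recompute ab's derived figures from the raw totals it reports."""
    requests_per_second = complete / seconds
    # Mean time per request as seen by one of the concurrent clients.
    per_request_ms = concurrency * seconds * 1000 / complete
    # Mean time per request across all concurrent requests.
    across_all_ms = seconds * 1000 / complete
    return requests_per_second, per_request_ms, across_all_ms


rps, per_request, across_all = ab_summary(
    complete=100, seconds=8.801834, concurrency=10
)
print(f"{rps:.2f} req/s")                  # 11.36 req/s
print(f"{per_request:.3f} ms")             # 880.183 ms
print(f"{across_all:.3f} ms across all")   # 88.018 ms across all
```

About 11 requests per second on an uncached page with 10 concurrent clients is a small sample (100 requests), so it is better read as a rough indicator than a capacity figure.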
