Need help diagnosing high disk IO that takes my site down
I have a recurring problem on my Linode where the disk IO rate skyrockets and nobody can access my site. It seems to start during periods of high traffic.
> Your Linode, linodeXXXXXX, has exceeded the notification threshold (200) for disk io rate by averaging 7012.79 for the last 2 hours.
I can SSH in, but it takes about ten minutes. I ran top and it looks like there are tons of apache2 processes (that's bad, right?):
[Screenshot of top output]
Also worth noting: when I logged into the LISH console while this was going on, it printed a message every few seconds saying the system was out of memory and was killing an apache2 process. That, I know, is definitely bad.
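To see how often the kernel's OOM killer fired and which processes it picked, you can scan the kernel log. A minimal Python sketch, assuming the messages look like `Out of memory: Kill process 2345 (apache2) ...` or `Killed process 2345 (apache2) ...`; the exact wording varies between kernel versions, so the regex is an assumption:

```python
import re
from collections import defaultdict

# Assumed message shapes (they differ between kernel versions), e.g.
#   "Out of memory: Kill process 2345 (apache2) score 87 or sacrifice child"
#   "Killed process 2345 (apache2) total-vm:..."
OOM_RE = re.compile(r"[Kk]ill(?:ed)? process (\d+) \(([^)]+)\)")

def oom_victims(lines):
    """Map process name -> set of PIDs the OOM killer reported killing."""
    victims = defaultdict(set)
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            pid, name = int(m.group(1)), m.group(2)
            # A set, so the "Kill process" and "Killed process" lines
            # for the same kill are only counted once.
            victims[name].add(pid)
    return dict(victims)
```

Feed it the lines of `dmesg` output or of a kernel log file such as `/var/log/kern.log` (that path is an assumption; it varies by distribution). A long list of apache2 PIDs points at Apache spawning more children than RAM can hold.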
Here are graphs from the linode dashboard:
[Graphs from the Linode dashboard]
As you can see, the disk IO rate is a total troll. It pops up all of a sudden and keeps all my visitors out.
The quickest way to get back up and running is just to restart the server. If the terminal is responsive enough, I can kill mysql and apache2, and that seems to solve it. I'm running Joomla, so it's probably some rogue script; the problem is I don't know where to start diagnosing it. I checked my Joomla logs… for some reason they haven't logged anything since 2008. Big help there, thanks Joomla!
Anyway, does anybody know where to start? Maybe logs that give a wider view than Joomla's, to see what is really going on?
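Apache's own access log gives that wider view: lining up request counts per minute against the times the site went down shows whether the outages track traffic spikes. A minimal Python sketch, assuming the standard `[day/Mon/year:HH:MM:SS zone]` timestamp used by Apache's common, combined and vhost_combined formats (the config posted in this thread sends vhosts without their own log to `/var/log/apache2/other_vhosts_access.log`):

```python
import re
from collections import Counter

# Captures the request timestamp truncated to the minute,
# e.g. "[10/Oct/2010:13:55:36 -0700]" -> "10/Oct/2010:13:55".
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\]")

def requests_per_minute(lines):
    """Count access-log lines per minute."""
    counts = Counter()
    for line in lines:
        m = TS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def busiest_minutes(lines, n=10):
    """The n minutes with the most requests, busiest first."""
    return requests_per_minute(lines).most_common(n)
```

Run it over the log from a day the site went down and compare the busiest minutes with when the disk IO alert fired.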
All your memory is being used by some process.
Press Shift+M (capital M) in top to sort the process list by memory usage.
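Outside of top, the same per-process memory picture can be totalled from the output of `ps -eo rss,comm` (RSS is in KiB). A minimal Python sketch that sums resident memory per command name; RSS counts shared pages once per process, so the apache2 total is an overestimate rather than an exact figure:

```python
from collections import defaultdict

def memory_by_command(ps_output):
    """Sum RSS (KiB) and process count per command name.

    Expects the text printed by `ps -eo rss,comm`, header line first.
    Returns (command, total_rss_kib, count) tuples, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # comm -> [total_rss_kib, count]
    for line in ps_output.splitlines()[1:]:  # skip the "RSS COMMAND" header
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss, comm = int(parts[0]), parts[1].strip()
        totals[comm][0] += rss
        totals[comm][1] += 1
    # Biggest memory users first, like sorting by memory in top
    return sorted(((c, r, n) for c, (r, n) in totals.items()), key=lambda t: -t[1])
```

To feed it live data: `subprocess.run(["ps", "-eo", "rss,comm"], capture_output=True, text=True).stdout`.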
@obs:
Set your Apache MaxClients to 20 and restart Apache.
Yep. Do that first.
Then find out whether your Apache processes are taking a long time to service requests. If not, check your KeepAlive setting; you may find Apache is holding connections open to each client rather than releasing them.
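Why KeepAlive matters so much with the prefork MPM can be estimated with Little's law (L = λW): busy children ≈ new connections per second × seconds each connection holds a child. With KeepAlive On, a child stays tied to a client until the connection has been idle for KeepAliveTimeout seconds, so that idle wait counts toward W. A rough Python sketch; it ignores burstiness, and the rates in the usage note are hypothetical, not measured on this server:

```python
def busy_children(new_connections_per_sec, service_secs, keepalive_timeout_secs):
    """Little's law estimate of prefork children in use.

    hold_time = time spent serving requests on the connection
              + the idle KeepAliveTimeout wait before the child is freed.
    """
    hold_time = service_secs + keepalive_timeout_secs
    return new_connections_per_sec * hold_time
```

With a hypothetical 2 new connections per second and 0.5 s of work each, `KeepAliveTimeout 15` ties up about 31 children, while a 2-second timeout needs only about 5.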
@OZ:
Your system is using swap; that's why your server is so slow.
All your memory is being used by some process.
Press Shift+M (capital M) in top to sort the process list by memory usage.
Thanks, I always thought there was a way to sort the process list but I never found out how!
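To check the swapping diagnosis directly, `/proc/meminfo` reports swap totals in kB. A small Python parser; a nonzero swap figure on its own isn't proof of thrashing, so it's also worth watching the `si`/`so` (swap in/out) columns of `vmstat 1` while the site is slow:

```python
def parse_meminfo(text):
    """Turn /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])  # drop the trailing "kB" unit
    return info

def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

Usage: `with open("/proc/meminfo") as f: print(swap_used_kb(parse_meminfo(f.read())))`.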
@obs:
Set your Apache MaxClients to 20 and restart Apache.
OK, unless there's another config file with a MaxClients setting somewhere that I don't know about, every MaxClients setting in my apache2.conf is 150. I think that's the default. Here is the config file:
### Section 1: Global Environment
#
# The directives in this section affect the overall operation of Apache,
# such as the number of concurrent requests it can handle or where it
# can find its configuration files.
#
#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# NOTE! If you intend to place this on an NFS (or otherwise network)
# mounted filesystem then please read the LockFile documentation (available
# at <URL:http://httpd.apache.org/docs-2.1/mod/mpm_common.html#lockfile>);
# you will save yourself a lot of trouble.
#
# Do NOT add a slash at the end of the directory path.
#
ServerRoot "/etc/apache2"
#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
#<IfModule !mpm_winnt.c>
#<IfModule !mpm_netware.c>
LockFile /var/lock/apache2/accept.lock
#</IfModule>
#</IfModule>
#
# PidFile: The file in which the server should record its process
# identification number when it starts.
# This needs to be set in /etc/apache2/envvars
#
PidFile ${APACHE_PID_FILE}
#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300
#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On
#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100
#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15
##
## Server-Pool Size Regulation (MPM specific)
##
# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
# event MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_event_module>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
# These need to be set in /etc/apache2/envvars
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}
#
# AccessFileName: The name of the file to look for in each directory
# for additional configuration directives. See also the AllowOverride
# directive.
#
AccessFileName .htaccess
#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy all
</Files>
#
# DefaultType is the default MIME type the server will use for a document
# if it cannot otherwise determine one, such as from filename extensions.
# If your server contains mostly text or HTML documents, "text/plain" is
# a good value. If most of your content is binary, such as applications
# or images, you may want to use "application/octet-stream" instead to
# keep browsers from trying to display binary files as though they are
# text.
#
DefaultType text/plain
#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog /var/log/apache2/error.log
#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel warn
# Include module configuration:
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf
# Include all the user configurations:
Include /etc/apache2/httpd.conf
# Include ports listing
Include /etc/apache2/ports.conf
#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
# If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i
#
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
#
# Define an access log for VirtualHosts that don't define their own logfile
CustomLog /var/log/apache2/other_vhosts_access.log vhost_combined
# Include of directories ignores editors' and dpkg's backup files,
# see README.Debian for details.
# Include generic snippets of statements
Include /etc/apache2/conf.d/
# Include the virtual host configurations:
Include /etc/apache2/sites-enabled/
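The usual rule of thumb behind a number like 20 for prefork is: MaxClients ≈ (RAM left after the OS, MySQL and other services) ÷ (average size of one apache2 child). A Python sketch of that arithmetic; the inputs in the usage note are hypothetical, since the Linode's RAM and the per-child memory aren't given in this thread:

```python
def suggest_max_clients(total_ram_mb, reserved_mb, avg_apache_child_mb):
    """Rule-of-thumb prefork MaxClients that keeps Apache out of swap.

    reserved_mb covers everything that is not Apache: kernel, MySQL,
    cron jobs, and some headroom for the page cache.
    """
    available = total_ram_mb - reserved_mb
    if available <= 0 or avg_apache_child_mb <= 0:
        raise ValueError("no memory left for Apache with these inputs")
    # Whole children only; round down so the total stays within RAM
    return int(available // avg_apache_child_mb)
```

With a hypothetical 512 MB Linode, 200 MB reserved and 25 MB children, this suggests 12. At MaxClients 150 the same children could ask for about 3,750 MB, far more than the machine has, which is consistent with the swapping and OOM kills described above.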
@exiges:
@obs: Set your Apache MaxClients to 20 and restart Apache.
Yep. Do that first.
Then find out whether your Apache processes are taking a long time to service requests. If not, check your KeepAlive setting; you may find Apache is holding connections open to each client rather than releasing them.
Here are the relevant settings in the config file:
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
I'll take your advice and monitor my Apache processes. Do you know of a good way to do this? I did some googling and found this blog post.
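One common way to watch Apache's workers is mod_status, whose machine-readable page (`/server-status?auto`) prints `Key: value` lines including `BusyWorkers`, `IdleWorkers` and a `Scoreboard` string with one character per slot. A minimal Python parser, assuming mod_status is enabled and reachable from localhost; many `K` (keepalive) slots would mean children are idling in KeepAlive waits rather than doing work:

```python
from collections import Counter

# Scoreboard characters as documented for Apache's mod_status
SCOREBOARD_KEYS = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "sending",
    "K": "keepalive", "D": "dns", "C": "closing", "L": "logging",
    "G": "finishing", "I": "idle_cleanup", ".": "open_slot",
}

def parse_server_status(text):
    """Parse /server-status?auto text into (fields, scoreboard counts)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    board = Counter(SCOREBOARD_KEYS.get(ch, ch) for ch in fields.get("Scoreboard", ""))
    return fields, board
```

To fetch live data: `urllib.request.urlopen("http://localhost/server-status?auto").read().decode()`. Sampling it every few seconds during a busy period shows how close Apache gets to MaxClients.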
I realize now I should have mentioned this earlier: my site gets a good amount of traffic every day. For example, last Thursday there were about 4,000 unique visitors and 31,000 pageviews, which is normal for a Thursday. Friday is usually the same or higher, and Friday is one of the days my site went down. With that in mind, I wouldn't be surprised if MaxClients is set too low. Whatcha think?
31,000 pageviews/day averages out to well under one per second (about 0.36/s), so you'll be fine.
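The arithmetic behind that reply, as a quick Python check: 31,000 pageviews spread over the 86,400 seconds in a day is about 0.36 per second on average. Real traffic is burstier than the average, and each pageview usually triggers extra requests for images, CSS and scripts, so `peak_factor` and `requests_per_view` below are hypothetical knobs, not measurements:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_sec(count_per_day):
    """Average events per second if they were spread evenly over the day."""
    return count_per_day / SECONDS_PER_DAY

def peak_requests_per_sec(pageviews_per_day, peak_factor, requests_per_view):
    """Hypothetical peak HTTP request rate: the average pageview rate scaled
    by an assumed burstiness factor and an assumed requests-per-page count."""
    return average_rate_per_sec(pageviews_per_day) * peak_factor * requests_per_view

print(f"{average_rate_per_sec(31_000):.2f} pageviews/s on average")  # 0.36
```

Even with a hypothetical 5× peak and 10 requests per page, `peak_requests_per_sec(31_000, 5, 10)` comes to roughly 18 requests per second; whether that fits in 20 children depends on how long each request holds a child.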