disk IO is going insane

I'm not sure what I did, but my hard drive activity is not normal, and I'm sure I'm inadvertently causing problems. I'm running two websites on my Linode, both of which get pretty heavy traffic. I recently moved one off of MySQL completely, but there are still legacy hits coming in.

Output from free:

free
             total       used       free     shared    buffers     cached
Mem:        720956     705504      15452          0       1044      16748
-/+ buffers/cache:     687712      33244
Swap:       524280     233108     291172

Output from top:

top - 21:35:19 up 1 day, 22:56,  3 users,  load average: 11.74, 16.23, 20.50
Tasks: 684 total,  58 running, 626 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.6%us, 61.4%sy,  0.0%ni, 29.0%id,  6.0%wa,  0.0%hi,  0.4%si,  0.6%st
Mem:    720956k total,   714392k used,     6564k free,      268k buffers
Swap:   524280k total,   320472k used,   203808k free,    11104k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6694 mysql     20   0  360m 5020 1912 S 4386  0.7 256:19.94 mysqld
  183 root      20   0     0    0    0 R   99  0.0   8:45.06 kswapd0
 7375 www-data  20   0 56052  14m 2892 R   15  2.0   0:02.22 apache2
 8046 www-data  20   0 59672  14m 3008 S   14  2.0   0:00.54 apache2
 7686 www-data  20   0 59676  14m 2968 R   13  2.0   0:00.78 apache2
 7461 www-data  20   0 59672  14m 2952 S   12  2.0   0:00.77 apache2
 7579 www-data  20   0 59580  14m 2964 D    9  2.0   0:00.43 apache2
 7459 www-data  20   0 59708  14m 2952 D    8  2.0   0:00.86 apache2
 8077 www-data  20   0 59672  14m 3016 S    7  2.0   0:00.93 apache2
 8041 www-data  20   0 53004  14m 2956 R    6  2.1   0:00.71 apache2
 8038 www-data  20   0 56144  14m 2940 R    6  2.1   0:00.36 apache2
 7238 www-data  20   0 59672  14m 2944 S    5  2.0   0:00.53 apache2
 7262 www-data  20   0 59580  13m 2964 R    5  2.0   0:00.35 apache2
 7292 www-data  20   0 59672  13m 2960 R    5  2.0   0:00.44 apache2
 7879 www-data  20   0 59672  14m 3020 R    5  2.0   0:00.18 apache2
 7104 www-data  20   0 53004  14m 2808 R    5  2.0   0:00.70 apache2
 7113 www-data  20   0 59672  13m 2936 R    5  2.0   0:01.10 apache2
 7206 www-data  20   0 59672  14m 2964 R    5  2.0   0:01.22 apache2
 7216 www-data  20   0 59448  14m 3148 R    5  2.1   0:00.57 apache2
 7635 www-data  20   0 59672  14m 2952 R    5  2.0   0:00.22 apache2
 7664 www-data  20   0 59676  14m 2932 R    5  2.0   0:00.79 apache2
 7869 www-data  20   0 59680  14m 3020 D    5  2.0   0:00.39 apache2
 7928 www-data  20   0 59552  14m 3244 R    5  2.1   0:00.50 apache2
 7067 www-data  20   0 59672  14m 2948 S    5  2.0   0:00.63 apache2
 7055 www-data  20   0 59672  13m 2960 S    4  2.0   0:00.50 apache2
 7148 www-data  20   0 59832  14m 2964 D    4  2.0   0:00.60 apache2
 7241 www-data  20   0 50668  14m 2796 R    4  2.0   0:00.64 apache2
 7389 www-data  20   0 50640  14m 2904 R    4  2.1   0:00.50 apache2
 7404 www-data  20   0 59672  13m 2964 S    4  1.9   0:00.69 apache2
 7233 www-data  20   0 59572  13m 2940 R    4  1.9   0:00.21 apache2
 7890 www-data  20   0 59680  14m 3008 D    4  2.0   0:00.32 apache2
 7910 www-data  20   0 59672  14m 3020 S    4  2.0   0:00.15 apache2
 7931 www-data  20   0 59568  13m 2428 S    4  1.9   0:00.11 apache2
 7249 www-data  20   0 59672  14m 2948 R    4  2.0   0:00.33 apache2
 7239 www-data  20   0 59572  13m 2948 R    3  1.9   0:00.59 apache2
 7355 www-data  20   0 59672  13m 2968 R    3  2.0   0:00.44 apache2
 7650 www-data  20   0 59672  14m 2956 D    3  2.0   0:00.87 apache2
 8043 www-data  20   0 59696  14m 3008 D    3  2.0   0:00.55 apache2
 7715 www-data  20   0 59580  13m 2984 R    3  1.9   0:00.39 apache2
 7856 root      20   0  8164 1096  836 S    3  0.2   0:00.18 sshd
 8042 www-data  20   0 59992  14m 3056 R    2  2.1   0:00.96 apache2
 7077 www-data  20   0 59672  14m 2940 D    2  2.0   0:00.32 apache2
 7269 www-data  20   0 59672  13m 2952 S    2  2.0   0:00.23 apache2
 7343 www-data  20   0 59672  14m 2968 R    2  2.0   0:00.19 apache2
 7457 www-data  20   0 59988  14m 2972 S    2  2.0   0:00.45 apache2
 7525 www-data  20   0 59672  14m 2964 D    2  2.0   0:00.89 apache2
 7199 www-data  20   0 59672  14m 2996 R    2  2.0   0:00.22 apache2
 7347 www-data  20   0 59672  14m 2948 D    2  2.0   0:01.02 apache2
 7468 www-data  20   0 59672  14m 2956 R    2  2.0   0:00.46 apache2
 7486 www-data  20   0 59672  14m 2952 D    2  2.0   0:00.91 apache2
 7745 www-data  20   0 59672  13m 2988 S    2  1.9   0:00.79 apache2
 7203 www-data  20   0 59672  13m 2956 R    1  2.0   0:00.37 apache2
 7590 www-data  20   0 59580  14m 2960 S    1  2.0   0:00.76 apache2
 7699 www-data  20   0 59568  14m 2972 R    1  2.0   0:00.76 apache2
 7922 www-data  20   0 59576  13m 2440 R    1  1.9   0:00.05 apache2
Here's my apache2.conf:

#
ServerRoot "/etc/apache2"

#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
#<IfModule !mpm_winnt.c>
#<IfModule !mpm_netware.c>
LockFile /var/lock/apache2/accept.lock
#</IfModule>
#</IfModule>

#
# PidFile: The file in which the server should record its process
# identification number when it starts.
# This needs to be set in /etc/apache2/envvars
#
PidFile ${APACHE_PID_FILE}

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 80

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15

##
## Server-Pool Size Regulation (MPM specific)
##

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

# event MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_event_module>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>

# These need to be set in /etc/apache2/envvars
#User ${APACHE_RUN_USER}
User www-data

#Group ${APACHE_RUN_GROUP}
Group www-data

#
# AccessFileName: The name of the file to look for in each directory
# for additional configuration directives.  See also the AllowOverride
# directive.

AccessFileName .htaccess

#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>

#
# DefaultType is the default MIME type the server will use for a document
# if it cannot otherwise determine one, such as from filename extensions.
# If your server contains mostly text or HTML documents, "text/plain" is
# a good value.  If most of your content is binary, such as applications
# or images, you may want to use "application/octet-stream" instead to
# keep browsers from trying to display binary files as though they are
# text.
#
DefaultType text/plain

#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off

# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here.  If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog /dev/null

#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel alert

# Include module configuration:
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf

# Include all the user configurations:
Include /etc/apache2/httpd.conf

# Include ports listing
Include /etc/apache2/ports.conf

#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
# If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i
#
#LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
#LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
#LogFormat "%h %l %u %t \"%r\" %>s %O" common
#LogFormat "%{Referer}i -> %U" referer
#LogFormat "%{User-agent}i" agent

#
# Define an access log for VirtualHosts that don't define their own logfile
# CustomLog /var/log/apache2/other_vhosts_access.log vhost_combined

# Include of directories ignores editors' and dpkg's backup files,
# see README.Debian for details.

# Include generic snippets of statements
Include /etc/apache2/conf.d/*.conf

# Include the virtual host configurations:
Include /etc/apache2/sites-enabled/

And here's my my.cnf:
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

# Here are entries for some specific programs
# The following values assume you have at least 32M ram

# This was formerly known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
#
# * Basic Settings
#

#
# * IMPORTANT
#   If you make changes to these settings and your system uses apparmor, you may
#   also need to adjust /etc/apparmor.d/usr.sbin.mysqld.
#

user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address            = 127.0.0.1
#
# * Fine Tuning
#
key_buffer              = 128M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover         = BACKUP
max_connections        = 200
table_cache            = 128
thread_concurrency     = 10
#
# * Query Cache Configuration
#
query_cache_limit       = 1M
query_cache_size        = 128M
#
# * Logging and Replication
#
# Both locations get rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# Here you can see queries with especially long duration
log_slow_queries        = /var/log/mysql/mysql-slow.log
long_query_time = 1
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.
#server-id              = 1
#log_bin                        = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
#binlog_do_db           = include_database_name
#binlog_ignore_db       = include_database_name
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completition

[isamchk]
key_buffer              = 32M

#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#

Any help would be appreciated. I don't know what I did wrong but Disk IO is screaming high and it was never this bad when I was having a lot more traffic.

Thanks.

25 Replies

@sdlvx:

Any help would be appreciated. I don't know what I did wrong but Disk IO is screaming high and it was never this bad when I was having a lot more traffic.
Best guess is the I/O is mostly swapping, because you're over-committing your Linode: you're allowing far more Apache client processes than the memory your Linode has can support. With a Linode, you really want to avoid swapping if at all possible, and in your case you're using swap equal to almost a third of your physical memory.

I don't know which Apache worker model you're using, but MaxClients is probably too large for any of them, as the first main knob.

There are several threads in the forums that discuss tuning Apache appropriately. The default settings that come with most distributions are nowhere near realistic for a VPS environment, where memory is limited and I/O overhead a significant performance hit.

I don't have a thread reference handy, but suspect that searching for MaxClients or apache will turn some up. If nothing else, dropping your MaxClients down to 15-20 is a quick start, but you'll need to experiment a little to find the best value for your specific Linode.
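
As a rough sizing sketch (this assumes a prefork apache2 and Linux procps; the numbers in the comments are illustrative, not prescriptive):

    # average resident size of the current apache2 workers, in MB
    ps -o rss= -C apache2 | awk '{s+=$1; n++} END {printf "%.1f MB avg/process\n", s/n/1024}'
    # e.g. ~15 MB per process and ~300 MB you can spare for Apache
    # => MaxClients around 300/15 = 20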

– David

it went back down to 700. It peaked at 20k. Is that even possible?

EDIT:

Thanks, I'll search around. I was in a giant panic, today has been horrible. My internet is down and I've been doing this all through SSH over my cell phone. What a nightmare. q.q

@sdlvx:

it went back down to 700. It peaked at 20k. Is that even possible?
Sure. I think I/O is counted in 1K blocks, though not absolutely certain. But at 1K blocks, 20k I/O requests would only be 20MB/s. In tests I can easily generate write I/O on my Linode 360 at 5x+ that rate - though not all the data is probably flushed in that timing - and can get hdparm read timings (-t) at 8-9x+ that.

Of course, the Linode monitoring is also 5min average, so you could have peaked quite a bit higher at points.

On the bright side, when there's contention on a host for I/O, performance can really tank (I've had some occasions of seeing 40-50% iowait). So given that you had pretty high I/O rates, and at least in the posted top output iowait was relatively modest at 6%, during your overload you might not have been seriously affecting too many peers on your host.

Definitely worth tuning the configuration for though, since you're likely at least impacting your own performance.

– David

You've still got way too many Apache processes. All your RAM is being eaten up by Apache for no good reason; you need to drop your MaxClients waaay down. We're talking 10% of what it's set to now.

I would think that MaxRequestsPerChild should be non-zero as well - I have mine at 3000 on a 360, along with a KeepAliveTimeout of 2.

http://library.linode.com/troubleshooting/memory-networking#apache2lowmemorysettings
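
Something in this neighborhood, for example (illustrative values, not that guide's exact numbers - tune to your own per-process footprint):

    <IfModule mpm_prefork_module>
        StartServers          2
        MinSpareServers       2
        MaxSpareServers       5
        MaxClients           15
        MaxRequestsPerChild 3000
    </IfModule>
    KeepAliveTimeout 2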

And vmstat for comparison:

# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0    920  32772  56796 155204    0    0     1     1    1    0  0  0 100  0
 1  0    920  32764  56796 155204    0    0     0     0   37   20  0  0 100  0
 0  0    920  32824  56796 155204    0    0     0     0   34   34  0  0 100  0
 0  0    920  32824  56796 155204    0    0     0     0   22   18  0  0 100  0
 0  0    920  32824  56796 155204    0    0     0     0   31   30  0  0 100  0

I think the problem was with MySQL. I had a website that was getting a lot of usage. I reworked it so that it doesn't use a database at all, and disk IO has been consistently under 1,000.

If I set these values any lower my site slows to a crawl and it takes forever for a page to load.

I did lower my cache settings for mysql and that seemed to help, but I noticed it was taking longer for pages to generate.

In the end it looks like this is a serious problem. I already have a 720 because I needed to move up from the 360. I don't think I have much headroom the way my Linode is set up.

But when it comes to upgrading it shouldn't be a big deal as long as this ad company I'm using doesn't rip me off. If these settings in Apache need to be this high it pretty much means the only way to make things better is better hardware, right?

I'm pretty overwhelmed. I have two startups that I coded myself, and I'm doing the IT work and keeping the server going while also doing the social stuff that goes along with one of them. I'm also kind of new to running a server, but I have a bachelor's in CS and everything we did was on Linux, so I have a good idea of what's going on.

Anyway, the problem is more than likely entirely in MySQL for now. Disk IO right now is at 288, CPU at ~20%, and outgoing at ~700kbps.

EDIT: A lot of the problem was getting hammered and then MySQL resorting to a filesort on a large table (30k+ rows).

@sdlvx:

In the end it looks like this is a serious problem. I already have a 720 because I needed to move up from the 360. I think that I don't have much overhead the way my linode is set up.

But when it comes to upgrading it shouldn't be a big deal as long as this ad company I'm using doesn't rip me off. If these settings in Apache need to be this high it pretty much means the only way to make things better is better hardware, right?
I still think you really have Apache configured too high. It sounds to me like you're shrinking resource use by MySQL as an alternative to fixing the root resource issue being introduced by Apache. MySQL may well have been sorting to a file, but that wouldn't explain your swap usage, which was very high for a 720.

Have you perhaps rebooted and/or restarted apache along the way of doing your recent work too? If so, then you just got a temporary reprieve by recovering the resources of all of those prior apache processes. But it'll just happen again when you get enough simultaneous requests. Note that improving processing time for a single request (such as removing the use of the database) is another way to keep down the simultaneous apache processes, but doesn't fix the root problem, which is still waiting to occur again.

You know your application best, but are you absolutely certain you need to support that many Apache clients? Your current configuration just isn't tuned for your available resources.

Having fewer clients (and depending on worker model, more requests serviced per client as suggested in another response) need not drop the peak load your system can handle - and in fact can improve it because you aren't overloading and swapping.

And realize that the process listing you showed only had around 50 apache processes, so you were nowhere near your client limit of 150 even when overloaded; I fail to see how that can be a valid setting. Basically, if MaxClients times the size of an apache process (with whatever plugins you have) is more than your physical memory - or the portion not used by other processes - it's set too high.
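
To put numbers on that from your top output: 150 clients times ~14 MB resident per apache2 process is roughly 2 GB, nearly three times the 720 MB this Linode has.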

If you literally can't tune apache better, then yes, your only recourse is going to be to increase your Linode to the point where the number of Apache client processes you need fits within memory. You can also improve things by relocating the database to a second Linode, but that's just another way to free up memory for Apache.

I don't really think that's the right solution for your case though, and I'd suggest revisiting your conclusion that MySQL was at fault, though it may have contributed indirectly by letting the apache configuration burn through all available resources.

– David

The thing is you don't need Apache that high, and so it's consuming all your RAM, leaving none for MySQL or other things.

You only have four virtual CPUs. If you're running much more than one or two Apache processes per CPU, you're not able to handle more users; you just slow down access for each user. Additional accesses just get queued if there isn't a worker ready to handle them, I believe.

You've got Apache extremely misconfigured, and if your site is database heavy, you need MORE memory for MySQL, not less. Properly configuring Apache will free up that memory for more important things.

@db3l:

I still think you really have Apache configured too high. It sounds to me like you're shrinking resource use by MySQL as an alternative to fixing the root resource issue being introduced by Apache. MySQL may well have been sorting to a file, but that wouldn't explain your swap usage, which was very high for a 720. [...]

I think there might be something wrong with my MPM after googling around. If my ServerLimit and MaxClients aren't really high, it takes forever for pages to load. I have a timer in my application that reports how long a page took in PHP and MySQL, and it always shows a good time. I've been googling, and it sounds like new connections are being queued.

After reading about MPMs, I'm thinking I messed it up and I don't have any MPM, and each apache process is a single request. I googled but couldn't find out how to see which MPM my server is running. I recall reading that maximum simultaneous users should be MaxClients * the number of threads in your MPM. It seems like I'm only getting MaxClients * 1.

@Guspaz:

The thing is you don't need Apache that high, and so it's consuming all your RAM, leaving none for MySQL or other things. [...]

I have two sites on it, both of which are pretty popular (one is huge, but very simple), and I wasn't planning on them getting this big.

I ran ps -ef | grep apache2 | wc -l and I think it said I had 222 processes open.

EDIT: One last edit. I gave up and just restored backups and it seems to be working now that I set

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0

and I removed MaxClients and ServerLimit from httpd.conf

@sdlvx:

I think there might be something wrong with my MPM after googling around. If my ServerLimit and MaxClients aren't really high, it takes forever for pages to load.
How much processing is required for you to deliver a page? It sounds like individual requests take a long time to render, in which case, yes, a configuration that only supports a few simultaneous requests will cause queuing.

But that still doesn't make it a good idea to pick a value that lets your Linode go into heavy swapping; that's counter-productive. In other words, as long as you don't overflow memory, more processes are fine, but beyond that point you're actually going to hurt yourself.

> After reading about MPMs, I'm thinking I messed it up and I don't have any MPM, and each apache process is a single request. I googled but couldn't find out how to see which MPM my server is running.
Well, you definitely have an MPM, since any apache build has at least one MPM selected, but it's certainly true that the default is often prefork, which is a single process per request. Depending on your distribution you can probably tell by which packages you installed, or I believe "apache2 -V" should show the MPM compiled into the server.
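
For example (on Debian/Ubuntu, apache2ctl sources the envvars first, so it avoids the unset-variable warnings plain apache2 gives):

    apache2ctl -V | grep -i mpm
    # prints e.g. "Server MPM:     Prefork"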

> I recall reading that maximum simultaneous users should be MaxClients * the number of threads in your MPM. It seems like I'm only getting MaxClients * 1.
If you're using prefork, that would be right, as it only has one thread/client per process.

> I ran ps -ef | grep apache2 | wc -l and I think it said I had 222 processes open.
Given the per-worker resource demands in your prior posting, I have to imagine you were thrashing/swapping horribly with that many. Not sure how you hit 222 with a MaxClients of 150, but it could be you were thrashing so badly the system wasn't able to reap exiting processes fast enough.

> EDIT: One last edit. I gave up and just restored backups and it seems to be working now that I set

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0

and I removed MaxClients and ServerLimit from httpd.conf
For what it's worth, that's still almost certainly too high a MaxClients value for your environment.

The fact that it's working now is more likely due to the reload/restart (which cleared all the processes) than any configuration correction. You're just sitting on a time bomb whose clock you have reset, but it's likely to blow up again as soon as you get enough load in terms of simultaneous requests being processed.

Again - your best bet is to tune Apache so that under full load you are barely swapping, taking into account all processes. That will most likely require dropping MaxClients.

You could set MaxRequestsPerChild non-zero (as recommended by another poster), which won't change the simultaneous request limit or peak memory footprint; a reasonably high value keeps process creation to a minimum while still recycling children periodically, which protects you if you have "messy" code running per request or a buggy request processing chain.

You could also try switching to a threaded MPM model, since multiple threads per child process are a little lighter weight, but not all embedded interpreters may like that, nor do I suspect it'll make a massive difference in throughput, since if you're tying up a 720's worth of memory with apache processes, the individual rendering of a page is likely bottlenecked elsewhere.

The bottom line in all this is to find the sweet spot of configuration where you are maximally using your available resources, but not exceeding them. As you approach that point in tuning you will see your performance (requests/second you can service) steadily increase, but if you cross past it, your performance is going to tank like going off a cliff. So when in doubt, start conservatively. You may be a little more sluggish than you need to be, but at least you won't tank.

One thing you could do to provide yourself with more room for experimentation is to temporarily allocate a second 720. Clone your current box over (sounds like you have backups, so just restore to the new box and tweak hostnames and what not), and then experiment against that. Tools such as ab can help load down your server (be sure to request URLs that exercise your full database path), and help find an appropriate tuning point.
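
A minimal ab run might look like this (the URL is a placeholder - point it at a page that exercises your PHP and MySQL path, not a static file):

    # 500 requests total, 20 at a time
    ab -n 500 -c 20 http://test-linode.example.com/some-db-page.php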

I really do suspect you'll be able to get quite good performance out of a 720 even with a reasonably inefficient rendering path, while still protecting yourself against becoming overloaded, but you really do want to find an appropriate configuration that won't let the box fall over the cliff if load gets too high. In such a case, it's better to queue (or even drop) requests since at least only a few suffer rather than killing everyone's performance and making your site essentially unusable.

– David

Is your website written in PHP? You mentioned PHP once, so I guess it is.

Here are some suggestions:

1. If you haven't installed APC yet, do it right now. That's a free performance boost for you.

2. Running Apache with the Prefork MPM and PHP as a module is usually quite bad for performance. Consider switching to something like nginx, or at least offload all the static files (css, js, images) to nginx. That could save you hundreds of MB of memory.

3. Minimize database calls with some smart caching. Memcached is easy, and it can be incredibly useful once you get the hang of it. Even if your data changes every 30 seconds, you should still cache it during the 30 seconds that it doesn't change.
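
As a sketch of that last point, using the PHP Memcached extension (query_front_page() is a placeholder for whatever database call you make today):

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $rows = $mc->get('front_page');
    if ($rows === false) {                 // miss: not cached yet, or expired
        $rows = query_front_page();        // placeholder: your existing MySQL query
        $mc->set('front_page', $rows, 30); // keep it for 30 seconds
    }
    ?>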

Thanks again David. I think that apache2 is not efficient enough for my needs on a linode.

I do not have APC installed AFAIK, and I think that moving to nginx with PHP and MySQL is my best bet. After looking into it, that seems like the wisest choice. I just jumped on Apache2 because I considered it the de facto standard.

The traffic on my site has died way down and I'm not sure if it's going to pick back up again.

I will try setting up NGINX monday. Right now all I have for internet is my cell phone and it occasionally goes out for long periods of time. I don't want to end up with hours of downtime because I don't have internet.

And yes, it is written in PHP. I tried to hack in mpm-worker since I saw it gives roughly double the performance, but it doesn't like PHP, and the instructions I was given must have been outdated, or I am dumb.

@sdlvx:

Thanks again David. I think that apache2 is not efficient enough for my needs on a linode.
Doubtful. I tested apache handling over ten thousand requests a second for static content even on a Linode 360.

Of course, once you layer on a processing-intensive PHP application, plus the I/O overhead (and processing) of a database, you'll get nowhere near that, but I'd hardly blame apache for PHP or database overhead. Nor is such overhead likely to change by switching web servers, since it isn't in the web server - at least not without restructuring your PHP architecture, or applying other changes to it such as the caching previously mentioned.

It's true that using nginx for purely static content can help a bit, because you get a smaller footprint for the static content (no need to start an apache process with the PHP overhead) and can leave more memory for the PHP-based requests. But I still don't buy that you really need it yet, if you were simply to keep apache from using up all your memory under load. I don't think you've given a decently tuned environment a fair shake before making much larger-scale changes to your setup.

> The traffic on my site has died way down and I'm not sure if it's going to pick back up again.
Well, my prior suggestion for explicitly stress testing your application on a separate, test, Linode you create for a short period still stands. It'll help you identify where your bottlenecks are and identify a good set of configuration parameters before changing the production system.

> I will try setting up NGINX monday. Right now all I have for internet is my cell phone and it occasionally goes out for long periods of time. I don't want to end up with hours of downtime because I don't have internet.
Why not do your experiments on a test Linode to remove any risk of downtime?

Good luck with your site - it doesn't seem like this thread has done much to influence your efforts, but I wish you the best in resolving your setup.

– David

@sdlvx:

I do not have APC installed AFAIK

DO THIS FIRST. APC is a php cache that is trivial to install and use. Try it before going through the trouble of switching web servers. You'll want it either way.
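
On Debian/Ubuntu of that era, installation is roughly (the package name may differ on your distro):

    sudo apt-get install php-apc
    sudo /etc/init.d/apache2 restart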

@glg:

DO THIS FIRST. APC is a php cache that is trivial to install and use.

Until it starts corrupting its cache randomly.

I installed APC and it seemed to help. However, my server still runs like crap. I still need to set the prefork MPM MaxClients to at least 100 to see decent performance.

The application itself is extremely simple. It's one of those stupid Facebook "like" sites where people just type in a little blurb of text, it's saved in a MySQL database, and then they like the page they made and it spreads like crazy. It's not resource intensive at all.

I have whos.amung.us installed and I'm only seeing a peak of around 500 users on at once.

I bought more RAM, but I know something just flat out isn't right. This 720 had no problem reaching one million uniques in a day before, and now it's struggling to handle a hundred thousand.

Google isn't helping me much.

@sdlvx:

I installed APC and it seemed to help. However, my server still runs like crap. I still need to set the prefork MPM MaxClients to at least 100 to see decent performance. [...]

Apache is still misconfigured…

You may want to just give up and install lighttpd or nginx or some other web server, since they're configured sanely out of the box. Or perhaps switch to mpm_worker? I'm not that familiar with configuring Apache since I switched to lighttpd years ago.

From a database perspective, is your schema properly indexed? On large data sets, this can be the difference between 1ms per query and 1000ms per query.

You can't use mod_php with worker, as PHP is not thread-safe. You'd need to use mod_fastcgi, and while it's possible, it takes a bit of work to get it almost right, and it's impossible to get it completely right unless you hack mod_fastcgi's code to not kill subprocesses on SIGUSR1.

@rsk:

> You can't use mod_php with worker, as PHP is not thread-safe. You'd need to use mod_fastcgi, and while it's possible, it takes a bit of work to get it almost right, and it's impossible to get it completely right unless you hack mod_fastcgi's code to not kill subprocesses on SIGUSR1.

Yeah, mod_fastcgi is hell to get right. mod_fcgid is just a tiny bit better, but again much more hassle than just switching to nginx or lighttpd.

OP, did you try profiling your slow pages? Add little hooks here and there to measure the time taken by each stage of page generation (request parsing, database access, templating, etc.). You could use microtime(true), or grab any open-source PHP profiling library. That could help you pinpoint the source of the slowness.
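
A minimal sketch with nothing but core PHP (fetch_rows() and render_page() are placeholders for your own stages):

    <?php
    $t = microtime(true);
    $rows = fetch_rows($db);    // placeholder: your database stage
    error_log(sprintf('db: %.4fs', microtime(true) - $t));

    $t = microtime(true);
    $html = render_page($rows); // placeholder: your templating stage
    error_log(sprintf('render: %.4fs', microtime(true) - $t));
    ?>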

Also, check your MySQL tables using the CHECK TABLE command; those beasts tend to get corrupted from time to time. Also try OPTIMIZE TABLE and reindexing.
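
From the shell, mysqlcheck wraps those statements (add -u/-p credentials as needed):

    mysqlcheck --all-databases             # runs CHECK TABLE on everything
    mysqlcheck --optimize --all-databases  # OPTIMIZE TABLE (rebuilds index data)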

mod_fcgid doesn't pipeline requests, so your whole multi-child, APC-enabled PHP tree handles one request at a time. To use fcgid you'd need to use the "dumb" mode with no PHP_FCGI_CHILDREN, and some other opcode cache that's external to the process. (Hmm, does memcached cache PHP opcodes, or web page data?)

It's probably a lot easier to get lighttpd working with fastcgi, since it more or less comes pre-configured. Unless I'm mistaken, it's just a single command to enable the fastcgi module, and I don't think it requires any configuration beyond that.

@sdlvx:

I think there might be something wrong with my MPM after googling around. If my ServerLimit and MaxClients aren't really high, it takes forever for pages to load. I have a timer in my application that reports how long a page took in PHP and MySQL, and it always shows a good time. I've been googling, and it sounds like new connections are being queued.

Turn KeepAlives off. Or at the very least, turn the timeout down to 1 or 2 seconds. You're seeing that behavior because connections are being opened, and Apache has to sit there for about 15 seconds after it finishes sending data because the client might send another request down the same connection - this ties up a whole Apache process so it can't serve other requests. Turning off KeepAlives makes Apache serve the data to the client, then close the connection so it can move on to the next client's request.
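
In apache2.conf terms, either of these (the second keeps pipelining for page assets while still freeing processes quickly):

    KeepAlive Off

    KeepAlive On
    KeepAliveTimeout 2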

-James

I notice each of the Apache processes was using 14 MB minimum.

If you run PHP but a lot of the requests you receive are non-PHP (i.e. static files), you should be able to reduce this to about 3-4 MB each for non-PHP requests by switching from mod_php to mod_fcgid and serving PHP via FastCGI. Note that you actually have to disable the mod_php module, though. You can then also switch your MPM from prefork to worker, which can save a little overall overhead with Apache.

Takes a bit of setting up - maybe practice in a virtual machine first. Ah yes, I notice that others above have complained about mod_fcgid being difficult to set up - you can solve that, though; you just have to make sure PHP does not spawn any FastCGI children itself and leave all of that to mod_fcgid. mod_fcgid is not compatible with PHP spawning FastCGI processes on its own; mod_fcgid needs to do that.
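
A minimal sketch along those lines (directive names are from mod_fcgid 2.3; the wrapper path is an assumption for a Debian-ish box):

    <IfModule mod_fcgid.c>
        AddHandler fcgid-script .php
        FcgidWrapper /usr/bin/php-cgi .php
        # PHP_FCGI_CHILDREN=0 stops PHP forking its own FastCGI children,
        # leaving process management to mod_fcgid
        FcgidInitialEnv PHP_FCGI_CHILDREN 0
        FcgidMaxProcesses 10
    </IfModule>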

Thank you for all the help everyone.

I stumbled upon the server-status page and found out that I was easily hitting 50 open connections, all of them in the keep-alive state. I disabled KeepAlive and saw that number drop to between 2 and 25; it's hanging in the upper single digits mostly.

It's too early in the night to be stress testing the new setup, but I think this has made a very big difference.

Each process is still using a ton of memory. top is reporting about 2.8% each, and I have a 720 slice with 360 MB of extra RAM.

> From a database perspective, is your schema properly indexed? On large data sets, this can be the difference between 1ms per query and 1000ms per query.

Yeah, I spent a long time making sure everything was properly indexed. The slow query log is empty (and I set long_query_time to 1 second). The problem is not pages rendering slowly, but pages actually being served. I have timer code at the start and end of every PHP page; if a page takes 5 seconds to load, it still reports being generated in about .01 seconds. I thought that was enough to rule out my application's performance.

> Turn KeepAlives off. Or at the very least, turn the timeout down to 1 or 2 seconds. [...]

James, this advice was A+. However, when I look at the server-status page, it looks like everyone is in a state of waiting for connection. Is this normal and acceptable? I'm guessing it is, because instead of waiting around, Apache serves a page and moves on as soon as it's done.

> Takes a bit of setting up - maybe practice in a virtual machine first. [...]

I have a nearly identical VM locally that I test on. I think I will try some different servers and see what I can do. I'm trying to run this stupid Facebook-like-page thing and a pretty complex startup website all by myself; instead of fixing bugs and improving things, I spend most of my time fiddling with the server. Not very helpful - there are a lot of better things I should be doing instead of breaking my server more and more while users say the site is slow and sucky.
