Is Monit the best program to use to restart/monitor processes?
I think Monit is used to restart a process when it gets too resource-heavy and starts bogging down the server. Am I correct on this?
Does anyone have example Monit configs for mysql/apache that are pretty basic and usable? Or perhaps an alternative to Monit?
Thanks.
2 Replies
You should find the cause of your problems instead of using software to restart services (not saying you shouldn't use monit too).
A few things that will help:
1) install munin
2) show your apache config file
3) show your mysql config file
4) show the output of top
Here are selections from my monitrc. Config-specific variables have been redacted and are marked with %%%. Also, note that Postfix is installed in a send-only configuration, so I only care if it is up and accessible from localhost. The built-in HTTP server is set to only bind to localhost - on the rare occasions I need to use it, I do so via an SSH tunnel with the command ssh -L 2812:localhost:2812 mylogin@mylinodeipaddress.
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 120
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
# set alert sysadm@foo.bar                       # receive all alerts
# set alert manager@foo.bar only on { timeout }  # receive just service-timeout alerts
set alert %%%YOUR ADMIN E-MAIL ADDRESS%%%
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812 and
    use address localhost          # only accept connections from localhost
    allow localhost                # allow localhost to connect to the server and
    allow %%%LOGIN%%%:%%%PASS%%%   # require user LOGIN with password PASS
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
check system localhost
    if loadavg (1min) > 10 then alert
    if loadavg (5min) > 8 then alert
    if memory usage > 80% then alert
    if cpu usage (user) > 70% for 2 cycles then alert
    if cpu usage (system) > 50% for 2 cycles then alert
    if cpu usage (wait) > 50% for 2 cycles then alert
    if loadavg (1min) > 20 for 3 cycles then exec "/sbin/shutdown -r now"
    if loadavg (5min) > 15 for 5 cycles then exec "/sbin/shutdown -r now"
    if memory usage > 97% for 3 cycles then exec "/sbin/shutdown -r now"
## Check that a process is running, responding to HTTP requests, and
## check its resource usage such as cpu and memory, and number of children.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (mysql) which
## is defined in the monit control file as well.
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if cpu > 80% for 5 cycles then restart
    if children > 50 then alert
    # Apache MaxClients = 60
    if children > 60 then restart
    # Request some smallish page that should be available when the server is up.
    # Sometimes Apache doesn't respond right away, so give it two chances before
    # forcing a restart.
    if failed host %%%PUBLIC IP ADDR%%% port 80 protocol http
        and request "/index.html"
        with timeout 10 seconds
        for 2 cycles
        then restart
    depends on mysql
    if 3 restarts within 8 cycles then timeout
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if cpu > 80% for 5 cycles then restart
    # Base the memory limit below on your own experience
    if totalmem > 200.0 MB for 5 cycles then restart
    # If you use the network instead of a UNIX socket, adjust this test
    if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql
        with timeout 15 seconds
        then restart
    if 3 restarts within 5 cycles then timeout
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if failed host %%%PUBLIC IP ADDR%%% port 22 protocol ssh 2 times within 2 cycles
        then restart
    if 3 restarts within 8 cycles then timeout
check process postfix with pidfile /var/spool/postfix/pid/master.pid
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if cpu > 30% for 5 cycles then restart
    if totalmem > 60.0 MB for 3 cycles then restart
    if failed host localhost port 25 protocol smtp
        with timeout 60 seconds
        then restart
    if 3 restarts within 8 cycles then timeout
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource, and an automatic
## graceful stop may be cascaded to them before the filesystem becomes
## full and data is lost.
check device filesystem with path /dev/xvda
    if space usage > 80% for 5 times within 15 cycles then alert
    # exec does not go through a shell, so chained commands are wrapped in /bin/sh -c
    if space usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop ; /etc/init.d/mysql stop'"
    if inode usage > 70% then alert
    if inode usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop ; /etc/init.d/mysql stop'"
## Check a file's timestamp: when it becomes older than 15 minutes, the
## file is not updated and something is wrong. In the case that the size
## of the file exceeded given limit, perform the script.
#
# Monitor denyhosts activity, but not as often
check file hosts.deny path /etc/hosts.deny
    every 3 cycles
    if changed checksum then alert
There are probably many optimizations I could make to the above, but it works well enough to avoid downtime of more than a few minutes. Configuration is easy enough to figure out, which was a major plus in my book. As obs points out, it's not a substitute for proper configuration, but is a useful fallback when things go unexpectedly wrong.
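To go with the point about finding the cause rather than just restarting: one possible tweak is to put an alert rule below each restart threshold, so you get mail before Monit steps in. This is only a sketch that reuses the mysql check from above. The 150.0 MB warning level is an assumed number for illustration, not something from my running config, so pick your own from what munin or top shows you.

```
# Sketch of an "alert first, restart later" variant of the mysql check.
# It would replace the mysql block above; don't define the same check twice.
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    # Early warning (150.0 MB is an assumed value, tune it for your server)
    if totalmem > 150.0 MB for 5 cycles then alert
    # Last resort, same restart rule as in the original block
    if totalmem > 200.0 MB for 5 cycles then restart
    if 3 restarts within 5 cycles then timeout
```

After editing monitrc, running monit -t checks the control file syntax before you reload the daemon.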