Why did my Linodes go down suddenly, all at the same time?
My setup:
1 NodeBalancer
3 Linode 4096s as web servers (nginx + php-fpm, Magento)
1 Linode 4096 as the database server
I ran a promotion for my website yesterday afternoon, so traffic roughly tripled and reached about 160 connections/second. My customers and I then started getting 503 errors frequently. I logged in to each of my Linodes and ran netstat: there were 400-800 open connections on each node, and a lot of them were in the TIME_WAIT state. I changed keepalive_timeout from 75 to 15 in nginx.conf and restarted nginx, but the situation didn't change much. The three nodes often seem to go down at the same time and then come back up one by one after a while.
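For reference, this is roughly the netstat command I used to count connections by state on each node (reconstructed from memory, so the exact pipeline may have differed):

# count open TCP connections per state (TIME_WAIT, ESTABLISHED, ...)
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -rn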
Does anyone know why this happens? Is there anything I can optimise? Is 160 connections/second a lot for a NodeBalancer with three backend nodes?
Thank you very much!
Here are my NodeBalancer's configuration settings:
Port: 80
Protocol: HTTP
Algorithm: Round Robin
Session Stickiness: HTTP Cookies
Health Check Type: HTTP Valid Status
Check Interval: 20
Check Timeout: 5
Check Attempts: 3
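As I understand it, the HTTP Valid Status check requests the configured check path on each backend every 20 seconds and takes a node out of rotation after 3 failed checks, so roughly a minute of bad responses removes a backend. Something like the command below, run against a backend's private IP, should show what the NodeBalancer sees; the IP and the / path are placeholders, not my real check settings:

# a 2xx/3xx status within the 5-second timeout should count as healthy
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://192.0.2.10/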
Here is my nginx.conf:
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logs
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log /var/log/nginx/access.log main;
    access_log off;

    sendfile on;

    # TCP optimisation
    tcp_nodelay on;
    tcp_nopush on;

    autoindex off;

    map $scheme $fastcgi_https { ## Detect when HTTPS is used
        default off;
        https on;
    }

    # Timeout
    #fastcgi_read_timeout 240;

    # Size limits & buffer overflows
    client_body_buffer_size 1K;
    client_header_buffer_size 1k;
    client_max_body_size 1M;
    large_client_header_buffers 2 1k;

    # Timeouts; these settings reduce php-fpm CPU overload, a little...
    client_body_timeout 10;
    client_header_timeout 10;
    keepalive_timeout 15;
    send_timeout 10;

    # Enable keepalive in order to improve time to first byte
    keepalive_requests 150;

    # Compression
    gzip on;
    gzip_min_length 1000;
    gzip_buffers 16 8k;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript image/jpeg;
    gzip_disable "MSIE [1-6]\.";

    # Pre-compression
    gzip_static on;
    gzip_http_version 1.1;
    gzip_proxied expired no-cache no-store private auth;
    gzip_vary on;

    # Load config files from the /etc/nginx/conf.d directory
    include /etc/nginx/conf.d/*.conf;
}
2 Replies
More information might be helpful: more of your nginx config, as hybinet requested, and memory and CPU utilization figures from the period when you had issues, including on your database server (perhaps it was overloaded, preventing the web servers from processing requests fast enough). Depending on where the bottleneck is, there are different things that can be done to improve the situation.

It may also be that you simply need more Linodes; you're handling a lot of traffic. In that case, writing a StackScript to add another Linode to your cluster in an automated fashion could help you scale up quickly ("oh, I'm about to do a big promotion, I should run that StackScript I wrote to add another pre-configured web server for a few days"). Remember, Linodes are prorated to the day.
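Just to illustrate the idea, a StackScript is essentially a bash script that runs when the new Linode is first deployed, with optional UDF fields exposed in the deployment form. A bare-bones sketch might look like the following; the package names and the db_host field are placeholders for whatever your web servers actually need, not a drop-in config:

#!/bin/bash
# <UDF name="db_host" label="Private IP of the database server" />
# Minimal provisioning sketch for an extra Magento web node (Debian/Ubuntu assumed)
apt-get update
apt-get -y install nginx php5-fpm
# At this point you would pull in your real nginx/php-fpm/Magento configuration,
# point it at $DB_HOST, and start the services.
service nginx restart

You would still have to deploy the Magento code and add the new node to the NodeBalancer's backend list, but it gets a fresh Linode most of the way to serving traffic without manual setup.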