Why did my Linodes go down suddenly, all at the same time?
My setup:
1 NodeBalancer
3 Linode 4096s as web servers (nginx + php-fpm, Magento)
1 Linode 4096 as the database server
I ran a promotion for my website yesterday afternoon, so traffic roughly tripled and reached about 160 connections/second. My customers and I then started getting 503 errors frequently. I logged in to each of my Linodes and ran netstat: there were 400-800 open connections on each node, and a lot of them were in the TIME_WAIT state. I changed keepalive_timeout from 75 to 15 in nginx.conf and restarted nginx, but the situation didn't change much. The three nodes often seem to go down at the same time and then come back up one by one after a while.
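For reference, this is roughly the netstat command I used to count connections by state on each node (reconstructed from memory, so the exact pipeline may have differed):

# count open TCP connections per state (TIME_WAIT, ESTABLISHED, ...)
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -rn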
Does anyone know why this happens? Is there anything I can optimise? Is 160 connections/second a lot for a NodeBalancer with three backend nodes?
Thank you very much!
Here are my NodeBalancer's configuration settings:
Port: 80
Protocol: HTTP
Algorithm: Round Robin
Session Stickiness: HTTP Cookies
Health Check Type: HTTP Valid Status
Check Interval: 20
Check Timeout: 5
Check Attempts: 3
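As I understand it, the HTTP Valid Status check requests the configured check path on each backend every 20 seconds and takes a node out of rotation after 3 failed checks, so roughly a minute of bad responses removes a backend. Something like the command below, run against a backend's private IP, should show what the NodeBalancer sees; the IP and the / path are placeholders, not my real check settings:

# a 2xx/3xx status within the 5-second timeout should count as healthy
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://192.0.2.10/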
Here is my nginx.conf:
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logs
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log /var/log/nginx/access.log main;
    access_log off;

    sendfile on;

    # TCP optimisation
    tcp_nodelay on;
    tcp_nopush on;

    autoindex off;

    map $scheme $fastcgi_https { ## Detect when HTTPS is used
        default off;
        https on;
    }

    # Timeout
    #fastcgi_read_timeout 240;

    # Size limits & buffer overflows
    client_body_buffer_size 1K;
    client_header_buffer_size 1k;
    client_max_body_size 1M;
    large_client_header_buffers 2 1k;

    # Timeouts; these settings reduce php-fpm CPU overload, a little...
    client_body_timeout 10;
    client_header_timeout 10;
    keepalive_timeout 15;
    send_timeout 10;

    # Enable keepalive in order to improve time to first byte
    keepalive_requests 150;

    # Compression
    gzip on;
    gzip_min_length 1000;
    gzip_buffers 16 8k;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript image/jpeg;
    gzip_disable "MSIE [1-6]\.";

    # Pre-compression
    gzip_static on;
    gzip_http_version 1.1;
    gzip_proxied expired no-cache no-store private auth;
    gzip_vary on;

    # Load config files from the /etc/nginx/conf.d directory
    include /etc/nginx/conf.d/*.conf;
}
2 Replies
More information might be helpful: more of your nginx config, as hybinet requested, and memory and CPU utilization figures from the period when you had issues, including on your database server (perhaps it was overloaded, preventing the web servers from processing requests fast enough). Depending on where the bottleneck is, there are different things that can be done to improve the situation.

It may also be that you simply need more Linodes; you're handling a lot of traffic. In that case, writing a StackScript to add another Linode to your cluster in an automated fashion could help you scale up quickly ("oh, I'm about to do a big promotion, I should run that StackScript I wrote to add another pre-configured web server for a few days"). Remember, Linodes are prorated to the day.
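Just to illustrate the idea, a StackScript is essentially a bash script that runs when the new Linode is first deployed, with optional UDF fields exposed in the deployment form. A bare-bones sketch might look like the following; the package names and the db_host field are placeholders for whatever your web servers actually need, not a drop-in config:

#!/bin/bash
# <UDF name="db_host" label="Private IP of the database server" />
# Minimal provisioning sketch for an extra Magento web node (Debian/Ubuntu assumed)
apt-get update
apt-get -y install nginx php5-fpm
# At this point you would pull in your real nginx/php-fpm/Magento configuration,
# point it at $DB_HOST, and start the services.
service nginx restart

You would still have to deploy the Magento code and add the new node to the NodeBalancer's backend list, but it gets a fresh Linode most of the way to serving traffic without manual setup.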