Troubleshooting OOM Issues on Linode - Rails/Postgres/Nginx

Hello,

I had a general discussion thread where, thanks to hybinet, I realized that the issue probably lies with performance tuning. Any help would be most welcome.

I run a Rails application using nginx on a Linode 1536. The application receives around 90,000 hits a day, and OOM issues have been troubling me a lot since May… :(

Here are some stats:

 free -m
             total       used       free     shared    buffers     cached
Mem:          1690       1577        112          0          4        232
-/+ buffers/cache:       1341        349
Swap:          255         62        193
ps aux --sort -rss
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000      7790 28.0 12.3 234176 214216 ?       Sl   11:40   6:58 Rails: /home/ssd/code/current                                                                                          
1000      7764 22.4 12.2 234984 212336 ?       Rl   11:40   5:34 Rails: /home/ssd/code/current                                                                                          
1000      7780 30.6 12.2 233824 212264 ?       Rl   11:40   7:37 Rails: /home/ssd/code/current                                                                                         
1000      7769 25.0  9.8 192128 170076 ?       Sl   11:40   6:13 Rails: /home/ssd/code/current                                                                                          
1000      7785 29.3  9.4 184612 164436 ?       Rl   11:40   7:18 Rails: /home/ssd/code/current                                                                                          
1000      7775 27.6  9.4 183880 163704 ?       Rl   11:40   6:52 Rails: /home/ssd/code/current                                                                                          
1000      7795 29.7  9.2 179608 159816 ?       Sl   11:40   7:22 Rails: /home/ssd/code/current                                                                                         
1000      7843  1.1  5.4 111664 94804 ?        Sl   11:43   0:15 job_runner                              
postgres  7797  3.4  1.9  48920 34276 ?        Rs   11:40   0:51 postgres: ssd ssd_db [local] SELECT                                                                      
postgres  7789  3.3  1.9  47048 33728 ?        Ss   11:40   0:50 postgres: ssd ssd_db [local] idle                                                                        
postgres  7783  3.4  1.9  47004 33696 ?        Ss   11:40   0:52 postgres: ssd ssd_db [local] idle                                                                        
postgres  7772  2.4  1.9  46924 33648 ?        Ss   11:40   0:35 postgres: ssd ssd_db [local] idle                                                                        
postgres  7779  3.5  1.9  46920 33584 ?        Ss   11:40   0:52 postgres: ssd ssd_db [local] idle                                                                        
postgres  7774  2.7  1.9  47032 33528 ?        Ss   11:40   0:41 postgres: ssd ssd_db [local] idle                                                                        
postgres  7794  3.3  1.9  46812 33400 ?        Rs   11:40   0:49 postgres: ssd ssd_db [local] SELECT                                                                      
postgres  9067 11.0  1.8  46572 31280 ?        Rs   12:05   0:00 postgres: ssd ssd_db 127.0.0.1(53516) SELECT                                                             
postgres  7066  0.0  1.5  45052 27604 ?        Ss   11:25   0:01 postgres: writer process                                                                                                    
postgres  7849  0.0  0.3  46060  6476 ?        Ss   11:43   0:00 postgres: ssd ssd_db [local] idle                                                                        
postgres  7947  0.0  0.2  45996  4316 ?        Ss   11:44   0:00 postgres: ssd ssd_db [local] idle                                                                        
nobody   17210  0.0  0.2  62380  4032 ?        Sl   00:02   0:18 /usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1
1000      9065  1.0  0.2  11296  3732 ?        S    12:05   0:00 indexer --config /home/ssd/code/releases/20110816134046/config/production.sphinx.conf --rotate post_delta
nobody    7666  0.9  0.1   8772  3116 ?        S    11:40   0:14 nginx: worker process
root      7625  1.2  0.1  19984  2776 ?        Sl   11:40   0:19 PassengerHelperAgent
root      7627  0.2  0.0  17744  1524 ?        Sl   11:40   0:03 Passenger spawn server                                                                                                                  
postgres  2113  0.0  0.0  45052  1260 ?        S    Aug16   0:04 /usr/lib/postgresql/8.4/bin/postgres -D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf
1000      7579  0.0  0.0   3280  1220 pts/0    Ss   11:39   0:00 -bash
1000      9057  0.0  0.0  14888  1000 pts/0    S    12:05   0:00 searchd --pidfile --config /home/ssd/code/releases/20110816134046/config/production.sphinx.conf
postgres  7068  0.0  0.0  45184   760 ?        Ss   11:25   0:00 postgres: autovacuum launcher process                                                                                       
nobody    7630  0.0  0.0   9664   692 ?        Sl   11:40   0:00 PassengerLoggingAgent
1000      9072  0.0  0.0   2480   676 pts/0    R+   12:05   0:00 ps aux --sort -rss
root      7622  0.0  0.0   5540   568 ?        Ssl  11:40   0:00 PassengerWatchdog
postgres  7067  0.0  0.0  45052   556 ?        Ss   11:25   0:00 postgres: wal writer process                                                                                                
postgres  7069  0.0  0.0  13220   512 ?        Ss   11:25   0:00 postgres: stats collector process                                                                                           
root      2563  0.0  0.0 110832   472 ?        Ssl  Aug16   0:05 /usr/sbin/nscd
ntp       2063  0.0  0.0   4460   400 ?        Ss   Aug16   0:03 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 102:104
root         1  0.0  0.0   2736   268 ?        Ss   Aug16   0:00 /sbin/init
root      7665  0.0  0.0   6716   248 ?        Ss   11:40   0:00 nginx: master process /opt/nginx/sbin/nginx
root      2046  0.0  0.0   2428   236 ?        Ss   Aug16   0:00 cron
syslog    1998  0.0  0.0  28572   164 ?        Sl   Aug16   0:01 rsyslogd -c4
root      1985  0.0  0.0   5600    28 ?        Ss   Aug16   0:00 /usr/sbin/sshd -D
www-data 17925  0.0  0.0  65860    28 ?        S    00:17   0:08 /usr/bin/php-cgi -q -b localhost:53217 -c /etc/php5/cgi/php.ini
105       2000  0.0  0.0   2712    12 ?        Ss   Aug16   0:00 dbus-daemon --system --fork
root      1027  0.0  0.0   2368     8 ?        S

I also tried tweaking the config and reading up on many articles, but I have not been able to achieve any great results. These are my current Passenger settings in nginx:

    passenger_max_pool_size 7;
    passenger_pool_idle_time 170;
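
For reference, those two directives sit in the http block of nginx.conf next to the Passenger setup, roughly like the sketch below (the paths and the commented-out directive are placeholders/examples, not my real config):

    # /opt/nginx/conf/nginx.conf -- sketch only; paths are placeholders
    http {
        passenger_root /path/to/passenger-3.x.x;  # wherever the Passenger gem lives
        passenger_ruby /usr/local/bin/ruby;       # the ruby used to spawn app processes

        # Memory-related Passenger tuning
        passenger_max_pool_size 7;       # maximum number of application processes
        passenger_pool_idle_time 170;    # seconds before an idle process is shut down
        # passenger_max_requests 1000;   # optionally recycle a process after N requests

        server {
            listen 80;
            server_name example.com;               # placeholder
            root /home/ssd/code/current/public;
            passenger_enabled on;
        }
    }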

I will keep posting more details in this thread as I find more data to share.

Thanks

3 Replies

Well, Rails is using up ~75% of your RAM… I'm not familiar enough with Rails to comment on why it's using almost 200MB per process (as in, I know nothing about Rails and have never used it), but it's your culprit. As far as I can tell, you've got three options:

1) Figure out why Rails is using so much RAM per process and fix it (there's a quick way to measure this in the sketch after this list)

2) Reduce the number of Rails processes

3) Throw more RAM at the problem
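
If you go with 1) or 2), Passenger's own tooling will tell you what each process actually costs; something like this (exact output varies by Passenger version):

    # Private/real memory per Passenger and nginx process
    sudo passenger-memory-stats

    # How many application processes exist and how busy they are
    sudo passenger-status

    # Back-of-the-envelope: 7 processes x ~200 MB is roughly 1.4 GB,
    # which leaves very little on this box for Postgres, memcached and Sphinx.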

Rails is a RAM hog (which is why I don't use it). Have you installed Ruby Enterprise Edition? http://www.rubyenterpriseedition.com/ I've used it before and noticed a decent reduction in RAM usage.
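
If you want to try it, the switch looks roughly like this; the version number and install prefix are just examples, and you have to rebuild nginx so it uses the Passenger that ships with REE:

    # Download the tarball from rubyenterpriseedition.com, then (version is an example):
    tar xzf ruby-enterprise-1.8.7-2011.03.tar.gz
    cd ruby-enterprise-1.8.7-2011.03
    sudo ./installer                      # installs under /opt by default

    # Rebuild nginx with the Passenger bundled with REE
    sudo /opt/ruby-enterprise-1.8.7-2011.03/bin/passenger-install-nginx-module

    # Finally point nginx at the new ruby in nginx.conf:
    #   passenger_ruby /opt/ruby-enterprise-1.8.7-2011.03/bin/ruby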

Thanks guys for the feedback.

Here's how I overcame the issue:

1. First, I reduced passenger_max_pool_size to 5. That gave me some immediate relief and room to look further into the issue, even though the website was still crawling.

2. Next, I installed REE as suggested by obs. That indeed brought my memory consumption down by around 30%. What a relief that was.

I was now hungry for more. :) So I researched more and came across some settings for fine-tuning REE's garbage collection.

http://engineering.gomiso.com/2011/02/25/adventures-in-scaling-part-1-using-ree/
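
The gist of it is a small wrapper script that exports REE's GC environment variables, plus a passenger_ruby line pointing nginx at the wrapper. Roughly what I'm trying (the wrapper path is an example and the numbers are placeholders; the article suggests its own values, and they need tuning per app):

    #!/bin/sh
    # /usr/local/bin/ruby-gc-wrapper -- wrapper around the REE ruby binary
    # Values are placeholders; tune them against your own application.
    export RUBY_HEAP_MIN_SLOTS=500000
    export RUBY_HEAP_SLOTS_INCREMENT=250000
    export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
    export RUBY_GC_MALLOC_LIMIT=50000000
    export RUBY_HEAP_FREE_MIN=4096
    exec /opt/ruby-enterprise/bin/ruby "$@"

    # Make it executable and point Passenger at it in nginx.conf:
    #   chmod +x /usr/local/bin/ruby-gc-wrapper
    #   passenger_ruby /usr/local/bin/ruby-gc-wrapper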

The response time seems to have improved considerably. I'll wait till Monday to benchmark it under heavy load.
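
The plan is just to hammer a couple of representative URLs while watching memory on the box; something along these lines (the URL and numbers are only examples):

    # Simple load test against one endpoint (ab comes with apache2-utils)
    ab -n 2000 -c 20 http://example.com/some/representative/page

    # In other terminals, keep an eye on memory and swap
    watch -n 5 free -m
    watch -n 5 passenger-memory-stats   # run as root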

I will be trying a few more things and will try to document them here too.

Thanks again…
