Performance tuning a Rails application
I built a RoR application that uses Nginx, clustered Mongrel servers, memcached, and a MySQL database. My setup is a Linode 360 - so pretty low end from a performance standpoint.
In performance testing it, my throughput maxes out at around 13 requests/second (~1.1 million page views/day). During a burst, CPU spikes to around 70-75%, and memory pushes slightly into swap. The test uses my most expensive dynamic page, so this is worst-case throughput.
I'm struggling to identify the actual bottleneck, though. I suspect it's either memory (i.e. going into swap) or database performance, but I'm not sure how to determine which is the root cause.
Advice is welcome. Also, any advice on the best way to scale horizontally would be much appreciated - my inclination is to use something like MySQL replication to set up two identical instances and use round-robin DNS to distribute requests between them.
Joe
7 Replies
How much RAM is MySQL configured to consume? Is it appropriate for the size of your database? (The indexes, as well as the most commonly accessed data, should fit in RAM.)
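A rough way to check (this assumes a stock MySQL install; add -u/-p credentials to the mysql calls as needed, and the 5-second vmstat interval is arbitrary):

mysql -e "SHOW VARIABLES LIKE 'key_buffer_size';"          # MyISAM index cache
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"  # InnoDB data + index cache
mysql -e "SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024) AS total_mb FROM information_schema.tables GROUP BY table_schema;"
vmstat 5    # non-zero si/so columns under load mean you really are swapping
free -m

If the buffer sizes are much smaller than total_mb for your application's schema, the database is probably hitting disk more than it should.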
Does your "most expensive dynamic page" really need to be up to date to the last tenth of a second? You said you're using memcached. Why not cache some of those query results and page snippets, even for a short duration?
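Before adding more caching, it's also worth confirming memcached is actually taking traffic at all. A quick check, assuming memcached is listening on the default localhost:11211:

printf 'stats\r\nquit\r\n' | nc localhost 11211 | egrep 'get_hits|get_misses|curr_items'

If get_misses dwarfs get_hits (or curr_items is near zero), the cache isn't earning its keep yet, and caching those query results and page snippets is where I'd start.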
Rails is great and all, but you really want to know what kind of SQL it's generating under the hood. If you're running hundreds of queries for that one page (e.g. the classic N+1 queries pattern), there's your culprit. If not, use the slow query log to find out which queries take the longest, run EXPLAIN on them to see whether they're using indexes, optimize those first, and then throw the cache at it.
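To put numbers behind that, something along these lines works (the log path, database name, and the posts/user_id query below are placeholders, not your actual schema; the slow query log has to be enabled in my.cnf first via log_slow_queries and long_query_time):

tail -f log/development.log        # Rails logs every SQL statement it generates at debug level, so an N+1 pattern is easy to spot per request
mysqldumpslow /var/log/mysql/mysql-slow.log | head -20
mysql my_app_production -e "EXPLAIN SELECT * FROM posts WHERE user_id = 42\G"

In the EXPLAIN output, 'type: ALL' combined with a large 'rows' estimate usually means a missing index.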
Is there any documentation on how Linode manages CPU sharing on a host? I'm beginning to think it may not handle bursts well.
Joe
As for the time slices, I assume (though I'm guessing) that they run at a high tick rate, 1000 Hz or so; it would make sense in a virtualization environment to accept higher overhead in exchange for more fine-grained sharing.
It's possible that your host has a high CPU load from other nodes (although that's rare). You could always open a ticket to ask for an investigation, and possibly a migration to a different host if that turns out to be the case.
I'd think that, unless you're maxing out the virtual CPUs, you're not running into CPU issues. Regardless of the time slice granularity, pending requests would be queued, and you'd be able to saturate it anyhow.
Try using "pbzip2" or another parallelizing compression tool on a large test file, while watching "htop"… outside of disk I/O, compression is a very CPU-intensive task and you can probably get darned close to 400% most of the time.
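Something along these lines (the file size and -p4 are arbitrary; -p should match however many CPUs your Linode presents):

dd if=/dev/urandom of=/tmp/cputest bs=1M count=256
pbzip2 -p4 -c /tmp/cputest > /dev/null    # compress across all cores while watching htop in another terminal
rm /tmp/cputest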
How are you testing your pages/second capacity? 13 requests/second sounds really low. For comparison, here's what I get for a small static HTML page using ab from the server itself:
rtucker@framboise:~$ ab -n 10000 -c 100 http://hoopycat.com/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Server Software: lighttpd/1.4.19
Document Length: 2260 bytes
Time taken for tests: 1.417060 seconds
Requests per second: 7056.86 [#/sec] (mean)
Time per request: 14.171 [ms] (mean)
Time per request: 0.142 [ms] (mean, across all concurrent requests)
Transfer rate: 17387.41 [Kbytes/sec] received
Running it against my PHP-based blog (note drop in -n and -c; it degrades considerably for -c larger than 10, which I'm OK with):
rtucker@framboise:~$ ab -n 1000 -c 10 http://blog.hoopycat.com/
Document Length: 88334 bytes
Time taken for tests: 3.907850 seconds
Requests per second: 255.90 [#/sec] (mean)
Time per request: 39.079 [ms] (mean)
Time per request: 3.908 [ms] (mean, across all concurrent requests)
Transfer rate: 22156.17 [Kbytes/sec] received
That's on a Linode 360, running Ubuntu 8.04, lighttpd, PHP via FastCGI (TCP), XCache, and b2evolution 3.3.3.
Of course, running ab from my house is a lot worse and causes my NAT router to glow red.
It took me about an hour to bring up a second instance. The throughput on the second instance is much better since there is no db on it - but it still is not where it should be. The good news is that I have a good workaround to scale the application, at least until I hit a database bottleneck.
I think I'm going to take a look at Passenger next to see if it gives me some additional performance.
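In case it helps, the usual route is the gem-based installer (sudo may be required; it builds an nginx binary with the Passenger module compiled in and then prints the exact config lines to paste into nginx.conf, so I won't guess at those here):

gem install passenger
passenger-install-nginx-module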
And from your original post… is that 70-75% figure based on max=100% or max=400%? Usually, vmstat outputs 0 to 100%, but most everything else (including the dashboard charts) outputs 0 to n*100%, where n is 4 in this case. It's confusing, but both methods are correct in their own little way…
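For example (the 5-second interval is arbitrary; mpstat comes from the sysstat package):

vmstat 5          # us/sy/id/wa columns always sum to roughly 100, regardless of CPU count
mpstat -P ALL 5   # per-CPU breakdown, closer to the 0-400% view the dashboard charts use

So a steady 70-75% on the vmstat scale would mean roughly 3 of the 4 virtual CPUs busy, which is a very different picture from 75% of a single CPU.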