"Advanced" RAM determination for a web app.
Ensuring a Google-level response time for each request.
Here's what I mean: Google attributes its success partly to an early goal of (IIRC) a 0.5-second round trip to the user for every page. So a big part of their architecture, I believe, is about keeping the indexes and other data in RAM.
I want the same for my web app, which is a search engine. I've already done the obvious: upgrading to a Linode with enough RAM to eliminate swapping. But that only ensures the running applications can be held in RAM, not the data.
So my question: Anybody know any ways of figuring out how often the Linux system is going to disk, as opposed to going to buffers or cache?
I suppose I could just total up the size of the data set (128 MB in my case) and then make sure I have that much plus a normal baseline amount in cache + buffers…
It's a little hard not having a box in front of you, isn't it? I'd normally just look and see how often the drive light is flickering.
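The closest thing to a drive light I've come up with so far is polling /proc/diskstats and watching how many sectors actually get read from the disk while the app is under load; reads satisfied from the page cache never show up there. Rough sketch; the device name is a guess, so swap in whatever your Linode's disk is called:

```python
#!/usr/bin/env python
"""Rough sketch: sample /proc/diskstats and report how many sectors are
actually read from / written to disk per interval.  Reads served from the
page cache never reach the disk, so a near-zero read rate under load
suggests the working set fits in RAM."""

import time

DEVICE = "sda"    # assumption: change to your actual block device
INTERVAL = 5      # seconds between samples

def read_sectors(device):
    """Return (sectors_read, sectors_written) for a block device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                # field 5 = sectors read, field 9 = sectors written
                return int(fields[5]), int(fields[9])
    raise ValueError("device %s not found in /proc/diskstats" % device)

prev = read_sectors(DEVICE)
while True:
    time.sleep(INTERVAL)
    cur = read_sectors(DEVICE)
    print("%s: %d sectors read, %d written in the last %ds"
          % (DEVICE, cur[0] - prev[0], cur[1] - prev[1], INTERVAL))
    prev = cur
```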
10 Replies
Actually, my current turn-around times are an order of magnitude faster: ranging from 8ms to 61ms for new queries. IMO, these times strongly imply that the data set is in RAM.
If you absolutely want your data to be in RAM, you can use a MEMORY table in MySQL or use something like memcached. If your dataset is 128MB, you can theoretically fit all of it in RAM even on a Linode 360. Both of the above methods, however, will lose all data as soon as you restart the corresponding daemon/service, so you still need to use the disk as permanent storage.
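If you go the memcached route, the access pattern is just: check the cache, fall back to the database on a miss, then populate the cache. Minimal sketch of the idea in Python, assuming the python-memcached client and a placeholder load_from_db function; the same pattern works from Ruby/Rails with any of the memcached client gems:

```python
# Minimal cache-aside sketch, assuming the python-memcached client
# (pip install python-memcached) and a memcached daemon on the default
# port.  Anything stored this way lives only in RAM, so the DB on disk
# remains the authoritative copy.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def get_statute(statute_id, load_from_db):
    """Return a record, hitting the database only on a cache miss."""
    key = "statute:%s" % statute_id
    doc = mc.get(key)
    if doc is None:
        doc = load_from_db(statute_id)   # placeholder for your DB lookup
        mc.set(key, doc, time=3600)      # keep it in RAM for an hour
    return doc
```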
[edit] 0.5s response time is not at all hard to achieve, if you're just talking about the HTML page and not the other heavy stuff (javascript, images, etc.). 200ms page generation time + less than 100ms latency to most of the continental US = less than 0.3s for most of your American users.
@umdenken:
So my question: Anybody know any ways of figuring out how often the Linux system is going to disk, as opposed to going to buffers or cache?
Well, you should be able to find out how big your index and data set are and whether they fit in memory, no?
If it's too big to stay in memory at all times, the database will eventually have to go to disk (and then your response times will suffer).
You say the data set is 128MB; that's not much. But how big is everything all together, with indices, etc., and do you have enough spare RAM available for it?
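On the box itself, a quick check along these lines would tell you whether the kernel even has room to keep all of it cached. Rough sketch in Python; the 128MB is just the number you quoted, so bump it up to cover the indexes:

```python
# Rough sizing check: is there enough free + reclaimable memory for the
# kernel to keep the whole data set cached?
DATA_SET_MB = 128   # assumption: total size of DB files + Sphinx indexes

def meminfo_mb(field):
    """Return a /proc/meminfo value (reported in kB) in megabytes."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1]) // 1024
    raise ValueError("field %s not found" % field)

headroom = meminfo_mb("MemFree") + meminfo_mb("Buffers") + meminfo_mb("Cached")
print("free + buffers + cached: %d MB" % headroom)
print("data set:                %d MB" % DATA_SET_MB)
print("fits in cache" if headroom >= DATA_SET_MB else "expect disk reads")
```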
Also, you didn't say what kind of DB backend you're using for your data. MySQL? PostgreSQL?
If performance is that crucial, you might want to consider creating your own backend.
That way you have more control over what is stored in memory, how it is accessed, and what indices are used, with less I/O overhead, etc.
This is a Rails app serving a database of about 40,000 objects totaling 64MB. It uses Sphinx for indexing, and the Sphinx index is on the same order of magnitude.
So, my main processes are: Apache, mod_rails/ruby1.8, mysql, and the Sphinx searchd.
The application is an online version of the Oregon Revised Statutes: pretty much the laws of the state.
I have this running, plus two Wordpress instances and a smaller Rails app. I just switched up to a Linode 540, and I'm seeing these response times of 8-60ms per request in the Rails production log.
The Rails page caching may eventually work against the database caching: when the app first deploys, every hit comes from the database, but as the app runs, more and more pages are served statically by Apache. Although this means Rails and the database get bypassed (which is faster), it also means that, in theory, the total data set (database plus cached pages) grows over time. Database accesses may then get slower because the data can no longer be completely cached.
Currently, though, I'm heavily in development, and re-deploying (and clearing the page-cached static pages) every day. So it's not an issue yet.
Hitting a lot of different pages (say, by crawling the whole site once) should give you a good idea of what memory consumption looks like, and it will make subsequent access faster since the data will already be in cache.
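A throwaway cache warmer is enough for that. Rough sketch with a made-up URL list; generate yours from the sitemap or routes:

```python
# Sketch of a one-shot cache warmer: request a list of URLs so the pages
# (and the rows behind them) get pulled into cache before real traffic
# arrives.  The URL list here is hypothetical.
import time
import urllib.request

URLS = ["http://localhost/statutes/%d" % n for n in range(1, 101)]  # example only

for url in URLS:
    start = time.time()
    try:
        with urllib.request.urlopen(url) as resp:
            resp.read()
        print("%-40s %6.0f ms" % (url, (time.time() - start) * 1000))
    except Exception as exc:
        print("%-40s failed: %s" % (url, exc))
```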
For some apps, the Linux caching is good enough. For example, for the most part, Postgres relies on the OS to handle caching of the index and data.
Given your response times, I'd agree that it sounds like you are pretty much only hitting memory. I just read the Sphinx documentation, and it sounds like it tries to cache indexes entirely in RAM. If I'm doing my math right, Linode isn't overselling RAM, so there shouldn't be any hidden paging going on.
My suggestion is to install Munin and watch your stats over time. If you see evidence of memory pressure (a small disk cache, a lot of CPU time spent in I/O wait), then look for ways to cut memory utilization. Since it sounds like your app ends up serving a lot of cached pages and relies on Sphinx for indexing, you might be able to cut the memory MySQL tries to grab. You can also look at tweaking your number of webserver processes so they aren't consuming more memory than needed. Or you could just upgrade to a bigger Linode.
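For a quick snapshot of who is holding what, without installing anything, something like this also works. It just sums per-process RSS out of /proc, so shared pages get counted more than once and the totals are only approximate:

```python
# Rough sketch: group resident memory (VmRSS) by process name, to see how
# much Apache, Ruby, MySQL and searchd are actually holding and how much
# is left over for the page cache.  Run it on the Linode itself.
import os
from collections import defaultdict

totals = defaultdict(int)
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open("/proc/%s/status" % pid) as f:
            name, rss_kb = None, 0
            for line in f:
                if line.startswith("Name:"):
                    name = line.split()[1]
                elif line.startswith("VmRSS:"):
                    rss_kb = int(line.split()[1])
        if name:
            totals[name] += rss_kb
    except IOError:          # process exited while we were reading
        continue

for name, kb in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
    print("%-20s %8.1f MB" % (name, kb / 1024.0))
```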
Hope that helps.
-James
But you still have the overhead of opening the DB each time you access it and reading the data into memory.
If access times are so crucial, why not write your own backend and keep the data in memory at all times?
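Just to show the shape of that idea, here is a toy sketch with placeholder data standing in for your real store. A real version would still need the persistence and reindexing story that MySQL and Sphinx already give you:

```python
# Toy sketch of the "own backend" idea: load everything once at startup
# and keep a simple inverted index in a dict, so queries never touch the
# disk.  load_all_documents is a placeholder for however you read the
# records out of permanent storage.
import re
from collections import defaultdict

def load_all_documents():
    # placeholder: return {doc_id: text} from your real store
    return {1: "the quick brown fox", 2: "the lazy dog"}

DOCS = load_all_documents()
INDEX = defaultdict(set)
for doc_id, text in DOCS.items():
    for word in re.findall(r"\w+", text.lower()):
        INDEX[word].add(doc_id)

def search(query):
    """Return ids of documents containing every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(INDEX.get(words[0], set()))
    for word in words[1:]:
        result &= INDEX.get(word, set())
    return result

print(search("quick fox"))   # -> {1}
```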