Linode losing memory
and 256 MB SWAP.
Running Apache and Sendmail with no traffic to speak of, plus Nagios monitoring 5 remote ports.
The system keeps running out of memory: slowly, over time, RAM is consumed and NO SWAP space is ever used.
Example:
After a clean reboot I have over 55 megs of available RAM.
After 2 hours and 22 minutes of uptime, my RAM looks like this…
91416k av, 52128k used, 39288k free
Swap: 263160k av, 0k used, 263160k free
Swap does not EVER get used and RAM keeps counting down.
I have the exact same setup here in my lab running the exact same software:
Mem: 2066300K av, 2045904K used, 20396K free
Swap: 4096524K av, 1321928K used, 2774596K free
Is my Linode acting strange or what?
@ss2chef:
Swap does not EVER get used and RAM keeps counting down.
[…snip…]
Is my Linode acting strange or what?
Someone correct me if I am wrong, but isn't this expected behavior? The OS (and system libraries) will cache data that a program has previously used, even after the program releases it. The idea is that you may read from that file again soon, and this way it is already in memory. The goal of the operating system is to keep memory full of items currently in use and items that might be asked for in the future. If you ask for something new, it tries to figure out which item is not being used and is least likely to be used in the future. And of course it is not using your swap: there is no point in caching file data in swap, because either way you would be reading it from disk.
Other than these stats you don't like, is the system performing badly?
Hi and thank you for responding…
As the available RAM counts down to around 3 megs, the system slows to a crawl. This is with almost no network traffic or server use.
I have several dozen Linux hosts in production and have never seen such behavior.
Server load is almost nonexistent, so seeing RAM slowly count down is strange. It is supposed to be dynamic.
Can you post the output of:
# iostat 1 90
# free
# ps -e -o pid,cmd,%mem,rss,trs,sz,vsz
# uptime
Do that for both when the system is running normally, and then again while the system is at a crawl.
Does it slow down at predictable times of the day? Like midnight? The top of the hour (e.g. 7:00, 8:00, etc.)?
How are you detecting the system is running really slowly? What commands and tools (and output) are you using to determine that?
iostat is part of the sysstat utilities, which are available from:
http://perso.wanadoo.fr/sebastien.godard/
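The suggestion is to compare the numbers when the system runs normally and again when it crawls, and to check whether slowdowns line up with particular times of day. A small logger can collect the free/uptime-style figures at fixed intervals for that comparison. What follows is a minimal sketch and was not posted in the thread. The function names and log file name are hypothetical. It assumes a Linux /proc where /proc/meminfo has "Name: value kB" lines (lines in any other layout are skipped) and /proc/loadavg starts with the three load averages.

```python
import time
from pathlib import Path


def read_meminfo(path="/proc/meminfo"):
    """Parse "Name:  value kB" lines from /proc/meminfo into a dict of ints (kB)."""
    fields = {}
    for line in Path(path).read_text().splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():  # skip lines in any other layout
            fields[name.strip()] = int(parts[0])
    return fields


def snapshot():
    """Collect one timestamped sample of load averages and memory counters."""
    load1, load5, load15 = (float(x) for x in Path("/proc/loadavg").read_text().split()[:3])
    mem = read_meminfo()
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "free_kb": mem.get("MemFree", 0),
        "buffers_kb": mem.get("Buffers", 0),
        "cached_kb": mem.get("Cached", 0),
        "swap_free_kb": mem.get("SwapFree", 0),
    }


def log_forever(logfile="memlog.txt", interval=60):
    """Append one line per interval so slowdowns can be matched to times of day."""
    while True:
        sample = snapshot()
        line = " ".join(f"{key}={value}" for key, value in sample.items())
        with open(logfile, "a") as f:
            f.write(line + "\n")
        time.sleep(interval)
```

For example, it could run in the background as `log_forever("/var/tmp/memlog.txt", 60)`. The iostat output would still need to be captured separately.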
You posted used/free/total memory numbers, but you're missing something very important – buffer and cached memory information.
Since you're not touching swap at all, it doesn't sound like ordinary memory starvation. Perhaps CPU, network I/O, or disk I/O problems are behind it when things are behaving poorly?
Linux's memory scheme is indeed to use as much memory as possible for buffering/caching, BUT when apps ask for memory, Linux will take away buffered/cached memory and give it to the application. It's a pretty good arrangement.
Also, memory numbers can be confusing sometimes.
Some tools will report VSZ, the total size of the virtual address space allocated, which is not the same as the amount of memory actually "used" (RSS).
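Linux exposes both numbers for a single process in /proc/<pid>/status: VmSize corresponds to what ps shows as VSZ, and VmRSS corresponds to RSS. Reading them side by side shows the difference. This is a minimal sketch that assumes a Linux /proc. The function name is hypothetical.

```python
from pathlib import Path


def vsz_and_rss_kb(pid="self"):
    """Return (VmSize, VmRSS) in kB for a process, read from /proc/<pid>/status.

    VmSize is the whole virtual address space (the VSZ column in ps);
    VmRSS is the part of it currently resident in physical RAM (the RSS column).
    """
    values = {}
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith(("VmSize:", "VmRSS:")):
            name, rest = line.split(":", 1)
            values[name] = int(rest.split()[0])  # the number is followed by "kB"
    return values["VmSize"], values["VmRSS"]


vsz, rss = vsz_and_rss_kb()
print(f"VSZ={vsz} kB  RSS={rss} kB")
```

RSS is always at most VSZ. For a process with many mapped libraries, VSZ is usually much larger than RSS.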
My concern is not available RAM per se, but the fact that SWAP usage remains at 0% regardless of the load I put on the server. I find it strange: I have compared it with several similar boxes, as well as another similarly configured Linode host, and all of them actively use SWAP regardless of server load. I realize I can crank down the amount of RAM Apache uses, but with server load at next to nothing, Apache should not be so unresponsive… Thoughts?
I'll grab the iostat tools asap.
It's crawling now and here is some info.
free
             total       used       free     shared    buffers     cached
Mem:         91416      88688       2728          0      34788      33728
-/+ buffers/cache:      20172      71244
Swap:       263160          0     263160
ps -e -o pid,cmd,%mem,rss,trs,sz,vsz
  PID CMD              %MEM   RSS  TRS    SZ   VSZ
    1 init [3]          0.5   504   23   347  1388
    2 [keventd]         0.0     0    0     0     0
    3 [ksoftirqd_CPU0]  0.0     0    0     0     0
    4 [kswapd]          0.0     0    0     0     0
    5 [bdflush]         0.0     0    0     0     0
    6 [kupdated]        0.0     0    0     0     0
    7 [jfsIO]           0.0     0    0     0     0
    8 [jfsCommit]       0.0     0    0     0     0
    9 [jfsSync]         0.0     0    0     0     0
   10 [xfsbufd]         0.0     0    0     0     0
   11 [xfslogd/0]       0.0     0    0     0     0
   12 [xfsdatad/0]      0.0     0    0     0     0
   13 [mdrecoveryd]     0.0     0    0     0     0
   14 [kjournald]       0.0     0    0     0     0
  815 /sbin/dhclient -  1.0   992  314   498  1992
  865 syslogd -m 0      0.7   680   24   389  1556
  869 klogd -x          0.5   460   18   347  1388
  914 /usr/sbin/sshd    1.6  1528  265   880  3520
  924 xinetd -stayaliv  0.9   860  129   512  2048
  934 /usr/sbin/httpd   9.1  8392  289  4822 19288
  943 crond             0.6   600   19   360  1440
  961 /usr/sbin/atd     0.5   548   12   357  1428
  995 /usr/bin/nagios   3.7  3416  262  1776  7104
 1001 /sbin/mingetty t  0.4   416    6   342  1368
 1002 /usr/sbin/httpd   9.7  8904  289  4871 19484
 1003 /usr/sbin/httpd   9.6  8860  289  4861 19444
 1004 /usr/sbin/httpd   9.6  8864  289  4861 19444
 1005 /usr/sbin/httpd   9.7  8896  289  4856 19424
 1006 /usr/sbin/httpd   9.6  8856  289  4859 19436
 1007 /usr/sbin/httpd   9.6  8848  289  4861 19444
 1008 /usr/sbin/httpd   9.6  8852  289  4857 19428
 1009 /usr/sbin/httpd   9.6  8848  289  4857 19428
11105 /usr/sbin/sshd    2.2  2012  265  1692  6768
11110 /usr/sbin/sshd    2.4  2228  265  1701  6804
11111 -bash             1.5  1376  588  1075  4300
11142 su -              1.0   972   16  1027  4108
11143 -bash             1.5  1384  588  1077  4308
11386 ps -e -o pid,cmd  0.7   684   66   660  2640
uptime
13:16:35 up 1 day, 5 min, 1 user, load average: 0.00, 0.00, 0.00
If I might suggest something… it's a lot easier to read columns of numbers if they're in a monospaced font. Easiest way to do that is to start the block with the
tag.
Anyway, you've stated a problem. So what I'm doing is gathering information methodically, so that we can then analyze what the numbers are telling you. I don't want to jump to any conclusions or culprits yet; the data will point to something. It's fun playing detective. :D
Need some more information. How do you tell that there is a performance problem? Are keystrokes when ssh'd in echoing really slowly? The web pages coming up after a 10-15 second delay? Does ssh session feel normal but web performance suck? Something else?
Looking at your numbers, the memory figures add up properly, so your performance issue appears to lie somewhere other than memory.
Explanation:
The output of your ps -e -o ... command has numbers in the RSS column. All that adds up to about 96 MB. BUT!
I used the pmap utility (it comes with procps, so it's probably already installed on your machine) to look at the details of how the Apache processes have their memory allocated.
It looks like most of Apache's memory usage is due to shared libraries. Since you have process-based Apache daemons, there are 9 copies of httpd running (PID 934 plus 1002 through 1009), each reported with its own memory allocation.
However, in reality, the shared libraries are loaded only once, not 9 times. So it is NOT 9 MB per Apache daemon x 9 = 81 MB.
So one Apache process accounts for the shared libraries plus its private memory allocations, about 9 MB on your machine. The rest of the Apache processes add only their private memory, which is probably about 1.2 MB for each Apache process on your system.
So the calculation is more like 9 MB + (1.2 MB x 8) = 18.6 MB for Apache usage, roughly, for your setup. Then you have some other daemons that eat less than 1 MB each, except for Nagios (which also uses some common shared libs).
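The arithmetic above can be replayed in a few lines. The RSS values are the httpd rows from the ps listing. The 9 MB and 1.2 MB figures are rough estimates from this post, not values the code measures. The names are only illustrative.

```python
# httpd rows copied from the ps listing above: PID -> RSS in kB
HTTPD_RSS_KB = {
    934: 8392, 1002: 8904, 1003: 8860, 1004: 8864, 1005: 8896,
    1006: 8856, 1007: 8848, 1008: 8852, 1009: 8848,
}

# Assumed figures from the pmap-based estimate in the post (not measured here):
FIRST_PROCESS_MB = 9.0   # shared libraries + private memory, counted once
EXTRA_PRIVATE_MB = 1.2   # private memory of each additional httpd


def naive_total_mb(rss_kb):
    """Add up RSS as if nothing were shared (what summing the ps column does)."""
    return sum(rss_kb.values()) / 1024


def shared_aware_estimate_mb(process_count):
    """Count the shared libraries once, then only private memory for the rest."""
    return FIRST_PROCESS_MB + EXTRA_PRIVATE_MB * (process_count - 1)


count = len(HTTPD_RSS_KB)
print(f"{count} httpd processes")
print(f"naive RSS sum:      {naive_total_mb(HTTPD_RSS_KB):.1f} MB")
print(f"shared-aware guess: {shared_aware_estimate_mb(count):.1f} MB")
```

This prints 9 processes, a naive sum of about 77.5 MB, and a shared-aware estimate of about 18.6 MB.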
             total       used       free     shared    buffers     cached
Mem:         91416      88688       2728          0      34788      33728
Total available (kernel eats some memory for itself) is 91416 K.
Used up is 88688 K.
Buffers + cache is 34788+33728 = 68516 K.
So if you exclude buffers + cache, that leaves 91416 - 68516 = 22900 K (about 22.4 MB) for your apps, and they were actually using 88688 - 68516 = 20172 K of it (the same figure free shows on its "-/+ buffers/cache" line). Apparently that's about all your apps asked for (Apache and the other daemons), which is in the right ballpark of my calculations.
So to answer your question... everything adds up memory-wise, and you are not starved for memory. Hence, there was no need to go into swap, and Linux didn't, because in reality you had almost 70 MB available (2728 K free plus 68516 K of buffers/cache).
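Here is the same bookkeeping as a tiny function, run on the free output posted in this thread. The function name is hypothetical. It only reproduces the arithmetic behind the old "-/+ buffers/cache" line.

```python
def free_breakdown(total_kb, used_kb, buffers_kb, cached_kb):
    """Reproduce the "-/+ buffers/cache" line of the old `free` output.

    Buffers and page cache count as "used", but the kernel hands them back
    to applications on demand, so they are subtracted from used and added to free.
    """
    apps_kb = used_kb - buffers_kb - cached_kb
    reclaimable_free_kb = total_kb - apps_kb
    return apps_kb, reclaimable_free_kb


# Numbers from the `free` output posted while the box was crawling:
apps, avail = free_breakdown(total_kb=91416, used_kb=88688, buffers_kb=34788, cached_kb=33728)
print(f"used by applications: {apps} kB (~{apps / 1024:.1f} MB)")
print(f"free + buffers/cache: {avail} kB (~{avail / 1024:.1f} MB)")
```

This gives 20172 kB used by applications and 71244 kB effectively free, which matches the "-/+ buffers/cache" line above. Newer versions of free drop that line and show an "available" column instead, based on the kernel's own estimate.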
The numbers reported by ps, even for RSS, count a shared library again in every process that has it loaded, so they give very misleading numbers. That's a subtle gotcha to keep in mind. This is what I meant earlier by memory reporting being tricky/misleading (at first glance) sometimes.
That's what pmap is for: it breaks down actual memory usage and makes it easier to see exactly what memory is really being allocated, and what is merely another mapping of the same library (instead of a separate allocation).
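A similar private-versus-shared split can be computed by hand from /proc/<pid>/smaps. This sketch assumes a newer kernel than the 2.4-era one this Linode appears to run, because older kernels do not provide smaps. The function name is hypothetical, and this is not how pmap itself is implemented.

```python
from pathlib import Path


def private_vs_shared_kb(pid="self"):
    """Sum private and shared resident memory across all mappings of a process.

    Each mapping in /proc/<pid>/smaps reports Shared_Clean, Shared_Dirty,
    Private_Clean and Private_Dirty (in kB); together they make up its Rss.
    "Shared" pages are also mapped by other processes, so a plain RSS sum
    over several processes counts them more than once.
    """
    private = shared = 0
    for line in Path(f"/proc/{pid}/smaps").read_text().splitlines():
        name, _, rest = line.partition(":")
        if name in ("Private_Clean", "Private_Dirty"):
            private += int(rest.split()[0])
        elif name in ("Shared_Clean", "Shared_Dirty"):
            shared += int(rest.split()[0])
    return private, shared


priv, shr = private_vs_shared_kb()
print(f"private: {priv} kB, shared: {shr} kB")
```

Running it against each httpd PID would show the pattern described above: a large shared portion that is the same in every process, plus a smaller private portion per process.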
So... your performance issue is most likely not memory related.
Possible culprits: a sudden, short-term spike in CPU usage, especially if there's a "thundering herd" going on.
Or maybe there's a sudden burst of disk I/O traffic that forces processes wanting to read or write stuff to stall a bit.
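To tell a CPU spike from a disk I/O stall, you can sample the CPU counters in /proc/stat over an interval, in the same spirit as the CPU report iostat prints. A high iowait share means processes are waiting on disk. This is a minimal sketch with hypothetical function names. It assumes the 2.6-or-later /proc/stat layout with an iowait column; if the "cpu" line has fewer columns, the missing fields are simply left out.

```python
import time
from pathlib import Path

# Field order of the aggregate "cpu" line in /proc/stat (first eight counters).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_ticks(path="/proc/stat"):
    """Return the cumulative tick counters from the first ("cpu") line."""
    first = Path(path).read_text().splitlines()[0].split()
    values = [int(v) for v in first[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Convert two tick snapshots into the share of time spent in each state."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values()) or 1  # avoid dividing by zero if no ticks elapsed
    return {k: 100.0 * v / total for k, v in delta.items()}


def sample(seconds=1.0):
    """Measure over `seconds` and return the percentage of time in each CPU state."""
    before = read_cpu_ticks()
    time.sleep(seconds)
    return cpu_percentages(before, read_cpu_ticks())
```

Calling `sample()` repeatedly while the box is crawling, and printing the "iowait" and "user"/"system" entries, would show which of the two is spiking.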
Or maybe a process is deadlocking while trying to get some kind of resource, for example because of a race condition. Not very common, but not unheard of, either. I come across this about once or twice a year with various multi-threaded or multi-process apps.
Or maybe another Linode on the same host has temporarily taken much of the disk I/O, though as I understand it, the I/O queuing that Linode runs tends to prevent a single Linode from eating all the available disk I/O. So probably not the culprit.
Or you may be getting a denial of service attack -- check system and application logs to make sure you haven't gotten a burst of unusual traffic.
There are quite a few possibilities, basically. So now I just need to know how you are detecting a performance problem, because that will help narrow it down to the culprit.
If you REALLY want to force your system into swap to make sure that Linux's swapping algorithm works, there are utilities that allocate a lot of memory and write to those pages (forcing the kernel to mark them "dirty" and making them eligible for swapping once real memory runs out).
Unfortunately, I can't remember the name of any of these tools or where to get them, but it's definitely a lot easier to see the kernel forced to page stuff in and out of memory with that kind of tool.
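Here is a minimal sketch of such a tool, for illustration only. It is not one of the utilities the poster had in mind, and the function names are hypothetical. Allocating more than RAM plus swap can trigger the kernel's OOM killer, so start with small sizes.

```python
import mmap
import time

PAGE = mmap.PAGESIZE  # page size in bytes, commonly 4096


def dirty_pages(megabytes):
    """Allocate `megabytes` MB and write to every page so it must be backed by RAM.

    Written (dirty) anonymous pages cannot simply be dropped like clean page
    cache; once real memory runs short, the kernel has to swap them out.
    """
    size = megabytes * 1024 * 1024
    buf = bytearray(size)
    for offset in range(0, size, PAGE):
        buf[offset] = 1  # touch one byte per page to fault it in and dirty it
    return buf


def hold(megabytes, seconds):
    """Keep the dirtied memory allocated while you watch `free` or `vmstat`."""
    buf = dirty_pages(megabytes)
    time.sleep(seconds)
    return len(buf)
```

For example, `hold(100, 60)` on a box with about 90 MB of RAM would likely push some pages into swap. The "Swap:" used column in free should then move off zero.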
My box looks like this:
Physical:
Free: 5.27 MB Used: 132.12 MB Total: 137.39 MB
Swap:
Free: 263.24 MB Used: 249.75 MB Total: 512.99 MB
That seems pretty normal to me. When I ran sysinfo on my home Linux desktop, it was pretty much the same. Linux does like to use as much physical RAM as it can. However, I have never seen swap not being used. That is pretty weird.
… since my logs were full of requests that attempt to exploit this recent vulnerability, and I hadn't patched Apache at that point. I did get it all patched up, but I still have the same behavior affecting one of the servers right now. I haven't made many changes to that system, and it only started within the past month, so it's definitely not normal, but I haven't had any problems with it lagging the rest of the system.
That said, I have read a lot about how reported free memory works quite differently from how most people think it does, and that it's not uncommon to see this.