Linode frozen after reaching file-max limit?
This is the first time this has happened (and I've had this Linode for years).
This morning my Linode was frozen, and when I tried to log in, I couldn't execute bash:
Cannot execute /bin/bash: Too many open files in system
Then I rebooted and could log in.
In /var/log/kern.log, I saw this several times:
Jan 6 04:30:03 localhost kernel: VFS: file-max limit 35556 reached
I wonder what would make the Linode reach this limit. I only have the standard cron jobs running, but something must have triggered the problem this morning (and this is the first Sunday of the month).
The box is a simple LAMP setup (on Debian stable).
By the way…
/etc/cron.monthly/ is empty
/etc/cron.weekly# ls -l
total 7
-rwxr-xr-x 1 root root 1370 2007-03-11 22:30 cvs
-rw-r--r-- 1 root root 329 2004-04-15 19:48 find
-rwxr-xr-x 1 root root 520 2005-01-05 14:30 man-db
-rwxr-xr-x 1 root root 259 2007-03-21 03:38 slrn
-rwxr-xr-x 1 root root 1092 2006-05-25 06:38 sysklogd
Has anyone else had this problem?
I'd appreciate any ideas on what could be wrong.
– jp
2 Replies
$ cat /proc/sys/fs/file-nr
2566 0 71806
The first number is the number of allocated file handles (open files), the second* is the number of allocated-but-unused file handles, and the third is the maximum number of file handles. Watch the first number for a few days (or, better yet, graph it with rrdtool) and see if it is growing over time. If so, you can use lsof to see what files are open and figure out which process is opening so many handles. If you're like me and lazy, you probably don't want to wade through a list of thousands of open files to work that out. CLI to the rescue!
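For the watching part first: if rrdtool feels like overkill, even a cron-driven log is enough to show the trend. This is just a sketch; the five-minute interval and the log path are arbitrary, and remember that % has to be escaped in crontab entries:
# in root's crontab (crontab -e): append a timestamped sample every five minutes
*/5 * * * * echo "$(date '+\%F \%T') $(cat /proc/sys/fs/file-nr)" >> /var/log/file-nr.log
And for the per-process counting: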
$ sudo lsof | awk '{print $1}' | sort | uniq -c | sort -n | tail
23 tlsmgr
24 cyrmaster
38 cron
43 smtpd
62 bash
101 sshd
203 tinyproxy
249 lmtpd
594 apache2
1457 imapd
By default, lsof shows more than just open files, but I can't remember how to turn the rest off at the moment. The numbers should still give you an idea of which process is causing the problem.
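If you want a count of only the real file descriptors, one alternative to lsof is to read /proc/<pid>/fd directly, which lists nothing but actual descriptors. A rough sketch (run it as root so you can see every process; field 2 of /proc/<pid>/stat is the process name):
# count open descriptors per process, biggest offenders last
for pid in /proc/[0-9]*; do
    n=$(ls "$pid/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 0 ] && echo "$n $(awk '{print $2}' "$pid/stat" 2>/dev/null)"
done | sort -n | tail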
If, after all of that, you find that it's not a runaway process, then you can increase the maximum number of open files very easily (as root, since /proc/sys is only writable by root):
# echo new_max_files > /proc/sys/fs/file-max
As far as I know, the only ramifications of doing that are memory usage and the time it takes to find a free file handle when opening a new file. I'm fairly sure both are linear, but you may want to look into that.
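One more note: a value echoed into /proc only lasts until the next reboot. To make it permanent on Debian, the usual place is /etc/sysctl.conf (the 100000 below is just an example value; pick whatever suits your RAM), which you can then load without rebooting:
# add to /etc/sysctl.conf
fs.file-max = 100000
$ sudo sysctl -p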
Hope that helps you a little.
* On a 2.6 kernel, the second number will always be zero.
I think I found something…
# sudo lsof | awk '{print $1}' | sort | uniq -c | sort -n | tail
114 bash
115 authdaemo
117 amavisd
125 spamd
129 master
137 dovecot-a
180 saslauthd
200 imap-logi
792 apache
21959 sshd
21959 file handles for sshd?
root 27226 0.0 0.6 7692 2288 ? Ss 09:11 0:00 sshd: nagioscheck [priv]
1020 27228 0.0 0.4 7692 1544 ? S 09:11 0:00 sshd: nagioscheck@notty
It seems that there's something wrong with the Nagios ssh plugins. They're connecting to the server from outside, but the processes never die! I'll get that fixed.
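In the meantime, here's a quick way to confirm how many of those sessions are hanging around and to clear them out, assuming they really are safe to kill ('sshd: nagioscheck' is the process title from the ps output above):
$ ps ax | grep -c '[s]shd: nagioscheck'
$ sudo pkill -f 'sshd: nagioscheck'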
Thanks for the lsof tip!