Linode frozen for reaching file-max limit?

Hi,

This is the first time this has happened (and I've had this Linode for years):

This morning my linode was frozen, and when I tried to log in, I couldn't execute bash:

Cannot execute /bin/bash: Too many open files in system

Then I rebooted and could log in.

In /var/log/kern.log, I saw this several times:

Jan  6 04:30:03 localhost kernel: VFS: file-max limit 35556 reached

I wonder what would make the linode reach this limit. I only have the standard cron jobs running, but something certainly triggered the problem this morning (and today is the first Sunday of the month).

The box is a simple LAMP setup (on Debian stable).

By the way…

/etc/cron.monthly/ is empty

/etc/cron.weekly# ls -l
total 7
-rwxr-xr-x 1 root root 1370 2007-03-11 22:30 cvs
-rw-r--r-- 1 root root  329 2004-04-15 19:48 find
-rwxr-xr-x 1 root root  520 2005-01-05 14:30 man-db
-rwxr-xr-x 1 root root  259 2007-03-21 03:38 slrn
-rwxr-xr-x 1 root root 1092 2006-05-25 06:38 sysklogd

Has anyone else run into this?

I'd appreciate any ideas on what could be wrong.

– jp

2 Replies

Just throwing out a guess here, but you may have a process that's opening files and not closing them. There's a proc file to see how many files are open:

$ cat /proc/sys/fs/file-nr
2566    0       71806

The first number is the number of allocated file handles (open files), the second is the number of allocated-but-unused handles, and the third number* is the maximum number of file handles. Watch the first number for a few days (or better yet, graph it with rrdtool, or at least log it with a quick loop like the one just below) and see if it grows over time. If it does, you can use lsof to see what files are open and figure out what is grabbing so many handles. If you're like me and lazy, you probably don't want to read through a list of thousands of files to work out which process has them all open. CLI to the rescue!
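Here's the quick-and-dirty logging loop I mean; the log path and five-minute interval are just examples, and you'll want to run it inside screen or under nohup so it survives logging out:

$ while true; do echo "$(date) $(awk '{print $1}' /proc/sys/fs/file-nr)" >> /tmp/file-nr.log; sleep 300; done

And here's the lazy way to see which process has what open: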

$ sudo lsof | awk '{print $1}' | sort | uniq -c | sort -n | tail 
     23 tlsmgr
     24 cyrmaster
     38 cron
     43 smtpd
     62 bash
    101 sshd
    203 tinyproxy
    249 lmtpd
    594 apache2
   1457 imapd

By default, lsof lists more than just regular open files (things like current directories and memory-mapped libraries), but I can't remember offhand how to turn that off. The counts should still give you an idea of which process is causing the problem.
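If you want a count of just the real file descriptors, another way is to count the entries under each process's /proc/<pid>/fd directory. A rough sketch (run as root so every fd directory is readable; the second field of /proc/<pid>/stat is the process name):

# for p in /proc/[0-9]*; do echo "$(ls $p/fd 2>/dev/null | wc -l) $(awk '{print $2}' $p/stat 2>/dev/null)"; done | sort -n | tail

Those counts should track the first file-nr number more closely than the raw lsof line counts.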

If, after all of that, you find that it's not a runaway process, then you can increase the maximum number of open files very easily (as root, since /proc/sys is only writable by root):

# echo new_max_files > /proc/sys/fs/file-max

As far as I know, the only ramifications of doing that are memory usage and the time it takes to find a free file handle when opening a new file. I'm fairly sure both are linear, but you may want to look into that.
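One more note: a value written to /proc goes away at the next reboot. If you decide to raise the limit permanently, the usual way on Debian is to set it in /etc/sysctl.conf and reload (the number below is only an example; pick whatever makes sense for your box):

# echo 'fs.file-max = 100000' >> /etc/sysctl.conf
# sysctl -p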

Hope that helps you a little.

  * On a 2.6 kernel the second number will always be zero.

Thanks for your response!

I think I found something…

# sudo lsof | awk '{print $1}' | sort | uniq -c | sort -n | tail 
    114 bash
    115 authdaemo
    117 amavisd
    125 spamd
    129 master
    137 dovecot-a
    180 saslauthd
    200 imap-logi
    792 apache
  21959 sshd

21959 file handles for sshd? :-) I think I know what's happening. I just saw lots of processes like this:

root     27226  0.0  0.6   7692  2288 ?        Ss   09:11   0:00 sshd: nagioscheck [priv]
1020     27228  0.0  0.4   7692  1544 ?        S    09:11   0:00 sshd: nagioscheck@notty

It seems that there's something wrong with the Nagios ssh plugins. They're connecting to the server from outside, but the processes never die! I'll get that fixed.
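In the meantime I can at least count the stuck sessions and clear them out by hand; something like this should do, assuming they all belong to the nagioscheck user from the ps output above:

# ps aux | grep '[s]shd: nagioscheck' | wc -l
# pkill -u nagioscheck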

Thanks for the lsof tip! :-)
