Segfaults on host 26 (there's no /lib/tls)

Hello. I run Debian stable with a few backports on host 26, and daemons have been dying every few days. It usually happens after the cronjobs run (could logrotate have anything to do with it?).

Anyway, these are the ones which died already:

  • apache (version from stable)

  • bind9 (version from stable; died several times)

  • dovecot (backported; died more than twice)

  • amavis (backported; died several times)

I had to put lines in /etc/crontab restarting them daily.

Today aptitude also segfaulted:

host:~# aptitude update

Reading Package Lists… Done

Segmentation faulty Tree… 87%

host:~#

And the next time I ran it, everything worked fine.

There is no /lib/tls, or /usr/lib/tls, and libc6 doesn't seem to install

any files with "tls" in the name.

So… Is anyone else having problems with host 26? Doesn't that seem

like a hardware issue (bad memory)?

(Interesting fact: postfix never died.)

jp

14 Replies

Which kernel are you running? There have been a number of updates to UML that are pending release – I'll try to build a pre-release kernel and get it out to you guys. I seriously doubt this is a hardware issue, though..

-Chris

> Which kernel are you running?

Latest 2.4 (2.4.26-linode31-1um).

> There have been a number of updates to UML that are pending release – I'll try to build a pre-release kernel and get it out to you guys. I seriously doubt this is a hardware issue, though..

I thought it would be hardware because if it were the kernel, people with accounts on all other hosts would have complained already… Anyway, if you get a new kernel, we'll try it out! :-)

jp

@jp:

> > Which kernel are you running?
>
> Latest 2.4 (2.4.26-linode31-1um).

Although this isn't the final release, would you mind giving 2.6.9-rc2-mm4-linode5 a shot? Before you do, make sure to do an "apt-get install dhcp3-client" to get the required DHCP update for Debian…

Thanks,

-Chris

OK, I rebooted into 2.6.9-rc2-mm4-linode5. Now I'll remove the restart

lines from the cronjob, and I suppose we'll have to wait a few days to see

how the host behaves…

Thanks for your help!

jp

Today something strange happened.

7FD9E4239F 2821 Mon Oct 4 11:34:09 backports-admin@lists.backports.org (host 127.0.0.1[127.0.0.1] said: 451 4.5.0 Error in processing, id=16310-05, decoding2-get-file-types FAILED: run_command (open pipe): Can't fork at /usr/lib/perl/5.6.1/IO/File.pm line 65. at /usr/sbin/amavisd-new line 1125. (in reply to end of DATA command)) RECIPIENT@SOMEWHERE

After restarting amavis, some of the messages were delivered, but others had the same problem.

I guess this is because of the new kernel. What is the right way to fix it?

jp

Well… Apache, mysql and amavisd-new were killed today by the kernel's oom-killer, as can be seen in the logs below. (Trimmed; I'll post or email the complete logs if necessary.)

I don't run any huge websites or applications on this linode. Just this:

  • Apache (almost no dynamic content)

  • MySQL (only for email database)

  • Postfix

  • Amavis+Clamav+Spamassassin

  • Dovecot imapd

And the ordinary stuff (sshd, etc)

No Java, and no heavy stuff.

What's going on?

jp

Oct 4 01:29:17 localhost kernel: oom-killer: gfp_mask=0x1d2

Oct 4 01:29:17 localhost kernel: DMA per-cpu:

Oct 4 01:29:17 localhost kernel: cpu 0 hot: low 8, high 24, batch 4

Oct 4 01:29:17 localhost kernel: cpu 0 cold: low 0, high 8, batch 4

Oct 4 01:29:17 localhost kernel: Normal per-cpu: empty

Oct 4 01:29:17 localhost kernel: HighMem per-cpu: empty

Oct 4 01:29:17 localhost kernel:

Oct 4 01:29:17 localhost kernel: Free pages: 248kB (0kB HighMem)

Oct 4 01:29:17 localhost kernel: Active:6030 inactive:5828 dirty:0 writeback:7 unstable:0 free:62 slab:2119 mapped:5862 pagetables:391

Oct 4 01:29:17 localhost kernel: DMA free:248kB min:256kB low:512kB high:768kB active:24120kB inactive:23312kB present:65536kB

Oct 4 01:29:17 localhost kernel: protections[]: 0 0 0

Oct 4 01:29:17 localhost kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB

Oct 4 01:29:17 localhost kernel: protections[]: 0 0 0

Oct 4 01:29:27 localhost kernel: HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB

Oct 4 01:29:27 localhost kernel: protections[]: 0 0 0

Oct 4 01:29:27 localhost kernel: DMA: 0*4kB 1*8kB 5*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB

Oct 4 01:29:27 localhost kernel: Normal: empty

Oct 4 01:29:27 localhost kernel: HighMem: empty

Oct 4 01:29:27 localhost kernel: Swap cache: add 1140080, delete 1140034, find 191763/356157, race 1+8

Oct 4 01:29:27 localhost kernel: Out of Memory: Killed process 17500 (apache).

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 12262 (mysqld).

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 1233 (mysqld).

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 1235 (mysqld).

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 1236 (mysqld).

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 1239 (mysqld).

Oct 4 01:30:12 localhost kernel: oom-killer: gfp_mask=0x1d2

Oct 4 01:30:12 localhost kernel: DMA per-cpu:

Oct 4 01:30:12 localhost kernel: cpu 0 hot: low 8, high 24, batch 4

Oct 4 01:30:12 localhost kernel: cpu 0 cold: low 0, high 8, batch 4

Oct 4 01:30:12 localhost kernel: Normal per-cpu: empty

Oct 4 01:30:12 localhost kernel: HighMem per-cpu: empty

Oct 4 01:30:12 localhost kernel: Out of Memory: Killed process 11971 (amavisd-new).
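
A note on reading the log above: the `DMA:` free-list line gives, for each block size, how many free blocks remain, and those counts times sizes should add up to the reported total. A minimal Python sketch of that arithmetic follows. It assumes the line originally read `0*4kB 1*8kB …`, with the `*` separators lost in forum formatting; the helper name `free_kb` is made up for illustration.

```python
import re

# Free-list line from the log, with the "*" separators assumed restored.
DMA_LINE = ("0*4kB 1*8kB 5*16kB 1*32kB 0*64kB 1*128kB 0*256kB "
            "0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB")


def free_kb(buddy_line):
    """Sum count * block_size over every 'N*SIZEkB' term left of the '='."""
    terms, _, _ = buddy_line.partition("=")
    return sum(int(n) * int(size)
               for n, size in re.findall(r"(\d+)\*(\d+)kB", terms))


total = free_kb(DMA_LINE)
reported = int(re.search(r"=\s*(\d+)kB", DMA_LINE).group(1))
zone_min_kb = 256  # "min:256kB" from the DMA zone line in the same log

print("computed free:", total, "kB; reported:", reported, "kB")
print("below zone minimum:", total < zone_min_kb)
```

The computed total matches the reported one, and it sits below the zone's `min` watermark, which is consistent with the oom-killer firing at that moment.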

Last night only apache was killed by the oom killer.

It happened at 1:27, which is when the daily cronjob runs. I have already disabled aide (I thought it was eating up all the memory), but processes are still being killed.

This is what is left in cron.daily:

-rwxr-xr-x    1 root     root          314 Jun 11 14:15 amavisd-new
-rwxr-xr-x    1 root     root          502 Jul  4  2002 calendar
-rwxr-xr-x    1 root     root          315 Mar 11  2002 dlocate
-rwxr-xr-x    1 root     root          280 Jul 11 17:25 find
-rwxr-xr-x    1 root     root           51 Apr 23  2002 logrotate
-rwxr-xr-x    1 root     root          708 Mar 14  2002 man-db
-rwxr-xr-x    1 root     root          226 Apr  5  2002 mgetty
-rwxr-xr-x    1 root     root          495 Nov 18  2001 netkit-inetd
-rwxr-xr-x    1 root     root          345 Mar  7  2002 quota
-rwxr-xr-x    1 root     root         2736 Oct  1  2001 standard
-rwxr-xr-x    1 root     root         1197 Jan  3  2002 sysklogd

(I have omitted disabled entries)

Can any of these eat up lots of memory? This is getting frustrating…

jp

When oom killer is choosing something to kill, it has a preference for processes which are consuming a lot of memory but which are not long lived. The fact that apache seems to get picked first leads me to suspect that it is consuming too much memory. The situation may be being made worse by the io limiter kicking in and delaying swap operations - a possible cause of your meltdown in the early hours of 4th Oct. I suggest that you post the output of:

    ps -e -o pid,cmd,%mem,rss,trs,sz,vsz

and

    cat /proc/meminfo

to see what your memory usage is like, and

    cat /proc/io_status

to see how your io limiter values are set.

> When oom killer is choosing something to kill, it has a preference for processes which are consuming a lot of memory but which are not long lived. The fact that apache seems to get picked first leads me to suspect that it is consuming too much memory. The situation may be being made worse by the io limiter kicking in and delaying swap operations - a possible cause of your meltdown in the early hours of 4th Oct

Well, it doesn't seem like apache is using too much memory… Amavis uses more.

Maybe some daily cronjob is triggering several processes that use lots of memory (not individually, but collectively)? Is it common for cronjobs to behave like that?

I ask because for two consecutive days, the oom killer was triggered at the time when the daily cronjob was running.

The information you asked for follows.

jp

# ps -e -o pid,cmd,%mem,rss,trs,sz,vsz
  PID CMD              %MEM  RSS  TRS    SZ   VSZ
    1 init [2]          0.1   72   24   318  1272
    2 [ksoftirqd/0]     0.0    0    0     0     0
    3 [events/0]        0.0    0    0     0     0
    4 [khelper]         0.0    0    0     0     0
    5 [kthread]         0.0    0    0     0     0
    6 [kblockd/0]       0.0    0    0     0     0
   17 [pdflush]         0.0    0    0     0     0
   18 [pdflush]         0.0    0    0     0     0
   20 [aio/0]           0.0    0    0     0     0
   19 [kswapd0]         0.0    0    0     0     0
   21 [jfsIO]           0.0    0    0     0     0
   22 [jfsCommit]       0.0    0    0     0     0
   23 [jfsSync]         0.0    0    0     0     0
   24 [xfslogd/0]       0.0    0    0     0     0
   25 [xfsdatad/0]      0.0    0    0     0     0
   26 [xfsbufd]         0.0    0    0     0     0
  652 [kjournald]       0.0    0    0     0     0
  692 [kjournald]       0.0    0    0     0     0
  693 [xfssyncd]        0.0    0    0     0     0
  738 dhclient eth0     0.0    0  378   479  1916
 1040 /usr/sbin/sshd    0.1   92  264   697  2788
 1052 /sbin/syslogd     0.4  264   22   336  1344
 1055 /sbin/klogd       0.2  160   17   316  1264
 1058 /usr/sbin/named   1.3  800  232  2582 10328
 1059 /usr/sbin/named   1.3  804  232  2582 10328
 1061 /usr/sbin/named   1.3  804  232  2582 10328
 1064 /usr/sbin/named   1.3  804  232  2582 10328
 1065 /usr/sbin/named   1.3  804  232  2582 10328
 1071 amavisd (master)  1.5  908  659  7101 28404
 1075 /usr/sbin/spamd   0.0    0  659  5633 22532
 1085 spamd child       0.0    0  659  5633 22532
 1086 spamd child       0.0    0  659  5633 22532
 1087 spamd child       0.0    0  659  5633 22532
 1088 spamd child       0.0    0  659  5633 22532
 1089 spamd child       0.0    0  659  5633 22532
 1090 /usr/sbin/clamd   5.7 3424   35  7987 31948
 1131 /usr/bin/freshcl  0.6  392   27   510  2040
 1136 /usr/sbin/courie  0.0    0    8   362  1448
 1137 /usr/lib/courier  0.0   48   59   515  2060
 1139 /usr/lib/courier  0.0   48   59   515  2060
 1140 /usr/lib/courier  0.0   48   59   515  2060
 1141 /usr/lib/courier  0.0   48   59   515  2060
 1142 /usr/lib/courier  0.0   48   59   515  2060
 1143 /usr/lib/courier  0.0   48   59   515  2060
 1156 /usr/sbin/inetd   0.0   24   15   327  1308
 1167 /bin/sh /usr/bin  0.0    0  473   547  2188
 1209 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
 1216 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
 1217 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
 1218 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
 1315 /usr/lib/postfix  0.3  188   21   628  2512
 1319 qmgr -l -t fifo   0.8  496   34   680  2720
 1353 /usr/lib/postgre  0.3  188 1457  2137  8548
 1355 postgres: stats   0.0   32 1457  2385  9540
 1356 postgres: stats   0.1   84 1457  2148  8592
 1371 /usr/sbin/courie  0.0    0    8   358  1432
 1372 /usr/lib/courier  0.0    0 1452   749  2996
 1375 /usr/lib/courier  0.0    0 1452   749  2996
 1377 /usr/lib/courier  0.0    0 1452   749  2996
 1379 /usr/lib/courier  0.0    0 1452   749  2996
 1381 /usr/lib/courier  0.0    0 1452   749  2996
 1383 /usr/lib/courier  0.0    0 1452   749  2996
 1391 /usr/bin/python2  0.1   72  418  1357  5428
 1392 /usr/bin/python2  1.3  820  418  4596 18384
 1401 /usr/sbin/cron    0.2  172   21   414  1656
 1408 /sbin/getty 3840  0.0    0   10   314  1256
 1454 /usr/sbin/doveco  0.2  156   75   614  2456
 1456 dovecot-auth      0.8  500  105   972  3888
 1486 /usr/sbin/clamd   5.7 3424   35  7987 31948
 2770 SCREEN -S im      1.1  660  243   654  2616
 2771 /bin/bash         0.0    0  473   557  2228
 2774 centericq         2.5 1540 3858  2425  9700
 6509 /usr/sbin/sshd    0.0    0  264  1430  5720
 6512 /usr/sbin/sshd    0.0   48  264  1464  5856
 6513 -bash             0.2  140  473   555  2220
 6911 /usr/sbin/sshd    0.0    0  264  1430  5720
 6914 /usr/sbin/sshd    0.1   60  264  1455  5820
 6915 -bash             0.2  132  473   554  2216
16807 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
18443 /usr/sbin/apache  0.5  324  218 20530 82120
18444 /usr/sbin/apache  8.6 5116  218 21157 84628
18445 /usr/sbin/apache  1.4  836  218 20564 82256
18447 /usr/sbin/apache  3.6 2164  218 20709 82836
18448 /usr/sbin/apache  9.2 5484  218 21276 85104
18449 /usr/sbin/apache  8.6 5128  218 21121 84484
19673 /usr/sbin/apache  5.2 3128  218 21137 84548
19678 /usr/sbin/apache  9.0 5372  218 21223 84892
19679 /usr/sbin/apache  3.0 1816  218 21311 85244
19680 /usr/sbin/apache  0.9  588  218 20572 82288
19967 /usr/sbin/apache  5.5 3312  218 21230 84920
28516 amavisd (child)  23.9 14216 659  7133 28532
28557 /usr/sbin/sshd    1.3  800  264  1427  5708
28559 /usr/sbin/sshd    1.6 1008  264  1461  5844
28560 -bash             1.4  864  473   556  2224
28952 amavisd (virgin   3.0 1796  659  7101 28404
29752 pickup -l -t fif  1.6  956    6   661  2644
30493 imap-login        2.0 1220   73   622  2488
30564 imap-login        2.0 1220   73   622  2488
30566 imap-login        2.0 1220   73   622  2488
30567 -su               2.1 1248  473   555  2220
30571 ps -e -o pid,cmd  1.0  644   59   514  2056
# cat /proc/meminfo
MemTotal:        59352 kB
MemFree:           760 kB
Buffers:           348 kB
Cached:           6744 kB
SwapCached:      11184 kB
Active:          45612 kB
Inactive:         2980 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        59352 kB
LowFree:           760 kB
SwapTotal:      132088 kB
SwapFree:        10052 kB
Dirty:              48 kB
Writeback:           4 kB
Mapped:          42456 kB
Slab:             6896 kB
Committed_AS:   483472 kB
PageTables:       1588 kB
VmallocTotal:   973804 kB
VmallocUsed:       676 kB
VmallocChunk:   973120 kB
# cat /proc/io_status
io_count=1851754 io_rate=555 io_tokens=398756 token_refill=512 token_max=400000

Your system is running out of memory when the cron jobs launch. You are almost out of swap even before you launch them (10052 kB free out of 132088 kB in your post just now).
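
To make the swap figures concrete, here is a small Python sketch that redoes the arithmetic from the `/proc/meminfo` output posted above. The four values are copied from that post; the `parse_meminfo` helper is a made-up name for illustration, and the "commit ratio" is just one rough way to compare committed memory against RAM plus swap, not an official metric.

```python
# Values copied from the /proc/meminfo output earlier in the thread.
MEMINFO = """\
MemTotal:        59352 kB
SwapTotal:      132088 kB
SwapFree:        10052 kB
Committed_AS:   483472 kB
"""


def parse_meminfo(text):
    """Return {field: value_in_kB} for 'Name:   1234 kB' style lines."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values


info = parse_meminfo(MEMINFO)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
swap_used_pct = 100.0 * swap_used_kb / info["SwapTotal"]
# Address space the kernel has promised versus what RAM + swap can back.
commit_ratio = info["Committed_AS"] / (info["MemTotal"] + info["SwapTotal"])

print("swap used: %d kB (%.1f%%)" % (swap_used_kb, swap_used_pct))
print("Committed_AS / (RAM + swap): %.2f" % commit_ratio)
```

With more than nine tenths of swap already in use before the nightly jobs start, there is very little headroom left for anything cron launches.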

The %MEM value shows how much physical memory the process is consuming. Apache has low values because it is swapped out. Look at the SZ values - how much virtual memory is being used. Apache and MySQL are the big users. Apache looks like it has a lot of modules loaded. Apache on my Linode uses a quarter as much memory, even with mod_ssl, mod_php and all the standard stuff loaded. Also, you have 11 Apache instances running - a lot for a Linode 64 - if your server needs that many, it's probably overloaded. Since they're mostly swapped out, Apache doesn't look that busy.
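
One way to pull the big users out of `ps` output like the listing above is to group rows by command and look at the largest virtual size. The Python sketch below does that for a handful of rows copied from that listing. It uses the VSZ column (kB) rather than SZ (pages), and it deliberately reports the maximum per command instead of a sum, on the assumption that the repeated mysqld rows are threads sharing one address space. The function names are illustrative only.

```python
# A few rows copied from the ps listing earlier in the thread
# (columns: PID CMD %MEM RSS TRS SZ VSZ).
PS_ROWS = """\
 1090 /usr/sbin/clamd   5.7 3424   35  7987 31948
 1209 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
 1216 /usr/sbin/mysqld  1.9 1180 3692 16294 65176
18444 /usr/sbin/apache  8.6 5116  218 21157 84628
18448 /usr/sbin/apache  9.2 5484  218 21276 85104
28516 amavisd (child)  23.9 14216 659  7133 28532
"""


def parse_row(line):
    """Split one ps row; the command may contain spaces, so split from the right."""
    _pid, rest = line.split(None, 1)
    cmd, _mem, _rss, _trs, _sz, vsz = rest.rsplit(None, 5)
    return cmd.strip(), int(vsz)


def summarize(text):
    """Map command -> (number of rows, largest VSZ in kB)."""
    summary = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        cmd, vsz = parse_row(line)
        count, biggest = summary.get(cmd, (0, 0))
        summary[cmd] = (count + 1, max(biggest, vsz))
    return summary


summary = summarize(PS_ROWS)
ranked = sorted(summary.items(), key=lambda item: item[1][1], reverse=True)
for cmd, (count, vsz) in ranked:
    print("%-18s rows=%d max VSZ=%d kB" % (cmd, count, vsz))
```

On these sample rows Apache and MySQL come out on top, which matches the reading above even though their %MEM figures are small.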

Suggestions:

  • Reconfigure Apache so it doesn't load any module that you aren't using - lots of distros configure it with a bunch of stuff, just in case.

  • Reduce the value of MinSpareServers (default 5) to 3 and MaxSpareServers (default 10) to 5 in your Apache config.

  • Tune MySQL to trade speed for less memory usage.

Also, increase the size of your swap partition (or add a second one). I know that the rule of thumb is swap = 2 * physical, but with what you're trying to run, 128MB of swap just isn't enough. I have 256MB of swap on my Linode 64 - this is what caker recommends.

> • Reconfigure Apache so it doesn't load any module that you aren't using - lots of distros configure it with a bunch of stuff, just in case.
>
> • Reduce the value of MinSpareServers (default 5) to 3 and MaxSpareServers (default 10) to 5 in your Apache config.
>
> • Tune MySQL to trade speed for less memory usage.

I see what you mean. I will make those changes.

> Also, increase the size of your swap partition (or add a second one). I know that the rule of thumb is swap = 2 * physical, but with what you're trying to run, 128MB of swap just isn't enough. I have 256MB of swap on my Linode 64 - this is what caker recommends.

Ah, right! I used to admin another Linode, and I think we had 256MB of swap there, which would explain why it never had problems. I'll reorganize the partitions today.

Thanks a lot for the help!

jp

Try adding "set-variable=thread_cache_size=40" under the [mysqld] section of your mysql config file. This was reported in this thread:

http://www.linode.com/forums/viewtopic.php?p=4810#4810

I had to do this on a (real) FC2 box. mysql kept creating threads. Might be worth a shot..

-Chris

Also, here is a mysql config file for a small memory footprint (you'll still need to add the thread_cache_size config line).

http://www.theshore.net/~caker/uml/my.cnf

-Chris

Guys, thank you a lot!

After adding another 128MB of swap and changing the apache and mysql configs, my linode went through two nights without any problems. The logs show no process being killed.

BTW, it seems like switching to 2.6 was a good idea too (with 2.4, I don't remember having anything in the logs telling explicitly what happened).

Thanks!

jp
