My Linode hangs every month or so...
I have used Linode for years without a single problem, but since 2013 my Linode has been hanging from time to time and I have not been able to find out why.
When the server hangs, I receive an email from Linode saying that my server has averaged 20% CPU usage over the last two hours. This usually happens at night, and I find the server hung in the morning.
Apache no longer responds, email does not work, svn does not work; no service is working, not even SSH.
The only thing that still works is the Lish console.
Do you have any idea what could be causing this?
Thanks.
15 Replies
I think fail2ban is causing this issue. What do you think?
Is there a way to catch the problem?
@sblantipodi:
I think fail2ban is causing this issue. What do you think?
Is there a way to catch the problem?
Why do you think that is the cause?
Have you looked at the /var/log/fail2ban.log files to see what it is doing?
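One quick way to see what fail2ban has been doing is to count how many bans it has logged. The sketch below assumes each ban is written on a line containing "] Ban " (the jail name in brackets, followed by the action); that format may differ between fail2ban versions, so look at a few lines of your own log first. The function name count_bans is made up for illustration.

```shell
# Count "Ban" actions recorded in a fail2ban log file.
# Assumption: every ban is logged on a line containing the fixed string "] Ban ".
# Usage: count_bans /var/log/fail2ban.log
count_bans() {
    # -F matches a fixed string, -c prints the number of matching lines
    grep -c -F '] Ban ' "$1"
}
```

A very high count during the hours before a hang would support the fail2ban theory; a low count would point elsewhere.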
@Dweeber:
@sblantipodi: I think fail2ban is causing this issue. What do you think?
Is there a way to catch the problem?
Why do you think that is the cause?
Have you looked at the /var/log/fail2ban.log files to see what it is doing?
Because it is the most CPU-intensive software I have on that Linode.
20% CPU for more than an hour?
This will help me understand what happens on my server before it hangs.
Perhaps you have really huge logs that aren't being rotated? Just guessing..
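Run this program from cron every 5 minutes:
#!/bin/ksh -p
# Append a timestamped snapshot of load, memory and processes to a daily log.
LOG=/var/tmp/srvr_stat.$(date +%Y%m%d)
{
  date
  uptime
  free
  ps aux
  echo
  echo
} >> "$LOG"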
This'll let you see some basics of what your machine is doing; in particular free memory (are you swapping to death?) and processes using lots of CPU. After your machine crashes you can review the log files to see what happened.
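Running the script from cron every 5 minutes takes a single crontab line. The sketch below only prints that line; the script path is a placeholder (nothing in this thread says where the script is saved), and the */5 step syntax is assumed to be supported by your cron, as it is in the common Vixie-style cron.

```shell
# Print a crontab entry that runs the given script every 5 minutes.
# The helper name cron_line and the path below are illustrative placeholders.
cron_line() {
    printf '*/5 * * * * %s\n' "$1"
}

# Example: show the entry for a hypothetical install location.
cron_line /usr/local/bin/srvr_stat.sh
```

Paste the printed line into the editor opened by `crontab -e`, as the user that should own the log files.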
@sweh:
Run this program from cron every 5 minutes:
#!/bin/ksh -p
LOG=/var/tmp/srvr_stat.$(date +%Y%m%d)
{
  date
  uptime
  free
  ps aux
  echo
  echo
} >> "$LOG"
This'll let you see some basics of what your machine is doing; in particular free memory (are you swapping to death?) and processes using lots of CPU. After your machine crashes you can review the log files to see what happened.
OK, I modified the program to write a new file every 5 minutes and to put these files in a new directory each day.
#!/bin/ksh -p
# Write one snapshot file per run, grouped into one directory per day.
DIR=/root/log_for_crash_detect/day_$(date +%Y-%m-%d)
mkdir -p "$DIR"
LOG=$DIR/log_$(date +%Y-%m-%d-%H-%M)
{
  date
  uptime
  free
  ps aux
  echo
  echo
} >> "$LOG"
This way it will be easier to track down the problem.
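When reviewing these files after a hang, it helps to pull out just the busiest processes from each snapshot. The sketch below assumes one snapshot per file (as the script above writes), that the `ps aux` header line starts with USER, and that %CPU is the third column; the function name top_cpu_from_snapshot is made up for illustration.

```shell
# Print the N busiest processes (by %CPU) from one saved snapshot file.
# Assumptions: one snapshot per file, `ps aux` header begins with "USER",
# and %CPU is column 3.
# Usage: top_cpu_from_snapshot FILE [N]   (N defaults to 5)
top_cpu_from_snapshot() {
    # Skip everything up to and including the ps header, drop blank lines,
    # then sort the process lines by column 3, highest first.
    awk '!seen { if ($1 == "USER") seen = 1; next } NF' "$1" |
        LC_ALL=C sort -k3,3nr |
        head -n "${2:-5}"
}
```

Running it over the last few files written before the hang shows which processes were using the CPU at that point.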
I really suspect that fail2ban is the killer.
This particular Linode does not run anything very resource-intensive: it runs a mail server, an svn server, and a proxy server, and I use it for tunneling.
I think the problem is in fail2ban because I know it has many problems analyzing big files.
I rotate my maillog every week, but it can grow to 300 MB, and I think this may cause problems for fail2ban.
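To check whether any log has grown that large, something like the following can list oversized files. It is only a sketch: the function name big_logs is made up, and the directory and size threshold are whatever you pass in.

```shell
# List regular files under a directory that are larger than N MiB.
# Usage: big_logs DIR MIB     e.g. big_logs /var/log 100
big_logs() {
    # -size +Nc means "more than N bytes"; the byte unit is POSIX,
    # so this does not rely on the GNU-only M suffix.
    find "$1" -type f -size +"$(( $2 * 1048576 ))"c
}
```

Any file that fail2ban watches and that shows up here is a candidate for more frequent rotation.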
In any case, I will keep you posted if I discover anything more.
Thanks for helping me track down the problem.
@hoopycat:
Disable fail2ban and see if the problem goes away?
That is my second option, if I'm sure that fail2ban is the killer.
@sblantipodi:
Wait a minute. Is there a way to sort by CPU usage using the ps command?
ps aux --sort '-pcpu'
sorts all processes by CPU usage, highest first.
@obs:
@sblantipodi: Wait a minute. Is there a way to sort by CPU usage using the ps command?
ps aux --sort '-pcpu'
sorts all processes by CPU usage, highest first.
Thanks.
I'm wondering whether Linode limits CPU usage and some sort of "resource protection" is kicking in on my Linode.
@Nuvini:
I've never heard of Linode "limiting CPU". If you're using too much, they'll tell you, but then again, I've never heard of that either. Though if you do think the host is the cause you can open a ticket to be migrated to another host.
I opened a ticket previously and they told me that they do not limit any resources.
I will believe that; sorry for doubting.