Load Average and CPU
Today I'm getting alerts from monit about resource limits being matched for CPU and load.
I've read through some logs (mysql, apache, syslog, messages, etc.) and haven't found anything strange.
I noticed that the graphs in the Linode Manager have some white (blank) intervals during these occurrences.
Where else should I look?
I'm running a Linode 360 with an nginx proxy -> apache+mod_php and mysql.
This is a board with 90,000 visits/month and 320,000 page views/month.
Is this a signal to upgrade?
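For scale, a rough back-of-envelope on that traffic (assuming a perfectly even distribution, which real traffic never has):

```python
# Average request rate implied by the stated monthly traffic.
page_views_per_month = 320_000
seconds_per_month = 30 * 24 * 60 * 60  # ~2.59 million

print(page_views_per_month / seconds_per_month)  # ~0.12 page views/second
```

Averaged out, that's about one page view every eight seconds, so steady traffic by itself doesn't look like the culprit.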
Thanks
7 Replies
Are you running munin so you can look at trends in resource usage?
It may be that you are just having a busy day, that performance is still acceptable, and that you won't see more notices for another 9 months. Or, it could be that you've passed a growth threshold and you'll get notifications every day until you either bump up to a bigger VM plan, or optimize things.
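If munin isn't set up yet, even a crude sampler gives you finer-grained trending than the hourly manager graphs. A minimal sketch, assuming a Linux box with /proc/loadavg (the interval and log path are arbitrary choices):

```python
#!/usr/bin/env python3
"""Log the 1-minute load average at a fine interval."""
import time

INTERVAL = 10              # seconds between samples (arbitrary)
LOGFILE = "/tmp/load.log"  # hypothetical path

while True:
    with open("/proc/loadavg") as f:
        load1 = f.read().split()[0]  # first field is the 1-minute load average
    with open(LOGFILE, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {load1}\n")
    time.sleep(INTERVAL)
```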
There was no traffic difference; neither apache nor nginx shows any jumps.
One hour later, I had more traffic than when this occurred.
I have reports from members that the site was inaccessible for a few seconds.
I ran vmstat and everything looked normal.
Besides the monit alert, these are the only signs of this occurrence:
Load: http://img44.imageshack.us/img44/1050/loadday.png
Swap: http://img299.imageshack.us/img299/7733/swapday.png
Thanks
James
This may or may not have anything to do with anything, but several weeks ago one Linode 360 in Newark was putting up extremely high CPU wait numbers. The load kept creeping up until the easiest solution was just to reboot. I put in a ticket, but they couldn't track down any cause and the local logs didn't show anything obvious.
Kind of a roundabout way of saying you might want to look at your CPU wait % in addition to user and system %.
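If you want a quick look at the wait number without munin, here's a minimal sketch that samples the kernel's cumulative CPU counters in /proc/stat twice and prints the percentage breakdown (the 5-second window is an arbitrary choice):

```python
#!/usr/bin/env python3
"""Show user/system/iowait CPU percentages over a short window."""
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: cpu  user nice system idle iowait irq softirq steal ...
        fields = f.readline().split()[1:]
    return [int(x) for x in fields]

a = cpu_times()
time.sleep(5)
b = cpu_times()
delta = [y - x for x, y in zip(a, b)]
total = sum(delta)

names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
for name, d in zip(names, delta):
    print(f"{name:8s} {100.0 * d / total:5.1f}%")
```

On a Xen guest the steal row is worth a glance too: CPU time the hypervisor hands to other guests lands there rather than in your own user/system numbers.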
@zunzun:
The load graph has a blank spot before the spike, which looks odd. Munin was not recording during that time.
It could be that the jump happened right then, not over a period of time, and the image wasn't rendered with a line there.
I saw the same thing on my Linode 360 for disk I/O when I was doing a bunch of stuff last week. I had a sudden spike in I/O, not over minutes or even seconds, just BAM. If you were able to look at those graphs in finer detail (zoom in), I bet you'd see a line there. Instead, all you get to see is a tiny little rectangle that represents an entire hour.
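To put rough numbers on that (illustrative values, not read off the graphs above): a short burst all but disappears once it's averaged into an hour-wide pixel.

```python
# A 90-second burst at load 25 averaged into an otherwise-quiet hour.
spike_load, spike_seconds = 25.0, 90
baseline_load, hour = 0.3, 3600

avg = (spike_load * spike_seconds + baseline_load * (hour - spike_seconds)) / hour
print(round(avg, 2))  # ~0.92 -- the spike barely registers
```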
@Vance:
This may or may not have anything to do with anything, but several weeks ago one Linode 360 in Newark was putting up extremely high CPU wait numbers. The load kept creeping up until the easiest solution was just to reboot. I put in a ticket, but they couldn't track down any cause and the local logs didn't show anything obvious.
Kind of a roundabout way of saying you might want to look at your CPU wait % in addition to user and system %.
I see that every so often on my Fremont-based Linode. I have a job that runs once an hour; every so often it takes 4 or 5 times as long as it should, sometimes even 10 times as long, and I/O is very, very slow on the machine while it happens. When this eventually started happening consistently, I asked the Linode staff to look into it; they couldn't find anything wrong. But they kindly let me migrate to a newer machine, and execution times came back to normal.
Since then I've seen the occasional blip in performance (currently jobs are taking about 40% longer than they should; a couple of days ago it peaked at 1200% for one hour).
I'm thinking I/O handling on the Xen Linodes isn't yet as well limited as the rate handler the old UML Linodes had.
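For what it's worth, catching those blips doesn't take much. Here's a minimal sketch of one way to time a job against a baseline; the job path, baseline, and alert factor are all hypothetical:

```python
#!/usr/bin/env python3
"""Time a recurring job and flag runs that blow past their baseline."""
import subprocess
import time

BASELINE_SECONDS = 60.0  # what a normal run takes (hypothetical)
ALERT_FACTOR = 3.0       # flag runs more than 3x baseline (hypothetical)

start = time.monotonic()
subprocess.run(["/usr/local/bin/hourly-job"], check=False)  # hypothetical job
elapsed = time.monotonic() - start

ratio = elapsed / BASELINE_SECONDS
print(f"run took {elapsed:.1f}s ({100 * ratio:.0f}% of baseline)")
if ratio > ALERT_FACTOR:
    print("WARNING: possible I/O slowdown on the host")
```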