How can I investigate the cause of high CPU usage?
I have a process that is intermittently using all of my CPU. It is owned by a specific UID and I'm able to get temporary relief by killing the process manually, but it always comes back.
I've noticed that right before the CPU spikes, there is a surge in incoming packets, and I'm concerned that my system could be compromised or under attack.
How can I figure out what's going on?
3 Replies
The next time that you see that process ramp up, go ahead and run:
ps fax o uid,%cpu,%mem,time,comm | awk '0+$0 == 999 {print}'
This will output all running processes that belong to UID 999 in a tree view, so that you can see which process started which and try to determine the root cause of the issue. For example, if the process was started by a script or another process owned by the same UID, this command will tell you its name. (Parent processes owned by a different UID are filtered out by the awk step.)
In the example output below I changed the UID to 1000, which is my user (my system only has one process belonging to UID 999, so that output is less useful as a demonstration):
$ ps fax o uid,%cpu,%mem,time,comm | awk '0+$0 == 1000 {print}'
1000 0.0 0.1 00:00:00 | \_ sshd
1000 0.0 0.1 00:00:00 | \_ bash
1000 0.0 0.0 00:00:00 | \_ ps
1000 0.0 0.0 00:00:00 | \_ awk
1000 0.0 0.1 00:00:00 \_ sshd
1000 0.0 0.1 00:00:00 \_ bash
Here we can actually see the command itself, and that 'ps' and 'awk' were started by 'bash', which was started by 'sshd'. We also see my login shell, 'bash', which was likewise started by 'sshd'.
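If the parent you are looking for belongs to a different UID, it will not show up in the filtered output above. As a rough sketch (not part of the original command), the function below follows the PPid field in /proc/PID/status from a given PID up to the top of the tree. The function name `ancestry` and its optional second argument are my own inventions for illustration, and it assumes a Linux system with /proc mounted.

```shell
# ancestry PID [PROC_ROOT]
# Print "PID NAME" for the given process and each of its ancestors, one per
# line, by following the PPid field in PROC_ROOT/PID/status.
# PROC_ROOT defaults to /proc; the argument only exists so the function can
# be pointed at a copy of that directory.
# Note: only the first word of the process name is printed.
ancestry() {
    pid=$1
    proc_root=${2:-/proc}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null \
            && [ -r "$proc_root/$pid/status" ]; do
        name=$(awk '$1 == "Name:" { print $2; exit }' "$proc_root/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        # Move up one level; the loop stops once the parent PID is 0.
        pid=$(awk '$1 == "PPid:" { print $2; exit }' "$proc_root/$pid/status")
    done
}
```

Calling `ancestry 7314` (substituting the PID of your runaway process) should list that process first and its ancestors after it, regardless of which user owns them.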
You can also search for processes using high CPU with a command like this:
ps aux | sort -nrk 3,3 | head -n 5
Running that will output the top 5 processes consuming your Linode's CPU resources:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
tomd 7314 99.8 0.0 107952 352 pts/1 R+ 16:42 4:34 yes
tomd 7327 0.0 0.0 107956 656 pts/0 S+ 16:47 0:00 head -n 5
tomd 7326 0.0 0.0 116496 912 pts/0 R+ 16:47 0:00 sort -nrk 3,3
tomd 7325 0.0 0.0 155324 1856 pts/0 R+ 16:47 0:00 ps aux
However you find the offending process, I recommend checking its logs for clues as to what could be causing the issue. You can also monitor CPU usage in real time by using any of the following commands:
top
htop
watch "ps aux | sort -nrk 3,3 | head -n 5"
for i in {1..5} ; do top -b -n 1 | head -15 ; echo "" ; sleep 60 ; done
If the spike in CPU usage comes along with an increase in incoming packets, it is most likely a spike in traffic to your Linode. You can analyze that traffic using tcpdump, Wireshark, or any number of other tools designed for network analysis. The specifics here are a bit too heavy for a forum post, but you can find plenty of information on how to interpret the readouts using your preferred search engine. There are a number of different traffic patterns that can indicate an attack, but in pretty much all cases you should see significantly more traffic coming at your Linode than leaving it.
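Before reaching for a full packet capture, a quick way to compare incoming and outgoing traffic is to read the kernel's per-interface packet counters. This is only a sketch: it assumes a Linux system that exposes /proc/net/dev in its usual layout, and the function name `packet_counts` is made up for this example.

```shell
# packet_counts [FILE]
# Print "INTERFACE RX_PACKETS TX_PACKETS" for every interface listed in
# FILE (default /proc/net/dev). The counters are totals since boot.
packet_counts() {
    # Skip the two header lines. The colon after the interface name is
    # replaced with a space because some kernels print "eth0:12345"
    # without a separating space.
    awk 'NR > 2 { sub(/:/, " "); print $1, $3, $11 }' "${1:-/proc/net/dev}"
}
```

Run `packet_counts`, wait ten seconds or so during a spike, and run it again. If the received count grows far faster than the transmitted count, that matches the pattern described above and is a good reason to look closer with tcpdump or Wireshark.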
To investigate a possible compromise, there are a few tools that you can use, including ClamAV and rkhunter. Since the full scope of a compromise can be very difficult to determine for certain, you may also want to go through our guide on Recovering from a Compromise. This may mean rebuilding your Linode from scratch in order to avoid leaving any infected files behind, so it's a good idea to make regular backups. You can use our Backup service for this, use a tool such as rsync, or follow our guide on Copying a Disk Over SSH. However you do it, be aware that a file that is infected when you copy it will remain infected on the new system, so it is a good idea to run malware scans on the new system after restoring your data from the backup.
If you are seeing high CPU usage in your Linode's graphs, but not in the ps command, you may want to check if your Linode is making heavy use of its swap space. This could be tying up your CPU in the background due to the kernel's need to bring things in and out of memory constantly.
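One way to check for this, sketched here under the assumption that your kernel exposes the pswpin and pswpout counters in /proc/vmstat (the function name `swap_counters` is just for illustration):

```shell
# swap_counters [FILE]
# Print the number of pages swapped in (pswpin) and swapped out (pswpout)
# since boot, as recorded in FILE (default /proc/vmstat).
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1, $2 }' "${1:-/proc/vmstat}"
}
```

Run `swap_counters` twice, a few seconds apart. Counters that stay flat mean the system is not actively swapping, while counters that climb quickly suggest the kernel is busy moving pages in and out of swap. The si and so columns of `vmstat 1 5` show the same activity as a per-second rate.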
If this is the case, you may wish to try setting your Linode's 'swappiness' to a lower value. Swappiness is a value between 0 and 100 with 0 meaning avoid swap usage if at all possible and 100 favoring swap usage.
To see your current swappiness value, you can use the following command:
cat /proc/sys/vm/swappiness
Either one of the following commands will let you set swappiness (to 20 in this example) on most modern Linux distributions. The second one must be run as root:
sudo sysctl -w vm.swappiness=20
echo 20 >/proc/sys/vm/swappiness
To have the changes persist after a reboot, you will want to edit your /etc/sysctl.conf file to change the value of vm.swappiness.
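If you would rather script that edit than open the file by hand, something along these lines should work. Treat it as a sketch: `persist_sysctl` is a hypothetical helper name, it assumes the file uses plain `key = value` lines, and it needs to be run from a root shell to write to /etc/sysctl.conf.

```shell
# persist_sysctl KEY VALUE [FILE]
# Remove any existing (uncommented) assignment of KEY from FILE
# (default /etc/sysctl.conf) and append "KEY = VALUE" at the end.
persist_sysctl() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Copy every line whose left-hand side is not KEY.
    awk -v key="$key" '
        { line = $0; sub(/^[ \t]*/, "", line); split(line, kv, /[ \t]*=/) }
        kv[1] != key { print }
    ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, `persist_sysctl vm.swappiness 20` followed by `sysctl -p` would store the value and load it immediately. Commented-out lines are left alone, so an old `# vm.swappiness=60` note in the file will survive.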