I need to apologise for the CPU usage I caused on dallas723.
For a while, I've been developing a script to help me simulate what bash would be like over a low-baud connection. I wrote it on my local computer, but when it didn't work (it was creating random files, and I wondered whether the method I was using was one Cygwin didn't like), I uploaded it to my Linode and ran it there on March 9th. No such luck: it didn't work there either.
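The low-baud idea can be sketched as an output throttle. This is not the script described above (which isn't shown here). It is a minimal Python illustration that assumes 8N1 serial framing, meaning 10 bits on the wire per character, so 300 baud works out to roughly 30 characters per second. The names `chars_per_second`, `slow_write` and `throttle_stdin` are hypothetical.

```python
import sys
import time


def chars_per_second(baud, bits_per_char=10):
    """Characters per second at a given baud rate.

    Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit
    = 10 bits per character.
    """
    return baud / bits_per_char


def slow_write(data, baud, out=None, sleep=time.sleep):
    """Write bytes one at a time, pausing after each as a slow serial line would."""
    if out is None:
        out = sys.stdout.buffer
    delay = 1.0 / chars_per_second(baud)
    for i in range(len(data)):
        out.write(data[i:i + 1])
        out.flush()  # push each byte out immediately instead of buffering it
        sleep(delay)


def throttle_stdin(baud=300):
    """Copy stdin to stdout at roughly the given baud rate (hypothetical helper)."""
    while True:
        chunk = sys.stdin.buffer.read(1)
        if not chunk:  # EOF
            break
        slow_write(chunk, baud)
```

Saved as a small script that calls `throttle_stdin(300)`, this could sit at the end of a pipeline such as `ls -l | python3 slowpipe.py` (the file name is hypothetical) to show how long a listing takes to appear at 300 baud. The `sleep` parameter is there so the timing can be checked without actually waiting.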
What I didn't realise was that it wasn't just creating random files; it was also spawning processes named after them. On March 9th, it spawned four copies of /usr/bin/not (a program-testing tool included with "llvm", as I later discovered), each passed invalid arguments (my program had basically tried to run "not found"; yes, my program was really buggy, and I still don't understand why it did that). In response, /usr/bin/not apparently went into an infinite loop in all four processes, each using 100% of a CPU core. As the VM has four virtual cores, this meant it was using the full 100% of its allocated CPU.
I didn't notice this until last night, for unrelated reasons, and seeing an executable I didn't recognise at the time using 100% CPU made me believe my server had suffered a major security breach. After collecting some forensic data, I took the server's networking offline and started looking into it with a friend who offered to help. Three hours later, we had worked out exactly what was going on. At that point I killed the processes.
The main point of this post, though, is to apologise for hogging my share of the CPU time for the last month and a half. That was irresponsible of me. I've now enabled the email alert thresholds in the LPM, so I'll find out much sooner if this happens again.
It also frightens me that I hadn't noticed these runaway processes earlier. I will pay much closer attention to my server in future.
Thanks for reading, and again, I'm sorry.
I'm curious, though: what exactly was 'not' running? All the 'not' command does is invert the exit status of the command passed to it on the command line, and I can't think of any system where 'found' is a command!
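For readers unfamiliar with it, the behaviour described in the reply above can be sketched in a few lines of Python. This is not LLVM's source and doesn't cover any of its extra options; it only follows the reply's description. `run_not` is a hypothetical name, and treating a missing command (such as 'found') as an error that returns 1 is an assumption of this sketch.

```python
import subprocess
import sys


def run_not(argv):
    """Run argv and return its exit status inverted (hypothetical sketch)."""
    if not argv:
        print("usage: not command [args...]", file=sys.stderr)
        return 1
    try:
        result = subprocess.run(argv)
    except FileNotFoundError:
        # Assumption: a command that can't be found is reported as an error.
        print(f"Error: unable to find '{argv[0]}' in PATH", file=sys.stderr)
        return 1
    # Invert: success (0) becomes 1, and any non-zero status becomes 0.
    return 0 if result.returncode != 0 else 1
```

A command-line wrapper would just call `sys.exit(run_not(sys.argv[1:]))`. Under these assumptions, "not found" would print an error and exit straight away rather than loop, which is part of why the real program's behaviour is puzzling.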
As for what 'not' was doing exactly, I honestly have no idea. When I was collecting forensics, I did use 'gcore' to take a core dump of each of the four processes, so I should be able to analyse those to work out exactly what it was doing. For now, though, I don't know.
I got a response to my support ticket, btw, letting me know that it didn't look like I had caused noticeable issues for anyone else on the same host. I'm really glad about that, and it makes me all the more impressed with how far Linode has come!