How do I fix the recent performance issues that occur without any change in resource consumption patterns?
I've been using Linode for a few years now and have never had a problem. I've loved it; we've had websites running for ourselves and our clients the whole time.
However, I've noticed some performance issues lately without any changes in usage or resource consumption spikes.
For example, two weeks ago I needed to reboot a Linode because it simply stopped responding; I suddenly couldn't even ping it or SSH into it.
Then I had the same issue and had to reboot last week.
Then this week, I had to reboot again.
It seems strange to me that the exact same configuration worked for years, but lately, the Linode is suddenly freezing every week.
Right now (as of writing), I've had to restart again, and I'm seeing error messages like these on the Linode panel:
Stats for this Linode are not available yet
CPU, Network, and Disk stats will be available shortly
For reference, I'm using the Newark data center.
Am I the only one who's noticing this?
If so, should I migrate my website to another data center?
I could up the resources, but I don't see why I should when this config has been up and running for years w/o issue. I also tested various configs on local virtual machines, and my tests indicate that my current config is actually overkill.
10 Replies
✓ Best Answer
Thanks, everyone. I think I may have figured it out.
I just went through the instance and started systematically stripping out as much as I could to make it even more bare-bones than it already was.
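In case it helps anyone else, this is roughly how I went about finding things to strip; the unit name in the second command is just an example, so substitute whatever you find on your own box:
systemctl list-unit-files --state=enabled    # see everything enabled at boot
sudo systemctl disable --now cups.service    # example: stop and disable a service you don't need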
Looks like I got my old performance back…maybe.
I won't know till the instance analytics are back up.
This won't fix your issue, but maybe it will give you some insight into what MIGHT BE happening.
Linode is currently having an issue with their analytics. You should subscribe to their updates (https://status.linode.com).
When your node stops responding, it could mean a few things. What are you hosting on the box? What Linode plan? What errors are you getting from your logs?
Are you running MySQL and/or Apache? WordPress? It may have run out of RAM and/or disk space. You can check your free disk space by running df -h. Use top, htop, and free -hm to check your memory usage.
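If you want a quick snapshot in one go, something like this should work on most distros (exact flags can vary):
df -h /                            # free space on the root filesystem
free -h                            # RAM and swap usage
ps aux --sort=-%mem | head -n 10   # the ten biggest memory consumers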
FWIW, in the last few months I've been seeing many more bots hit our servers. As your site(s) get more popular, they can attract more attention, good and bad. It doesn't take much for a script kiddie with wpscan to cause some problems. Web traffic is bursty in general, so the resources that were enough last year won't necessarily be enough this year.
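If you want a rough sense of how much scanner traffic you're getting, you can count hits on the usual WordPress probe targets in your access log; the path below assumes a default Nginx layout, so adjust for your setup:
grep -cE 'wp-login\.php|xmlrpc\.php' /var/log/nginx/access.log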
Thanks [@Dave](/community/user/Dave); I checked out their updates page.
I subscribed to their updates regarding the Newark data center and the Analytics incidents. Hopefully, they're on it and will resolve the issues shortly.
As for what I'm doing with that Linode, it's the Linode 2 GB (1 CPU, 2 GB RAM, 50 GB Storage) option. It's only running two WordPress sites for a client of ours. I've set it up to use Nginx with a PHP-FPM setup and MySQL (actually MariaDB), so only one instance of PHP is used for processing CGI requests, which lowers resource consumption. These are very small websites, with ~10 pages between the two sites. The most complex feature they have is a contact form.
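For context, the relevant part of my Nginx config looks roughly like this; I'm reconstructing it from memory, so treat the domain, docroot, and socket path as placeholders:
server {
    listen 80;
    server_name example.com;                 # placeholder domain
    root /var/www/example.com/public;        # placeholder docroot
    index index.php index.html;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # hand PHP off to the single shared PHP-FPM pool over a unix socket
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # socket path varies by distro
    }
}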
In the past, that same Linode instance barely reached 10% CPU, and usually less than half its RAM was consumed at any given moment. As far as storage is concerned, both websites (including the database) total less than 3 GB (out of 50 GB).
Though I wish the analytics were up so I could double-check.
Since my client runs a local business, it barely gets traffic. The monthly transfer struggles to hit 1 GB, so I don't think anyone is slamming it with scripts. Also, I don't see any errors in the logs.
In short, this is a microscopic operation, and even the smallest instance had been overkill in the past. Maybe more investigation will yield more answers.
Once again, thanks, @Dave, for your suggestions; I've put them to use and I'll keep digging.
Oh wow, @acanton77, just as I was about to wrap things up with that last reply, I was told we had to restart it again.
…Also, I was getting broken pages signing into the community portal earlier…not going to lie…getting a bit nervous here.
So I gave your link to the other post a read as well.
I hope what that thread was saying is not the case, but if so, I may need to head over to AWS (or any Terraform-compatible vendor).
It's not that migrating all of our infrastructure would be a heavy lift; almost everything is automated, and I could do it in a day once I build out the equivalent Terraform scripts. It's just that I don't feel like putting in the effort when everything here (used to) just work.
I'll try a few things first, and then I'll reassess our options.
Thanks again, @acanton77; I guess it's not just us experiencing problems.
Is it possible that one of your WP sites is using some plugin that has a memory leak or memory access violation? Still, while that would crash the site, I don't see how it would bring down your entire Linode. This is Linux, not Windows! :-)
I'm not a systems guy. Someone here might know.
One thing I would do is make a mysqldump of your databases, then reinstall WP for each site and import the data (I like to use phpMyAdmin for backup and import).
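Something along these lines, where wp_site1 stands in for your actual database name:
mysqldump -u root -p --single-transaction wp_site1 > wp_site1.sql   # consistent InnoDB snapshot
mysql -u root -p wp_site1 < wp_site1.sql                            # import after the fresh WP install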
It is a bit of work, but probably less so than spinning up a new server or moving to DigitalOcean or Vultr, etc.
Have you created a support ticket with Linode/Akamai? My guess is they will tell you that your problem is 'out of scope.'
It's possible that a WP plugin could cause issues on the site, but if that were the case, I'd expect it to have happened at some point a year ago, not repeatedly just recently, unless it's some weird update.
Then again, you said it shouldn't bring the whole server down, which I, too, expected.
As far as reinstalling WP goes, I created an exact copy of the site on low-powered VMs and never saw the same issues. I assumed that given the same CPU, RAM, storage, OS, etc., the same issue should manifest, but it didn't. So I tried it on an even lower-powered netbook that uses an SD card as storage (yeah, really low-powered; it was a $100 machine). Still, the exact same website ran smoothly without issue.
I suspect the Linode's resources are not what's being presented because the underlying big iron is overloaded with VMs. It seems like they look at the resource consumption of the average VM, notice that most aren't hitting their maximum provisioned resources, and oversubscribe the underlying big iron to get more billing per instance per rack.
That would make sense, because when I cut some processes, it suddenly started running smoothly again like it used to. But I find it strange that ~10% CPU would freeze it (by the way, the analytics came back) while cutting it down to ~3-4% fixed it.
By the way, migrating to DigitalOcean, Vultr, etc. would be about as much effort as the local VM tests I mentioned above. But if this nonsense keeps up, I'm out.
Again, did you file a support ticket with Akamai? If so, what have they told you?
If not, you should do so.
This sounds like your instance is running out of memory and the OOM killer is killing a process that's key to the functioning of your server. Often in these situations, things will be running fine and then all of a sudden they aren't.
To check for this, you can run the following command as mentioned in this post on the Community Site titled Low memory error:
grep -i kill /var/log/messages*
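On distros that log through systemd-journald rather than rsyslog, /var/log/messages may not exist; in that case, check the kernel log instead:
journalctl -k | grep -i 'out of memory'   # kernel messages from the journal
dmesg | grep -i oom                       # same idea, straight from the ring buffer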
You mentioned,
"I've set it up to use Nginx with a PHP FPM setup and MySQL (actually MariaDB), so only one instance of PHP is used for processing CGI requests for lower resource consumption."
The OOM killer will often decide to kill PHP and MySQL when it is freeing up memory. This post titled How do I fix php-fpm out of memory? has some instructions for optimizing your PHP service, while you can use MySQLTuner to optimize your MySQL DB performance.
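As a rough illustration of the kind of change those guides suggest, a pool config on a 2 GB box might cap worker counts along these lines; the values and file path are illustrative, not a recommendation:
; /etc/php/*/fpm/pool.d/www.conf (path varies by distro)
pm = dynamic
pm.max_children = 8        ; hard cap on PHP workers so they can't exhaust RAM
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3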
I suggest running through these steps and if you don't see any improvement, following acanton77's advice and opening a ticket with the Support Team. They can take a look at the host, check for CPU steal, noisy neighbors, etc.
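For what it's worth, you can get a rough read on CPU steal yourself before Support does; the "st" column shows CPU time the hypervisor took for other guests:
vmstat 1 5   # watch the last ("st") column; consistently nonzero values suggest a busy host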
I had a similar experience with my $5 Nanode VPS, which had been running for a long time without any issues. For the last two weeks, I started getting high CPU alerts frequently, and after some time the Nanode would go offline. When I checked the Lish console from the dashboard, I could see kernel panic and OOM-related errors. When I restarted the Nanode, it would run fine for an hour or so, then the high CPU alerts would come again and the Nanode would go offline with a kernel panic.
I submitted two support tickets explaining this problem. Even though they respond quickly, they just point you to existing articles or community postings. I tried everything, but nothing seemed to solve the problem with this VPS. One thing I noticed is that the support team always suggests upgrading to a higher plan for better performance.
I created a VPS with the same configuration at another provider and installed all my services there. For two days I did not get any high CPU alerts, kernel panics, or OOM kills on the new VPS, and its performance also appears better. Meanwhile, on my Linode, even after stopping all my services, the CPU usage is still very high. So either my VPS is hacked (but I could not find any evidence of that) or Akamai is intentionally slowing down these Nanodes to force users to upgrade. I also noticed that for a similar or slightly lower price, the other provider gives twice the RAM. So I upgraded to that plan, deleted the Nanode, and said goodbye to Linode/Akamai.