Newark Data Center Power Failure
One of our servers is still down. I can ping it but cannot connect to it via LISH, ssh, etc. Longview has no recent activity, but the graphs in the Linode Manager show a little CPU activity (including a spike from a cron job).
Anyone else having issues?
15 Replies
I can connect via LISH, but nothing else.
All normal services are running (including ssh, ftp, http, etc).
I've turned off iptables in case it was a firewall issue.
I've rebooted.
I've recreated /etc/resolv.conf and restarted the networking service.
Any ideas?
I've tried connecting to it from one of our test VPSs located in the same data center. The test VPS is running normally.
The test VPS cannot connect to the problem VPS via LISH, ssh, ftp, http, etc.
The problem VPS cannot connect to the test VPS via LISH, ssh, ftp, http, etc.
The problem VPS can ping domains not hosted on the problem server and get a response.
I can use wget on the problem VPS to get webpages from sites hosted on the problem VPS, but not from any other server.
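For anyone wanting to repeat these per-service checks in one go, here is a minimal sketch of a TCP reachability probe. The function name can_connect and the 3-second timeout are my own choices, not anything from Linode. It only tells you whether a TCP handshake completes, so it says nothing about ping (ICMP) or about whether the service behind the port is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts and DNS failures.
        return False
```

Run from both the test VPS and the problem VPS against ports 22, 21 and 80 of the other machine, this would show at a glance which direction and which services fail.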
Support has suggested booting into 'Rescue Mode' and performing a filesystem check. I'm currently cloning the file system and will try rescue mode.
I'll check 'route -n' once the fsck is done. The 'e2fsck -f' has been running for over an hour and it's still on 'Pass 1'. It's an 82 GB file system image.
I've never run into an fsck that has taken this long. Ugh.
I'd hate to lose the hour that it's been running. Is there any way to check if the VPS is still running in recovery mode without losing the progress of the fsck (if there has been any)?
Here's the output of 'route -n':
[root@www ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 198.74.60.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.213.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.212.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.210.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 50.116.48.1 0.0.0.0 UG 0 0 0 eth0
50.116.48.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.210.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.212.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.213.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
198.74.60.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
I'm not sure why 198.74.60.1 is in that list, though I assume it's our gateway at Linode (resolves to gw-li557.linode.com).
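One quick sanity check on that assumption: 198.74.60.1 falls inside 198.74.60.0/24, which the table lists as directly connected on eth0. That is consistent with it being a gateway for one of the assigned addresses, though it does not prove it. A small sketch using Python's ipaddress module, with the subnet list copied from the table above (subnet_for is just an illustrative helper name):

```python
import ipaddress

# Directly connected subnets (Destination, Genmask) from the 'route -n' output above.
CONNECTED = [
    ("50.116.48.0", "255.255.255.0"),
    ("66.175.210.0", "255.255.255.0"),
    ("66.175.212.0", "255.255.255.0"),
    ("66.175.213.0", "255.255.255.0"),
    ("169.254.0.0", "255.255.0.0"),
    ("198.74.60.0", "255.255.255.0"),
]


def subnet_for(address, connected=CONNECTED):
    """Return the connected subnet that contains address, or None if there is none."""
    ip = ipaddress.ip_address(address)
    for destination, genmask in connected:
        network = ipaddress.ip_network(f"{destination}/{genmask}")
        if ip in network:
            return network
    return None


print(subnet_for("198.74.60.1"))
```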
@hoopycat:
Or check the charts on the dashboard; fsck should show up as some nontrivial amount of disk I/O
I was in rescue mode and I didn't see any activity on the graphs during the 1 1/2 hours it was in rescue mode.
@Main Street James:
The fsck finished. I had lost the LISH connection again (it has only been lasting a few minutes at a time, and it doesn't always respond when I try to reconnect).
Here's the output of 'route -n':
[root@www ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 198.74.60.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.213.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.212.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.210.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 50.116.48.1 0.0.0.0 UG 0 0 0 eth0
50.116.48.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.210.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.212.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.213.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
198.74.60.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
I'm not sure why 198.74.60.1 is in that list, though I assume it's our gateway at Linode (resolves to gw-li557.linode.com).
You should only have one entry starting with 0.0.0.0. What are the contents of your network config file? And what's the primary IP of the node (i.e. the one assigned to eth0)?
The primary IP is 50.116.48.0. The 66.175.X.X IPs are additional IPs used for SSL certificates for ecommerce sites on that server.
Which config file(s) would you like to see?
James
We've never had any problems with this server in the past and I haven't changed the configuration on this server for quite some time.
The routing table should look something like this
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 50.116.33.1 0.0.0.0 UG 100 0 0 eth0
50.116.33.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
50.116.37.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
50.116.38.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
50.116.39.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
173.230.133.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.128.0 0.0.0.0 255.255.128.0 U 0 0 0 eth0
198.74.52.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
This is from a box with multiple SSL certs on it.
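The difference between the two tables (five 0.0.0.0 entries on the problem box versus one here) can be checked mechanically. Below is a rough sketch that counts default routes in pasted 'route -n' output. The helper name default_routes is made up for illustration, and it assumes the usual eight-column layout shown in this thread.

```python
def default_routes(route_output):
    """Return (gateway, iface) for every default route found in 'route -n' output."""
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns:
        # Destination Gateway Genmask Flags Metric Ref Use Iface
        if len(fields) != 8:
            continue
        destination, gateway, genmask, flags = fields[:4]
        # A default route has destination and mask 0.0.0.0 and the G (gateway) flag.
        if destination == "0.0.0.0" and genmask == "0.0.0.0" and "G" in flags:
            routes.append((gateway, fields[7]))
    return routes


# Routing table from the problem VPS, as posted earlier in this thread.
PROBLEM_TABLE = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 198.74.60.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.213.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.212.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 66.175.210.1 0.0.0.0 UG 0 0 0 eth0
0.0.0.0 50.116.48.1 0.0.0.0 UG 0 0 0 eth0
50.116.48.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.210.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.212.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
66.175.213.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
198.74.60.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
"""

found = default_routes(PROBLEM_TABLE)
if len(found) > 1:
    print("More than one default route:", found)
```

With several default routes at the same metric, which gateway outbound traffic ends up using is hard to predict, which is why a single 0.0.0.0 entry is the usual target.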
Thank you for your help. This issue has been resolved by the Linode Support staff (who have graciously put up with my pestering nature while dealing with the aftermath of last night's power outage). Support has resolved a configuration issue on their end and now everything is responding correctly.
obs,
This VPS is running CentOS. I am in the planning stages of moving all the sites to Ubuntu LTS servers so I'm not going to try to figure out why my routing table seems to be a bit funky (though it may turn out to be a rabbit I chase anyway). I'm going to wait and see the reviews of 14.04 LTS before deciding whether to go with 14.04 or if I should stick with 12.04 (which I have on other VPSs).
Thanks again,
James