Server no longer available after abandoned upgrade
sudo aptitude update
sudo aptitude safe-upgrade
Whilst the second command (sudo aptitude safe-upgrade) was running, I (foolishly) decided to shut the server down (to bring down Apache and any other daemons that might be running on it), whilst the OS and other software were still being upgraded.
I lost my SSH connection to the server (unsurprisingly), and the install was aborted by the server being shut down. Since rebooting the server, I have been unable to log in to it again.
Here is the console output when I attempt to login remotely using the Lish Ajax Console:
XENBUS: Device with no driver: device/console/0
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1205) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal
I thought there might be something messed up with the filesystem, so I ran fsck. Here is the output:
root@hvc0:~# fsck -fy /dev/xvdb
fsck from util-linux 2.19.1
e2fsck 1.42-WIP (02-Jul-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/xvdb: 111349/1286144 files (6.5% non-contiguous), 1798314/5120000 blocks
root@hvc0:~#
AFAIK, this means that there is nothing wrong with the filesystem - so I have no idea what else to do. My server is currently unreachable, and I can't SSH in to try to rerun the install in case that was the cause of all of this.
Does anyone have any idea on what the issue could be and how I may resolve this and get the server back online?
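Besides reading the pass-by-pass output, the exit status of fsck can confirm whether anything was found. The sketch below decodes that status using the bit values documented in fsck(8); the helper name fsck_status is made up for illustration and is not part of any tool.

```shell
# Decode the exit status of fsck, which is a bitwise OR of the conditions
# listed in fsck(8). fsck_status is a hypothetical helper name.
fsck_status() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  out=""
  if [ $((code & 1)) -ne 0 ]; then out="$out, errors corrected"; fi
  if [ $((code & 2)) -ne 0 ]; then out="$out, reboot needed"; fi
  if [ $((code & 4)) -ne 0 ]; then out="$out, errors left uncorrected"; fi
  if [ $((code & 8)) -ne 0 ]; then out="$out, operational error"; fi
  if [ $((code & 16)) -ne 0 ]; then out="$out, usage error"; fi
  if [ $((code & 32)) -ne 0 ]; then out="$out, cancelled by user"; fi
  if [ $((code & 128)) -ne 0 ]; then out="$out, shared library error"; fi
  # Strip the leading separator before printing.
  echo "${out#, }"
}

# Usage (in rescue mode, on an unmounted filesystem):
#   fsck -fy /dev/xvdb; fsck_status $?
```

A clean filesystem only rules out disk corruption; it says nothing about whether the packages on it were left half-installed.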
22 Replies
You can use lish to access the linode directly:
@Guspaz:
You don't need to stop any daemons to do an upgrade; the package manager will handle all that for you.
You can use lish to access the linode directly:
http://library.linode.com/troubleshooting/using-lish-the-linode-shell
Thanks for the link. I managed to log in using the Lish shell. However, the console output is exactly the same as it was for the AJAX console, i.e. I am logged in as root@hvc0. Normally, I am logged in as myself@linode123, so I still can't log in to my server, or at least I don't know how to get access to it (as either one of my accounts on the server or as root on that server). The server is still down.
sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot
Source:
This is the console output:
md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1206) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal
I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…
If you're running a distro/custom kernel you may need to put
# hvc0 - getty
#
# This service maintains a getty on hvc0 from the point the system is
# started until it is shut down again.
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /sbin/getty -8 38400 hvc0
in /etc/init/hvc0.conf
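Before editing anything, it may be worth checking whether the job file already contains the getty line. The following is only a sketch: has_hvc0_getty is a made-up helper name, and the CONF default assumes the root filesystem has been mounted at /mnt/rescue from rescue mode.

```shell
# has_hvc0_getty is a hypothetical helper: it succeeds if the given Upstart
# job file has an exec line that starts a getty on hvc0.
has_hvc0_getty() {
  grep -Eq '^[[:space:]]*exec[[:space:]].*getty.*hvc0' "$1"
}

# CONF is an assumed path (root filesystem mounted at /mnt/rescue).
CONF=${CONF:-/mnt/rescue/etc/init/hvc0.conf}
if [ -f "$CONF" ]; then
  if has_hvc0_getty "$CONF"; then
    echo "$CONF already starts a getty on hvc0"
  else
    echo "$CONF has no getty line for hvc0"
  fi
fi
```

If the file already has the line, the missing login prompt is probably caused by something else.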
@drpks:
Try:
sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot
Source:
https://answers.launchpad.net/ubuntu/+source/update-manager/+question/154945
Thanks for the info. However, in order to type those commands, I first need to log in to the server. At the moment, I am not even being presented with a login screen, so I can't log in to the server in order to type them.
HTH
@obs:
Reboot into rescue mode, mount the drive, and then edit it there. See here:
http://library.linode.com/troubleshooting/finnix-rescue-mode
Hi, thanks for your input. The problem is that I am not familiar with a lot of the terminology being used. Although I'm a software architect/developer, I have ZERO sysadmin skills, so the statement "mount the drive", unfortunately, doesn't convey much information to me.
I am aware that the 'mount' command is used to mount devices/drives, but that's about it. More specifically, I don't know which drive I am supposed to mount, or indeed how to find out which devices/drives are on my Linode.
I did spend a fair bit of time this afternoon (several hours actually) in the Finnix rescue mode (the console output is shown in one of my earlier messages). The net result is that when I log in to rescue mode, I am logged in as root@hdvc0 (or something similar).
I typed ls, and it reported 0 files. The server has been down since early this morning (over 10 hrs ago) and I am still unable to even log into the server.
To say that I am getting slightly frustrated would be a gross understatement. Having said that, I fully appreciate that you are all trying to help me, of your own volition, so I will try my best not to antagonize anyone.
Thank you all for your helpful feedback.
mkdir -p /mnt/rescue
mount /dev/xvda /mnt/rescue
nano /mnt/rescue/etc/init/hvc0.conf
Then paste what I posted and press Ctrl+X, then Y, then Enter to save.
This will make the directory /mnt/rescue, mount /dev/xvda on /mnt/rescue, then edit /etc/init/hvc0.conf in the mounted volume.
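If it is not obvious which device holds the root filesystem and which one is swap, blkid can report the filesystem type of each device from rescue mode. This is a rough sketch: classify_fs is a made-up helper, and the two device names are just the ones mentioned in this thread.

```shell
# classify_fs is a hypothetical helper: it takes the filesystem type printed
# by `blkid -o value -s TYPE <device>` and says what to do with the device.
classify_fs() {
  case "$1" in
    swap) echo "swap" ;;
    ext2|ext3|ext4) echo "mountable" ;;
    "") echo "unknown" ;;
    *) echo "other" ;;
  esac
}

# Device names are assumptions taken from the thread; adjust to your profile.
for dev in /dev/xvda /dev/xvdb; do
  [ -b "$dev" ] || continue
  fstype=$(blkid -o value -s TYPE "$dev" 2>/dev/null)
  echo "$dev: ${fstype:-none} -> $(classify_fs "$fstype")"
done
```

The device reported as "mountable" is the one to pass to mount; a swap device will refuse to mount.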
I know this is an unmanaged service, but I should imagine it's something that support will be familiar with and could help you correct it very quickly.
Thanks for your help. I typed the commands you suggested. However, when attempting to mount, the system complained that the device looked like a swap disk and failed to mount. This is correct, however, as I remembered that I set /dev/xvda as my swap space. I then tried the command with /dev/xvdb instead (which is where my data resides), and I was able to proceed to open the .conf file with the nano editor.
However, the (hvc0.conf) file is not empty. It contains the following lines:
# hvc0 - getty
#
# This service maintains a getty on hvc0 from the point the system is
# started until it is shut down again.
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /sbin/getty -8 38400 hvc0
I thought it best to ask whether to:
1. Overwrite the contents of the file entirely with the new commands
2. Add the new commands to the BEGINNING of the file OR
3. Add the new commands to the END of the file
I look forward to your response, and once again, thanks for your help. At least, now I feel I am making some progress.
@Mr Nod:
Have you raised a ticket with support?
I know this is an unmanaged service, but I should imagine it's something that support will be familiar with and could help you correct it very quickly.
Yes, I raised a support ticket early this morning. Support made a few suggestions (Reboot in Rescue mode and check with fsck). After that failed to resolve the issue, support suggested that I come in here and see if I could get some help from the community.
@obs:
The contents are the same. It sounds like you're using a distro/custom kernel; reboot using a Linode kernel, see if that boots.
We may be talking at cross purposes here. The 'content' I was about to insert in the .conf file is:
sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot
Which is what you previously suggested. This is however different from the contents of the hvc0.conf file. Quite clearly, I had misunderstood your previous instruction - I'm glad I decided to double check before going ahead with replacing the file contents.
On the matter of a custom distro, I don't think this is the case. I am running the 64-bit version of Ubuntu 10.04 LTS. I made this choice so that it is compatible with my local dev machine, and so that C++ applications I have written locally can be deployed to run on the server.
Last but not least, I assume that the statement "reboot using a Linode kernel see if that boots" means to stop the server (running in safe mode) and reboot normally (since I don't have a custom distro).
I double checked my linode setting configuration, and it seems I may have given you incorrect information about the device mappings. I include a snapshot of my configuration below (hopefully, it helps someone notice why this situation has arisen).
[Screenshot of the Linode configuration profile; image not available]
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 10
ip6_tables: (C) 2000-2006 Netfilter Core Team
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
Bridge firewalling registered
Ebtables v2.0 registered
Registering the dns_resolver key type
registered taskstats version 1
XENBUS: Device with no driver: device/console/0
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1204) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal
SOMEONE must know what that means, SURELY? Especially the last three messages relating to terminated processes?
What could be causing the processes to be terminated during a login? That doesn't look right (even to a neophyte like me).
@morpheous:
I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…
The upgrade was interrupted while it was making changes to your filesystem, so your server was left in an undefined state and now it doesn't boot – hence no access.
udev, the device manager that creates the device nodes in /dev, is hosed. Try this:
Boot into rescue mode, then run:
mkdir -p /mnt/rescue
mount /dev/xvda /mnt/rescue
chroot /mnt/rescue /bin/bash
dpkg --configure -a
Reboot the server normally.
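For anyone who wants to see the whole sequence before running it, here is the same procedure as a dry-run sketch. The run and repair helpers and the DRY_RUN switch are made up for illustration, and the bind mounts of /proc and /dev are an extra assumption that is not part of the steps above (some package scripts expect them inside the chroot).

```shell
#!/bin/sh
# Dry-run sketch of the rescue-mode repair described above.
# ROOT_DEV and MNT follow the thread; DRY_RUN, run and repair are hypothetical.
ROOT_DEV=${ROOT_DEV:-/dev/xvda}
MNT=${MNT:-/mnt/rescue}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them (as root)

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repair() {
  run mkdir -p "$MNT"
  run mount "$ROOT_DEV" "$MNT"
  # Assumption, not in the original steps: expose /proc and /dev to the chroot.
  run mount --bind /proc "$MNT/proc"
  run mount --bind /dev "$MNT/dev"
  # Finish configuring the packages the interrupted upgrade left half-done.
  run chroot "$MNT" dpkg --configure -a
  run umount "$MNT/dev"
  run umount "$MNT/proc"
  run umount "$MNT"
}

repair
```

With the default DRY_RUN=1 this only prints each command prefixed by "+". Setting DRY_RUN=0 executes them, so read the printed list first.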
@morpheous:
@obs:
The contents are the same. It sounds like you're using a distro/custom kernel; reboot using a Linode kernel, see if that boots.
We may be talking at cross purposes here. The 'content' I was about to insert in the .conf file is:
sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot
I didn't post that, morpheous did. Your hvc0.conf file is exactly how it should be.
You're running an old Linode kernel, but a Linode one nonetheless (you should switch to 3.2 when this is all fixed).
Also, you should create a backup plan for the future, just in case something like this happens again.
@pclissold:
@morpheous:
I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…
The upgrade was interrupted while it was making changes to your filesystem, so your server was left in an undefined state and now it doesn't boot – hence no access.
udev, the device manager that creates the device nodes in /dev, is hosed. Try this:
Boot into rescue mode, then run:
mkdir -p /mnt/rescue
mount /dev/xvda /mnt/rescue
chroot /mnt/rescue /bin/bash
dpkg --configure -a
Reboot the server normally.
Peter, I am forever indebted to you! Thanks for your clear, simple and straightforward explanation and solution. The server is up and running again, I am beside myself with joy, and have gained more fear (awe?) of the mysterious inner workings of the Linux OS and its sysadmins!
Thank you so much for helping me out of the hole I inadvertently dug myself into!
@morpheous:
Peter, I am forever indebted to you!
:D I'm pleased that you're fixed. You should run (most of) the rest of the code from drpks' solution to finish cleaning up the mess:
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
Reboot again. (Don't issue reboots from within your Linode – they don't work as expected because there is code on the host that doesn't get invoked unless you reboot from the control panel or Lish.)
Save that trick in your toolbox -- chrooting into a busted system is one of the most powerful ways of fixing things I know.
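To see whether anything is still half-installed after the cleanup, the status flags in the first column of `dpkg -l` can be filtered. This is a small sketch; broken_packages is a made-up name, and `dpkg --audit` reports similar information directly.

```shell
# broken_packages reads `dpkg -l` output on stdin and prints the names of
# packages that are wanted installed (first flag "i") but are only
# unpacked (U), half-configured (F) or half-installed (H).
# The function name is hypothetical.
broken_packages() {
  awk '$1 ~ /^i[UFH]/ { print $2 }'
}

# Usage on the repaired system:
#   dpkg -l | broken_packages
```

An empty result suggests that dpkg considers every package fully configured.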
@morpheous:
Thank you so much for helping me out of the hole I inadvertently dug myself into!
It's good to hear that you were able to recover. You may find that some additional preparation eases the risk in future upgrades, since there's always a risk of failure notwithstanding your "pulling the plug", so to speak, in this case :-)
For example, if you have enough spare disk space in your Linode, you can clone your system image before attempting something like the upgrade, so worst case you just revert and start over. If you don't have enough space, you can clone to a temporary Linode to act as backup during the process. In fact, before doing significant upgrades (even application ones), I'll sometimes clone to a new Linode and experiment with the upgrade on that Linode (while the production system keeps running) just to feel good about the process.
If you have backups enabled on your Linode, you can also use snapshots for this purpose, or a snapshot/restore to set up the clone Linode for testing without needing any downtime on the production system.
I'd probably also run any large system upgrade beneath "screen", so that if you lose your network connection there won't be any risk of blocking or interrupting the process.
– David
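As a rough illustration of the screen suggestion: GNU screen exports the STY variable inside its sessions, so a small guard can refuse to start the upgrade outside one. Both function names below are made up, and the aptitude commands are the ones from the original post.

```shell
# in_screen succeeds when the shell runs inside GNU screen, which exports
# STY for its sessions. Both function names here are hypothetical.
in_screen() {
  [ -n "${STY:-}" ]
}

guarded_upgrade() {
  if ! in_screen; then
    echo "Not inside screen. Start a session first: screen -S upgrade" >&2
    return 1
  fi
  sudo aptitude update && sudo aptitude safe-upgrade
}

# Typical use:
#   screen -S upgrade      # start a named session
#   guarded_upgrade        # run the upgrade inside it
#   screen -r upgrade      # reattach after a dropped SSH connection
```

If the SSH connection drops, the upgrade keeps running inside the session and can be picked up again after reconnecting.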
@db3l:
For example, if you have enough spare disk space in your Linode, you can clone your system image before attempting something like the upgrade, so worst case you just revert and start over. If you don't have enough space, you can clone to a temporary Linode to act as backup during the process. In fact, before doing significant upgrades (even application ones), I'll sometimes clone to a new Linode and experiment with the upgrade on that Linode (while the production system keeps running) just to feel good about the process.
If you have backups enabled on your Linode, you can also use snapshots for this purpose, or a snapshot/restore to set up the clone Linode for testing without needing any downtime on the production system.
I'd probably also run any large system upgrade beneath "screen", so that if you lose your network connection there won't be any risk of blocking or interrupting the process.
That sounds like a smart thing to do. I don't as yet have backup installed, but I do have some spare disk space and I very much like the idea of cloning a system or even cloning a linode to act as a backup during an install or upgrade.
I remember investigating cloning a while back, but it seemed quite complicated/convoluted to me at the time. Could you please point me to some docs or a gentle tutorial that shows how I may do that? I suspect that I may be more "motivated" to learn how to do that after the debacle today!
@morpheous:
I remember investigating cloning a while back, but it seemed quite complicated/convoluted to me at the time. Could you please point me to some docs or a gentle tutorial that shows how I may do that? I suspect that I may be more "motivated" to learn how to do that after the debacle today!
I don't know if there's any documentation in the library, but it's awfully simple. If you have the disk space, just give it a shot.
(edit: There are some documents in the library - try
If you just want to duplicate a disk image locally, click on the image and click the duplicate button.
To clone part or all of a Linode to another Linode, use the Clone tab, then select a profile (which includes the associated disk images) or selected images. You can select what other Linode to duplicate to.
Neither operation is destructive to the original profile/disk image, so safe to experiment with. You should, however, have your Linode shut down while doing the duplication/clone to ensure the integrity of the copy (something that isn't as necessary with a backup snapshot since that uses filesystem snapshots, though you may still need to worry about things such as databases).
-- David