Server no longer available after abandoned upgrade

I am running a headless install of Ubuntu 10.04 LTS on a Linode. I was carrying out an OS upgrade earlier today. I SSH'd into the server and typed the following commands:

sudo aptitude update

sudo aptitude safe-upgrade

Whilst the second command (sudo aptitude safe-upgrade) was running, I (foolishly) decided to shut the server down (to bring down Apache and any other daemons that may have been running), while the OS and other software were still being upgraded.

I lost my SSH connection to the server (unsurprisingly), and the install was aborted by the shutdown. After rebooting the server, I have been unable to log in to it again.

Here is the console output when I attempt to login remotely using the Lish Ajax Console:

XENBUS: Device with no driver: device/console/0
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1205) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal

I thought there may be something messed up with the filesystem, so I ran fsck. Here is the output:

root@hvc0:~# fsck -fy /dev/xvdb
fsck from util-linux 2.19.1
e2fsck 1.42-WIP (02-Jul-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/xvdb: 111349/1286144 files (6.5% non-contiguous), 1798314/5120000 blocks
root@hvc0:~#

AFAIK, this means that there is nothing wrong with the filesystem - so I have no idea what else to do. My server is currently unreachable, and I can't SSH in to try to rerun the install in case that was the cause of all of this.

Does anyone have any idea on what the issue could be and how I may resolve this and get the server back online?

22 Replies

forum:Guspaz · Answer 1 · Jan. 25, 2012, 12:36 p.m.

forum:Guspaz 12 years, 10 months ago

You don't need to stop any daemons to do an upgrade; the package manager will handle all that for you.

You can use lish to access the linode directly:

http://library.linode.com/troubleshooting/using-lish-the-linode-shell

forum:morpheous · Answer 2 · Jan. 25, 2012, 12:57 p.m.

forum:morpheous 12 years, 10 months ago

@Guspaz wrote:

> You don't need to stop any daemons to do an upgrade; the package manager will handle all that for you.
>
> You can use lish to access the linode directly:
>
> http://library.linode.com/troubleshooting/using-lish-the-linode-shell

Thanks for the link. I managed to log in using the Lish shell. However, the console output is exactly the same as it was for the AJAX console, i.e. I am logged in as root@hvc0. Normally, I am logged in as myself@linode123, so I still can't log in to my server, or at least I don't know how to get access to it (as either one of my accounts on the server or as root on that server), so the server is still down.

forum:drpks · Answer 3 · Jan. 25, 2012, 1:13 p.m.

forum:drpks 12 years, 10 months ago

Try:

sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot

Source: https://answers.launchpad.net/ubuntu/+source/update-manager/+question/154945

forum:morpheous · Answer 4 · Jan. 25, 2012, 2:21 p.m.

forum:morpheous 12 years, 10 months ago

Just to clarify, when I attempt to access the server, I don't even get a login prompt, so I CAN'T LOGIN TO THE SERVER.

This is the console output:

md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting.  Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1206) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal

I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…

forum:obs · Answer 5 · Jan. 25, 2012, 2:29 p.m.

forum:obs 12 years, 10 months ago

Are you running a linode kernel or a distro/custom kernel?

If you're running a distro/custom kernel you may need to put

# hvc0 - getty
#
# This service maintains a getty on hvc0 from the point the system is
# started until it is shut down again.

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

respawn
exec /sbin/getty -8 38400 hvc0

In /etc/init/hvc0.conf

forum:morpheous · Answer 6 · Jan. 25, 2012, 2:30 p.m.

forum:morpheous 12 years, 10 months ago

@drpks wrote:

> Try:
>
> sudo dpkg --configure -a
> sudo apt-get -f install
> sudo apt-get --fix-missing install
> sudo apt-get update
> sudo apt-get upgrade
> sudo apt-get dist-upgrade
> sudo reboot
>
> Source: https://answers.launchpad.net/ubuntu/+source/update-manager/+question/154945

Thanks for the info. However, in order to type those commands, I first need to log in to the server. At the moment, I am not even being presented with a login screen, so I can't log in to the server in order to type them.

HTH

forum:obs · Answer 7 · Jan. 25, 2012, 2:33 p.m.

forum:obs 12 years, 10 months ago

Reboot into rescue mode, mount the drive, and then edit it there. See http://library.linode.com/troubleshooting/finnix-rescue-mode

forum:morpheous · Answer 8 · Jan. 25, 2012, 2:59 p.m.

forum:morpheous 12 years, 10 months ago

@obs wrote:

> Reboot into rescue mode, mount the drive, and then edit it there. See http://library.linode.com/troubleshooting/finnix-rescue-mode

Hi, thanks for your input. The problem is that I am not familiar with a lot of the terminology being used. Although I'm a software architect/developer, I have ZERO sysadmin skills, so the statement "mount the drive" unfortunately doesn't convey much information to me.

I am aware that the 'mount' command is used to mount devices/drives, but that's about it. More specifically, I don't know which drive I am supposed to mount, or indeed how to find out which devices/drives are on my Linode.

I did spend a fair bit of time this afternoon (several hours, actually) in the Finnix rescue mode (the console output is shown in one of my earlier messages). The net result is that when I log in to the rescue mode, I am logged in as root@hdvc0 (or something similar).

I typed ls, and it reported 0 files. The server has been down since early this morning (over 10 hrs ago) and I am still unable to even log in to the server.

To say that I am getting slightly frustrated would be a gross understatement. Having said that, I fully appreciate that you are all trying to help me of your own volition, so I will try my best not to antagonize anyone.

Thank you all for your helpful feedback.

forum:obs · Answer 9 · Jan. 25, 2012, 3:23 p.m.

forum:obs 12 years, 10 months ago

I assume you're using xvda for your root drive (you probably are). Boot into rescue mode and type:

mkdir -p /mnt/rescue
mount /dev/xvda /mnt/rescue
nano /mnt/rescue/etc/init/hvc0.conf

Then paste what I posted and press ctrl+x then y

This will make the directory /mnt/rescue, mount /dev/xvda on /mnt/rescue then edit /etc/init/hvc0.conf in the mounted volume
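
If it is unclear which device holds the root filesystem, a rough sketch like the following may help narrow it down. It assumes the rescue image provides `blkid` (not confirmed in this thread); `list_mountable` is a hypothetical helper name, and the sample lines below are invented stand-ins for real `blkid` output.

```shell
#!/bin/sh
# Hypothetical helper: reads blkid-style lines on stdin and prints the
# devices that carry a filesystem other than swap, i.e. the candidates
# for "the drive to mount".
list_mountable() {
    grep ' TYPE="' | grep -v ' TYPE="swap"' | cut -d: -f1
}

# In rescue mode (as root) the real call would be:  blkid | list_mountable
# The sample input below is made up for illustration only.
printf '%s\n' \
    '/dev/xvda: UUID="1111" TYPE="ext3"' \
    '/dev/xvdb: UUID="2222" TYPE="swap"' | list_mountable
```

Any device this prints can then be tried in the mount command above in place of /dev/xvda.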

forum:Mr Nod · Answer 10 · Jan. 25, 2012, 3:45 p.m.

forum:Mr Nod 12 years, 10 months ago

Have you raised a ticket with support?

I know this is an unmanaged service, but I should imagine it's something that support will be familiar with and could help you correct it very quickly.

forum:morpheous · Answer 11 · Jan. 25, 2012, 4:12 p.m.

forum:morpheous 12 years, 10 months ago

Hi obs,

Thanks for your help. I typed the commands you suggested. However, when attempting to mount, the system complained that the device looked like a swap disk and failed to mount. This is correct, however, as I remembered that I set /dev/xvda as my swap space. I then tried the command with /dev/xvdb instead (which is where my data resides), and I was able to proceed to open the .conf file with the nano editor.

However, the (hvc0.conf) file is not empty. It contains the following lines:

# hvc0 - getty
#
# This service maintains a getty on hvc0 from the point the system is
# started until it is shut down again.

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

respawn
exec /sbin/getty -8 38400 hvc0

I thought it best to ask whether to:

1. Overwrite the contents of the file entirely with the new commands

2. Add the new commands to the BEGINNING of the file OR

3. Add the new commands to the END of the file

I look forward to your response, and once again, thanks for your help. At least, now I feel I am making some progress.

forum:morpheous · Answer 12 · Jan. 25, 2012, 4:16 p.m.

forum:morpheous 12 years, 10 months ago

@Mr Nod wrote:

> Have you raised a ticket with support?
>
> I know this is an unmanaged service, but I should imagine it's something that support will be familiar with and could help you correct it very quickly.

Yes, I raised a support ticket early this morning. Support made a few suggestions (Reboot in Rescue mode and check with fsck). After that failed to resolve the issue, support suggested that I come in here and see if I could get some help from the community.

forum:obs · Answer 13 · Jan. 25, 2012, 4:17 p.m.

forum:obs 12 years, 10 months ago

The contents are the same. It sounds like you're using a distro/custom kernel; reboot using a Linode kernel and see if that boots.

forum:morpheous · Answer 14 · Jan. 25, 2012, 4:46 p.m.

forum:morpheous 12 years, 10 months ago

@obs wrote:

> The contents are the same. It sounds like you're using a distro/custom kernel; reboot using a Linode kernel and see if that boots.

We may be talking at cross purposes here. The 'content' I was about to insert in the .conf file is:

sudo dpkg --configure -a
sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo reboot

Which is what you previously suggested. This is however different from the contents of the hvc0.conf file. Quite clearly, I had misunderstood your previous instruction - I'm glad I decided to double check before going ahead with replacing the file contents.

On the matter of a custom distro, I don't think this is the case. I am running a 64-bit version of Ubuntu 10.04 LTS. I made this choice so that it is compatible with my local dev machine, and so that C++ applications I have written locally can be deployed to run on the server.

Last but not the least, I assume that the statement "reboot using a Linode kernel see if that boots" means to stop the server (running in safe mode) and rebooting normally (since I don't have a custom distro).

I double checked my linode setting configuration, and it seems I may have given you incorrect information about the device mappings. I include a snapshot of my configuration below (hopefully, it helps someone notice why this situation has arisen).

[Image: screenshot of the Linode configuration profile; not preserved]

forum:morpheous · Answer 15 · Jan. 25, 2012, 4:52 p.m.

forum:morpheous 12 years, 10 months ago

This is the console response when I try a Lish Ajax console login:

TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 10
ip6_tables: (C) 2000-2006 Netfilter Core Team
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
Bridge firewalling registered
Ebtables v2.0 registered
Registering the dns_resolver key type
registered taskstats version 1
XENBUS: Device with no driver: device/console/0
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
REISERFS warning (device xvda): super-6502 reiserfs_getopt: unknown mount option "nobarrier"
EXT3-fs: barriers not enabled
kjournald starting.  Commit interval 5 seconds
EXT3-fs (xvda): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 202:0.
devtmpfs: mounted
Freeing unused kernel memory: 668k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 84k freed
Freeing unused kernel memory: 1356k freed
init: udevtrigger main process (1203) terminated with status 1
init: udevtrigger post-stop process (1204) terminated with status 1
init: udevmonitor main process (1202) killed by TERM signal

SOMEONE must know what that means, SURELY? Especially the last three messages relating to terminated processes?

What can be causing the processes to be terminated during a login? That doesn't look right (even to a neophyte like me).

forum:pclissold · Answer 16 · Jan. 25, 2012, 6:26 p.m.

forum:pclissold 12 years, 10 months ago

@morpheous wrote:

> I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…

The upgrade was interrupted while it was making changes to your filesystem, so your server was left in an undefined state and now it doesn't boot, hence no access.

udev, the device manager that creates the device nodes in /dev, is hosed. Try this:

Boot into rescue mode, then run:

mkdir -p /mnt/rescue
mount /dev/xvda /mnt/rescue
chroot /mnt/rescue /bin/bash
dpkg --configure -a

Reboot the server normally.
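
The same repair can be sketched as a small script that, by default, only prints each step so it can be reviewed first. This is a sketch under assumptions, not the exact procedure used in this thread: the `run` and `repair_steps` names and the `DRY_RUN`, `ROOT_DEV` and `MNT` variables are made up for illustration, and the bind mounts of /dev, /proc and /sys are an addition that some package scripts may expect inside a chroot (the plain four commands above were enough here).

```shell
#!/bin/sh
# Assumed defaults; override via the environment if your layout differs.
ROOT_DEV="${ROOT_DEV:-/dev/xvda}"   # device holding the root filesystem
MNT="${MNT:-/mnt/rescue}"           # where to mount it in rescue mode
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_steps() {
    run mkdir -p "$MNT" || return 1
    run mount "$ROOT_DEV" "$MNT" || return 1
    # Expose the rescue system's /dev, /proc and /sys inside the chroot.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$MNT/$fs" || return 1
    done
    # Finish configuring every package left half-installed by the abort.
    run chroot "$MNT" /bin/bash -c 'dpkg --configure -a'
    # Undo the mounts in reverse order before rebooting normally.
    for fs in sys proc dev; do
        run umount "$MNT/$fs"
    done
    run umount "$MNT"
}

repair_steps
```

Running it with `DRY_RUN=0` as root in rescue mode would execute the steps instead of printing them.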

forum:obs · Answer 17 · Jan. 25, 2012, 6:49 p.m.

forum:obs 12 years, 10 months ago

@morpheous wrote:

> @obs wrote:
>
> > The contents are the same. It sounds like you're using a distro/custom kernel; reboot using a Linode kernel and see if that boots.
>
> We may be talking at cross purposes here. The 'content' I was about to insert in the .conf file is:
>
> sudo dpkg --configure -a
> sudo apt-get -f install
> sudo apt-get --fix-missing install
> sudo apt-get update
> sudo apt-get upgrade
> sudo apt-get dist-upgrade
> sudo reboot

I didn't post that, morpheous did. Your hvc0.conf file is exactly how it should be.

You're running an old Linode kernel, but a Linode one nonetheless (you should switch to 3.2 when this is all fixed).

Also you should create a backup plan for the future just in case something like this happens again.

forum:morpheous · Answer 18 · Jan. 25, 2012, 7:03 p.m.

forum:morpheous 12 years, 10 months ago

@pclissold wrote:

> @morpheous wrote:
>
> > I am surprised at how difficult it is proving to be just to get access to the server after an aborted upgrade…
>
> The upgrade was interrupted while it was making changes to your filesystem, so your server was left in an undefined state and now it doesn't boot, hence no access.
>
> udev, the device manager that creates the device nodes in /dev, is hosed. Try this:
>
> Boot into rescue mode, then run:
>
> mkdir -p /mnt/rescue
> mount /dev/xvda /mnt/rescue
> chroot /mnt/rescue /bin/bash
> dpkg --configure -a
>
> Reboot the server normally.

Peter, I am forever indebted to you! Thanks for your clear, simple and straightforward explanation and solution. The server is up and running again, I am beside myself with joy, and have gained more fear (awe?) of the mysterious inner workings of the Linux OS and its sysadmins!

Thank you so much for helping me out of the hole I inadvertently dug myself into!

forum:pclissold · Answer 19 · Jan. 25, 2012, 7:14 p.m.

forum:pclissold 12 years, 10 months ago

@morpheous wrote:

> Peter, I am forever indebted to you!

:D I'm pleased that you're fixed. You should run (most of) the rest of the code from drpks' solution to finish cleaning up the mess:

sudo apt-get -f install
sudo apt-get --fix-missing install
sudo apt-get update
sudo apt-get upgrade

Reboot again. (Don't issue reboots from within your Linode – they don't work as expected because there is code on the host that doesn't get invoked unless you reboot from the control panel or Lish.)

Save that trick in your toolbox -- chrooting into a busted system is one of the most powerful ways of fixing things I know.
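
To check whether anything is still left unfinished after the cleanup, one option is to look at the status column of `dpkg -l` (`dpkg --audit` reports similar information). The sketch below is an assumption-laden illustration: `unfinished_packages` is a hypothetical helper, it only looks for the unpacked (U), half-configured (F) and half-installed (H) states, and the sample lines are invented rather than taken from the affected server.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints packages
# that are wanted installed but stuck in state U, F or H.
unfinished_packages() {
    awk '$1 ~ /^i[UFH]/ { print $2 }'
}

# On the repaired server the real call would be:  dpkg -l | unfinished_packages
# Invented sample lines (versions and descriptions are placeholders):
printf '%s\n' \
    'ii  bash     1.0  placeholder description' \
    'iF  udev     1.0  placeholder description' \
    'iU  apache2  1.0  placeholder description' | unfinished_packages
```

An empty result on the real system would suggest dpkg has nothing left to configure.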

forum:db3l · Answer 20 · Jan. 25, 2012, 7:26 p.m.

forum:db3l 12 years, 10 months ago

@morpheous wrote:

> Thank you so much for helping me out of the hole I inadvertently dug myself into!

It's good to hear that you were able to recover. You may find that some additional preparation eases the risk in future upgrades, since there's always a risk of failure notwithstanding your "pulling the plug", so to speak, in this case :-)

For example, if you have enough spare disk space in your Linode, you can clone your system image before attempting something like the upgrade, so worst case you just revert and start over. If you don't have enough space, you can clone to a temporary Linode to act as backup during the process. In fact, before doing significant upgrades (even application ones), I'll sometimes clone to a new Linode and experiment with the upgrade on that Linode (while the production system keeps running) just to feel good about the process.

If you have backups enabled on your Linode, you can also use snapshots for this purpose, or a snapshot/restore to set up the clone Linode for testing without needing any downtime on the production system.

I'd probably also run any large system upgrade beneath "screen" so that, if you lose your network connection, there won't be any risk of blocking or interrupting the process.
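
As a rough illustration of the "screen" suggestion, the sketch below refuses to start the upgrade unless it is already running inside a screen session. It relies on GNU screen setting the `STY` environment variable inside its sessions; the `in_screen` and `safe_upgrade` function names and the session name `upgrade` are made up for this example.

```shell
#!/bin/sh
# True when the current shell is running inside a GNU screen session.
in_screen() {
    [ -n "${STY:-}" ]
}

# Hypothetical wrapper: only start the upgrade from within screen, so a
# dropped SSH connection cannot interrupt it.
safe_upgrade() {
    if ! in_screen; then
        echo "Not inside screen; start one first with: screen -S upgrade" >&2
        return 1
    fi
    sudo aptitude update && sudo aptitude safe-upgrade
}

# Typical session:
#   screen -S upgrade     # start a named session, then call safe_upgrade
#   Ctrl-a d              # detach; the upgrade keeps running
#   screen -r upgrade     # reattach after reconnecting over SSH
```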

– David

forum:morpheous · Answer 21 · Jan. 25, 2012, 9:29 p.m.

forum:morpheous 12 years, 10 months ago

@db3l wrote:

> For example, if you have enough spare disk space in your Linode, you can clone your system image before attempting something like the upgrade, so worst case you just revert and start over. If you don't have enough space, you can clone to a temporary Linode to act as backup during the process. In fact, before doing significant upgrades (even application ones), I'll sometimes clone to a new Linode and experiment with the upgrade on that Linode (while the production system keeps running) just to feel good about the process.
>
> If you have backups enabled on your Linode, you can also use snapshots for this purpose, or a snapshot/restore to set up the clone Linode for testing without needing any downtime on the production system.
>
> I'd probably also run any large system upgrade beneath "screen" so that, if you lose your network connection, there won't be any risk of blocking or interrupting the process.

That sounds like a smart thing to do. I don't as yet have backups enabled, but I do have some spare disk space and I very much like the idea of cloning a system or even cloning a Linode to act as a backup during an install or upgrade.

I remember investigating cloning a while back, but it seemed quite complicated/convoluted to me at the time. Could you please point me to some docs or a gentle tutorial that shows how I may do that? I suspect that I may be more "motivated" to learn how to do that after the debacle today!

forum:db3l · Answer 22 · Jan. 25, 2012, 10:03 p.m.

forum:db3l 12 years, 10 months ago

@morpheous wrote:

> I remember investigating cloning a while back, but it seemed quite complicated/convoluted to me at the time. Could you please point me to some docs or a gentle tutorial that shows how I may do that? I suspect that I may be more "motivated" to learn how to do that after the debacle today!

I don't know if there's any documentation in the library, but it's awfully simple. If you have the disk space, just give it a shot.

(edit: There are some documents in the library - try http://library.linode.com/linode-platform/manager/managing-disk-images and http://library.linode.com/linode-platform/manager/clone-linode)

If you just want to duplicate a disk image locally, click on the image and click the duplicate button.

To clone part or all of a Linode to another Linode, use the Clone tab, then select a profile (which includes the associated disk images) or selected images. You can select what other Linode to duplicate to.

Neither operation is destructive to the original profile/disk image, so safe to experiment with. You should, however, have your Linode shut down while doing the duplication/clone to ensure the integrity of the copy (something that isn't as necessary with a backup snapshot since that uses filesystem snapshots, though you may still need to worry about things such as databases).

-- David
