Reboot: host56 (graceful)

Linode Staff

The xen beta box is going to be rebooted in a few. Under heavy load (a few migrations, a deployment, and a resize), it looks like it triggered CONFIG_DETECT_SOFTLOCKUP and created a few zombie domains, preventing people from booting. I'm going to grab the latest Xen updates, turn that off, update the host kernel and reboot.

Those that are still up and running should see a graceful shutdown in a few minutes…

-Chris

16 Replies

Great to hear it was what amounts to an instrumentation problem and not a real failure.

Xen seems to be working out really well.

Any idea when the nodes will come back up? :-)

…oh. It's in the queue. Nevermind!

About half have been booted already, and it's working its way through the rest.

-Chris

It doesn't seem to have worked…

xen_linode_boot: failed to get domid
xen_linode_boot: warning - li-network might not have ran

Yeah, I've seen that one as well. Issue another reboot and it should work.

-Chris

My Linode reported the same error after the host initiated restart:

xen_linode_boot: failed to get domid
xen_linode_boot: warning - li-network might not have ran

LPM showed it as 'Powered off'. (I'm at work, where ssh is only allowed to predefined hosts - not including my Linode - so I guess it was off but I'm not totally sure).

I issued a boot command and got the same error message, and the Linode was still shown as 'Powered off'.

A second boot command again gave the error messages but the Linode was shown by LPM as 'Running'.

A reboot command produced a successful shutdown followed by a boot with error messages and a 'Powered off' Linode.

Another boot command and it came up without error messages and LPM shows 'Running'.

The Linode is attempting to boot into a vanilla Debian 3.1 distro, so I don't think it's a problem with the system.

@caker:

The xen beta box is going to be rebooted in a few. Under heavy load (a few migrations, a deployment, and a resize), it looks like it triggered CONFIG_DETECT_SOFTLOCKUP and created a few zombie domains, preventing people from booting. I'm going to grab the latest Xen updates, turn that off, update the host kernel and reboot.

Is host56 experiencing more problems today, or has anyone else noticed anything wrong? For at least the past 2 hours, the performance has been absolutely horrible.

Yes, it's been struggling… Maybe Caker's migrating some folks to the new Xen box right now.

We found another bug in Xen. It looks related to what we hit last night. I've got an email thread going on the xen-devel mailing list.

If you want to be un-migrated, please open a support ticket and specify if we can just "reset" you back to the host you were previously on without moving the disks, or if you need your disk images moved.

-Chris

Is there a forecast for when things will be better? Are we having to wait for the Xen developers to fix something, or can we go back to the state where things were working fine?

And if we do choose to move our disk images, will that happen at a reasonable speed, or will it be subject to the same slowdown?

If there's a chance that rebooting the host will make things better, I'd say let's try it. It's not doing me a whole lot of good as is…

It'll take at least another reboot for those people that can't boot currently.

I've already suspended pending migrations to the box, so anyone with a migration pending, you'll need to hold off for now.

Things seemed to work fine until the number of Linodes on the machine hit a certain threshold. If we can get a few people off the machine, I think we'll be ok while this gets resolved.

To answer your question re speed of migrating off … I honestly don't know at this point. The disk performance might be being masked by this bug in Xen, since a few of us were able to totally thrash the box without any other domains even noticing. I've also been able to get easily 60M/sec reads, so something weird is going on.

If you're just worried about performance, check back in about 10 minutes. One final migration was underway when this happened, and it's about to finish…

-Chris

@caker:

If you want to be un-migrated, please open a support ticket and specify if we can just "reset" you back to the host you were previously on without moving the disks, or if you need your disk images moved.

I moved from a Dallas host to the Xen host. Is there any availability in Fremont? (I'm trying to avoid an IP change)

Thanks!

@egatenby:

I moved from a Dallas host to the Xen host. Is there any availability in Fremont? (I'm trying to avoid an IP change)

Yes, that would be best – it would involve migrating your disk images again (no big deal).

Send me a new ticket with this request for tracking…

-Chris

I'm seeing this error message right now.

xen_linode_boot: failed to get domid
xen_linode_boot: warning - li-network might not have ran

I'm on host56.

By the way, I had scheduled an attempt to upgrade Fedora Core 3 to Fedora Core 4 in my Linode today. But it seems this host is under load today. The upgrade process is a little resource intensive… When would be a good time to try this again?
