AlmaLinux 8.6 - GRUB 2 boot failures and resolution

Linode Staff

To give context: I was running cPanel on CentOS 7. With CentOS being sunset for this use case, I decided to migrate to AlmaLinux 8.6 using the cPanel fork of the ELevate tool. The migration went relatively well, except that the Linode GRUB 2 boot setup really threw a wrench into it (issue reported).

A little over a month ago I thought this was resolved. After I modified the GRUB config as follows, the system booted fine:

$ nano /etc/default/grub
### set: GRUB_ENABLE_BLSCFG=false
$ grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.18.0-372.9.1.el8.x86_64
Found initrd image: /boot/initramfs-4.18.0-372.9.1.el8.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-4f09fa5fdd3642fa85221d7c11370603
Found initrd image: /boot/initramfs-0-rescue-4f09fa5fdd3642fa85221d7c11370603.img
done
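
The same edit can be made without opening nano, which is handy if it has to be re-applied after every update. This is a minimal sketch, assuming GNU sed (for `sed -i`); `set_blscfg_false` is a hypothetical helper name, not part of any package. It only edits the file; regenerating grub.cfg is left as a separate step.

```bash
#!/usr/bin/env bash
# Sketch: force GRUB_ENABLE_BLSCFG=false in a GRUB defaults file.
# set_blscfg_false is a hypothetical helper, not part of grub2 or cPanel.
set -euo pipefail

set_blscfg_false() {
  local file="${1:-/etc/default/grub}"
  if grep -q '^GRUB_ENABLE_BLSCFG=' "$file"; then
    # Rewrite every existing uncommented assignment in place (GNU sed -i).
    sed -i 's/^GRUB_ENABLE_BLSCFG=.*/GRUB_ENABLE_BLSCFG=false/' "$file"
  else
    # No assignment present yet: append one.
    echo 'GRUB_ENABLE_BLSCFG=false' >> "$file"
  fi
}
```

Usage on the real system (as root) would be `set_blscfg_false /etc/default/grub && grub2-mkconfig -o /boot/grub2/grub.cfg`, matching the manual steps above.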

Yesterday, after a nice stable month of operation, I figured it was time to check on things. I created a backup snapshot of the system, logged in, and ran yum update. Thinking all was well, I rebooted the server.

After the reboot the server would not start, and LISH showed the same output as before my earlier fix. Via LISH, I entered the GRUB 2 commands that had let me boot before:

set root=(hd0)
linux /boot/vmlinuz-4.18.0-372.9.1.el8.x86_64 root=/dev/sda ro crashkernel=auto rhgb console=ttyS0,19200n8 net.ifnames=0
initrd /boot/initramfs-4.18.0-372.9.1.el8.x86_64.img
boot
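
Typing kernel and initramfs names by hand at the GRUB prompt is error-prone, especially once a yum update has added another kernel. The sketch below, run from a system that does boot (or with the disk's /boot mounted somewhere), lists each kernel that has a matching initramfs and prints the same four GRUB commands for it. It assumes the EL8 naming seen in the grub2-mkconfig output above (`vmlinuz-<version>` paired with `initramfs-<version>.img`) and reuses the kernel arguments from the manual boot; `print_grub_stanzas` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: for each kernel in a boot directory, check for a matching initramfs
# and print GRUB prompt commands in the same form as the manual boot above.
# print_grub_stanzas is hypothetical; the kernel arguments are copied from
# the manual boot commands and may need adjusting for other systems.

print_grub_stanzas() {
  local bootdir="${1:-/boot}"
  local args="root=/dev/sda ro crashkernel=auto rhgb console=ttyS0,19200n8 net.ifnames=0"
  local kernel version
  for kernel in "$bootdir"/vmlinuz-*; do
    [[ -e "$kernel" ]] || continue                # glob matched nothing
    version="${kernel##*/vmlinuz-}"
    [[ "$version" == 0-rescue-* ]] && continue    # skip rescue images
    if [[ -e "$bootdir/initramfs-$version.img" ]]; then
      # Paths are printed as GRUB sees them on the root disk, i.e. /boot/...
      echo "set root=(hd0)"
      echo "linux /boot/vmlinuz-$version $args"
      echo "initrd /boot/initramfs-$version.img"
      echo "boot"
      echo
    else
      echo "# WARNING: no initramfs-$version.img for vmlinuz-$version" >&2
    fi
  done
}
```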

The kernel was still present, but the system would not boot and dropped into dracut instead.

This alarmed me, and since I had not planned for a prolonged downtime and troubleshooting session, I didn't think it through much. I restored the most recent snapshot backup.

After the restore finished I booted up… but the system would not boot. Again, this was very alarming, since I had "fixed it" over a month earlier and it was booting at that time.

Just for sanity, I entered the same commands at the GRUB 2 prompt, and the system booted once SELinux had its way with things…

Once I got into the system, I checked /etc/default/grub to see whether anything had changed, and lo and behold, the blasted GRUB_ENABLE_BLSCFG was back to true(!):

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb console=ttyS0,19200n8 net.ifnames=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_DISABLE_OS_PROBER=true
GRUB_SERIAL_COMMAND="serial --speed=19200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_DISABLE_LINUX_UUID=true
GRUB_GFXMODE=1024x768x32
GRUB_GFXPAYLOAD_LINUX=text
GRUB_ENABLE_BLSCFG=true

Of course this was immensely frustrating. Checking the file's modification date, I noticed that it had SOMEHOW been modified AFTER the date on which I made my changes, at a time when the system was working and booting without trouble.
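
To catch the flag flipping back before the next reboot, a check like the following could be run after each yum update. This is a minimal sketch, assuming GNU coreutils `stat`; `check_blscfg` is a hypothetical helper name. It only reports the current value and the file's modification time; it does not identify what changed the file.

```bash
#!/usr/bin/env bash
# Sketch: report the GRUB_ENABLE_BLSCFG setting and when the file last changed.
# check_blscfg is hypothetical; it returns non-zero when the flag is true,
# so it can be used to gate a reboot. Requires GNU stat for -c '%y'.

check_blscfg() {
  local file="${1:-/etc/default/grub}"
  local value
  # The file is sourced as shell, so the last assignment wins; strip quotes.
  value="$(sed -n 's/^GRUB_ENABLE_BLSCFG=//p' "$file" | tail -n 1 | tr -d '"')"
  echo "file:     $file"
  echo "modified: $(stat -c '%y' "$file")"
  echo "BLSCFG:   ${value:-<unset>}"
  if [[ "$value" == "true" ]]; then
    echo "WARNING: GRUB_ENABLE_BLSCFG is true" >&2
    return 1
  fi
  return 0
}
```

Comparing the reported timestamp with the output of `rpm -qa --last` (installed packages sorted by install time) may help narrow down which update touched the file.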

Does anyone have any idea what may have provoked such a change?

Others seem to have hit similar problems, perhaps with Rocky Linux: https://serverfault.com/questions/1079875/how-to-prevent-changes-of-grub-enable-blscfg-in-etc-default-grub

Clearly something non-obvious is happening with GRUB 2, and possibly in other scenarios as well.

So:

A) Is Linode's GRUB 2 boot system (Linode web control panel > specific Linode > Configurations > configuration in question: Edit > Boot Settings: Select a Kernel: set to GRUB 2) dynamically and independently changing this config file? Or is the Linux distribution (operating system) making the change, which would mean it happens on any CentOS successor such as Alma or Rocky…?

And:
B) Furthermore, should I take Linode's GRUB 2 boot system out of the picture here, and is there a way to do so? I thought I could just switch to Direct Disk, but then LISH doesn't even reach a loading screen and stops functioning altogether.

1 Reply

I'm having the same issues on a number of servers, and it seems to affect things like CSF too, but maybe that's kernel-dependent.

I'm talking to Linode and cPanel about a fix, but both are brushing it off as my problem.
