Grub Fallback: Boot good kernel if new one crashes

It’s hard to believe, but I didn’t know about the Grub fallback feature. So every time I needed to reboot a remote server into a new kernel, I had to test it on a local server first to make sure it wouldn’t panic on the remote unit. And if a kernel panic still happened, I had to ask somebody with physical access to the server to reboot the hardware and choose the proper kernel in Grub. It’s all boring and not healthy; it’s much better to use Grub’s native fallback feature.

Grub is the default boot loader in most Linux distributions today; at least the major distros like CentOS/Fedora/Red Hat, Debian/Ubuntu/Mint and Arch use Grub. This makes it possible to use the Grub fallback feature out of the box. Here is an example scenario.

There is a remote server hosted in New Zealand and you (sitting in Denmark) can reach it over the network only (no console server). In this case you cannot afford a new kernel that makes the server unreachable: if the new kernel crashes during boot, it won’t load the network interface drivers, so your Linux box won’t come online until somebody reboots it into a working kernel. Thankfully, Grub can be configured to try loading the new kernel once and, if that fails, to load another kernel according to the configuration. You can see my example grub.conf below:

default=saved
timeout=5
splashimage=(hd0,1)/boot/grub/splash.xpm.gz
hiddenmenu
fallback 0 1
title Fedora OpenVZ (2.6.32-042stab053.5)
        root (hd0,1)
        kernel /boot/vmlinuz-2.6.32-042stab053.5 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet crashkernel=auto
        initrd /boot/initramfs-2.6.32-042stab053.5.img
        savedefault fallback
title Fedora (2.6.35.12-88.fc14.i686)
        root (hd0,1)
        kernel /boot/vmlinuz-2.6.35.12-88.fc14.i686 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet
        initrd /boot/initramfs-2.6.35.12-88.fc14.i686.img
        savedefault fallback

According to this configuration, Grub will try to load the ‘Fedora OpenVZ’ kernel once, and if it fails, the system will boot into the known-good ‘Fedora’ kernel. If ‘Fedora OpenVZ’ loads well, you’ll be able to reach the server over the network after the reboot. Notice the lines ‘default=saved’ and ‘savedefault fallback’, which are mandatory for the fallback feature to work.
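Before rebooting a box on the other side of the planet, it may be worth checking that the configuration really contains the lines mentioned above. Here is a minimal sketch of such a check; the function name check_grub_fallback is made up for this example, and it assumes the layout shown in my grub.conf (a ‘default=saved’ line, a ‘fallback’ line, and ‘savedefault fallback’ inside every title entry). It only looks at the text of the file and does not prove that the kernel itself will boot.

```shell
#!/bin/sh
# check_grub_fallback FILE   (hypothetical helper, not part of Grub)
# Prints one line per problem found and returns non-zero if there is any.
check_grub_fallback() {
    awk '
        BEGIN { bad = 0 }
        # Report the entry that just ended if it had no savedefault fallback line.
        function close_entry() {
            if (title != "" && !saved) {
                printf "entry \"%s\" has no savedefault fallback line\n", title
                bad = 1
            }
        }
        /^[ \t]*default[ \t=]+saved[ \t]*$/ { have_default = 1 }
        /^[ \t]*fallback[ \t=]+[0-9]/       { have_fallback = 1 }
        /^[ \t]*title[ \t]/ {
            close_entry()
            title = $0
            sub(/^[ \t]*title[ \t]+/, "", title)
            saved = 0
        }
        /^[ \t]*savedefault[ \t]+fallback/  { saved = 1 }
        END {
            close_entry()
            if (!have_default)  { print "missing default=saved"; bad = 1 }
            if (!have_fallback) { print "missing fallback line"; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

On my server I would call it as check_grub_fallback /boot/grub/grub.conf (the path is an assumption, adjust it to wherever your distro keeps the file) and reboot only if it prints nothing.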

Alternative way

I’ve heard that the official Grub fallback feature may work incorrectly on RHEL 5 (and CentOS 5), so there is an elegant workaround (found here):

1. Add the parameter ‘panic=5’ to your new kernel line so it looks like this:

title Fedora OpenVZ (2.6.32-042stab053.5)
        root (hd0,1)
        kernel /boot/vmlinuz-2.6.32-042stab053.5 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet crashkernel=auto panic=5
        initrd /boot/initramfs-2.6.32-042stab053.5.img

This parameter makes a crashed kernel reboot itself after 5 seconds.

2. Point Grub’s default parameter to the good kernel, e.g. ‘default=0’.

3. Type in the following commands (the good kernel appears first in grub.conf and the new kernel is second):

# grub
 
grub> savedefault --default=1 --once
savedefault --default=1 --once
grub> quit

This makes Grub boot into the new kernel once, and if it fails, Grub will load the good kernel. Now you can reboot the server and be confident it will come back online within a few minutes. I usually prefer the native Grub fallback feature, but if it doesn’t work for you, it makes sense to try the workaround above.
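The workaround has two parts that are easy to get wrong when typing in a hurry: the entry index and the panic parameter. Below is a small sketch with two helpers whose names (boot_once_cmd and has_panic_timeout) are invented for this example. The first one only prints the Grub shell command for a given entry index; the second one checks that a kernel line carries a non-zero ‘panic=N’ value, since with the default of 0 a panicked kernel waits forever instead of rebooting.

```shell
#!/bin/sh
# boot_once_cmd INDEX   (hypothetical helper)
# Prints the Grub shell command that selects entry INDEX for the next boot only.
boot_once_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: boot_once_cmd ENTRY_INDEX" >&2
            return 1
            ;;
    esac
    printf 'savedefault --default=%s --once\n' "$1"
}

# has_panic_timeout "KERNEL LINE"   (hypothetical helper)
# Succeeds only if the line contains panic=N with N starting with 1-9.
has_panic_timeout() {
    case " $1 " in
        *" panic="[1-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Assuming your Grub build accepts the --batch option, the printed command can be piped into the Grub shell with boot_once_cmd 1 | grub --batch instead of typing it at the grub> prompt. Treat this as a convenience sketch and try it on a local machine first.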

Why Mosh is better than SSH?


Mosh (short for Mobile Shell) is a replacement for SSH for remote connections to Unix/Linux systems. It brings a few noticeable advantages over the well-known SSH connections. In brief, it’s faster and more responsive, especially on high-latency and/or unreliable links.

Key benefits of Mosh

  • Stays connected if your IP changes. Mosh’s roaming feature allows you to move between Internet connections and keep the Mosh session online. For example, if your wifi connection changes IP, you don’t need to reconnect.
  • Keeps the session after losing the connection. For example, if you lose your Internet connection for some time, or your laptop goes to sleep on a low battery, you’ll be able to pick up the previously opened Mosh session easily.
  • No root rights needed to use Mosh. Unlike SSH, the Mosh server is not a daemon that needs to listen on a specific port to accept incoming connections from clients. The Mosh server and client are executables that can be run by an ordinary user.
  • The same credentials for remote login. Mosh uses SSH for authentication, so in order to open a connection you need the same credentials as before.
  • Responsive Ctrl+C combination. Unlike SSH, Mosh doesn’t fill up network buffers, so even if you accidentally asked for a 100 MB file to be printed, you’ll be able to hit Ctrl+C and stop it immediately.
  • Better for slow or laggy links. Have you ever tried to use SSH over a satellite link where the average RTT is 600 ms or more? With Mosh you don’t need to wait for the server’s reply to see your typing. It works at the command line and in programs such as vi or emacs, so it makes working over slow connections more comfortable.

Well, there are some disadvantages too:

  • No IPv6 support.
  • UTF-8 only.

Mosh is available for all major Linux distributions, FreeBSD and Mac OS X systems:

Ubuntu (12.04 LTS) or Debian (testing/unstable):

sudo apt-get install mosh

Gentoo:

emerge net-misc/mosh

Arch Linux:

packer -S mobile-shell-git

FreeBSD:

portmaster net/mosh

Mac OS X:

mosh-1.1.3-2.pkg (https://github.com/downloads/keithw/mosh/mosh-1.1.3-2.pkg)

Sources: mosh-1.1.3.tar.gz
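Once Mosh is installed on both ends, connecting looks the same as with SSH: mosh user@host instead of ssh user@host. Since not every box I log in to has Mosh yet, a tiny wrapper can pick the client for me. This is only a sketch; the function name remote_shell_cmd and the MOSH_BIN variable are made up for this example, and it only prints the command rather than running it.

```shell
#!/bin/sh
# remote_shell_cmd USER@HOST   (hypothetical helper)
# Prints "mosh USER@HOST" when a Mosh client is found in PATH,
# and falls back to "ssh USER@HOST" otherwise.
# MOSH_BIN is an assumed override for the client name, handy for testing.
remote_shell_cmd() {
    client="${MOSH_BIN:-mosh}"
    if command -v "$client" >/dev/null 2>&1; then
        printf '%s %s\n' "$client" "$1"
    else
        printf 'ssh %s\n' "$1"
    fi
}
```

To actually connect, the printed command can be executed, for example with eval "$(remote_shell_cmd admin@example.org)" where admin@example.org is a placeholder login.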

Project’s website

P.S. It’s better than the combination of SSH and GNU Screen.

Add physical NIC to XenServer

If you add a new physical network interface to the hardware that runs XenServer, it won’t appear in XenCenter by default.

In order to attach it to VMs or change its settings, you’ll need to type a few commands into XenServer’s CLI.

1. Connect to XenServer via SSH as root:

ssh root@192.168.10.1 -v

2. Make sure the new NIC is attached to the hardware and detected by Linux. In the output of the command below you can see three Ethernet controllers (the last one was just attached to the hardware):

[root@localhost ~]# lspci  | grep -i ethernet
10:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
1e:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)
30:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)

This NIC isn’t shown in XenCenter, and the command below doesn’t list its UUID among the detected interfaces:

[root@localhost ~]# xe pif-list
uuid ( RO)                  : 095abcc1-4d64-7925-200f-a91d558ec872
                device ( RO): eth1
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 9da74476-ffcb-6824-25ad-62d46f34e252
 
uuid ( RO)                  : 555844b2-4061-47e0-52ef-01e42f182eef
                device ( RO): eth0
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 90a0e347-9246-7ac9-c939-30983602c14e

There is also no new eth2 in ifconfig’s output:

[root@localhost ~]# ifconfig     
eth0      Link encap:Ethernet  HWaddr 68:B5:99:E3:1C:56  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1953 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:201110 (196.3 KiB)  TX bytes:1929408 (1.8 MiB)
          Interrupt:19 
 
eth1      Link encap:Ethernet  HWaddr 00:30:4F:33:43:6E  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:14435 (14.0 KiB)  TX bytes:0 (0.0 b)
          Interrupt:17 Base address:0xe000
[root@localhost ~]# ifconfig eth2
ifconfig: interface eth2 does not exist

3. The solution is pretty easy: you just need to find out the UUID of the XenServer host to which you’d like to attach the new NIC and then rescan its interfaces. You can do it with the following commands:

[root@localhost ~]# xe host-list 
uuid ( RO)                : c5ab0df3-440a-4164-b1a4-6febf1ff0052
          name-label ( RW): XenServer HP Proliant ML 110
    name-description ( RW): Default install of XenServer

and

[root@localhost ~]# xe pif-scan host-uuid=c5ab0df3-440a-4164-b1a4-6febf1ff0052

That’s it; from now on you’ll see the new NIC in XenCenter.

[root@localhost ~]# xe pif-list
uuid ( RO)                  : 095abcc1-4d64-7925-200f-a91d558ec872
                device ( RO): eth1
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 9da74476-ffcb-6824-25ad-62d46f34e252
 
uuid ( RO)                  : 555844b2-4061-47e0-52ef-01e42f182eef
                device ( RO): eth0
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 90a0e347-9246-7ac9-c939-30983602c14e
 
uuid ( RO)                  : 7f3b59d7-1508-835a-b268-4476bbac33d5
                device ( RO): eth2
    currently-attached ( RO): false
                  VLAN ( RO): -1
          network-uuid ( RO): 9584917b-e49a-f075-f1e0-8ba2c4a4bf02
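If you do this often, the UUID lookup and the rescan can be wrapped into one function so there is nothing to copy and paste. The sketch below assumes it runs in the XenServer console as root; the function name rescan_nics is made up for this example. It relies on the --minimal switch of xe, which prints bare values separated by commas, and rescans every host returned by xe host-list (a single one on a standalone server like mine).

```shell
#!/bin/sh
# rescan_nics   (hypothetical helper)
# Rescans physical interfaces on every host known to this XenServer
# and then lists the device name and attachment state of each PIF.
rescan_nics() {
    # --minimal returns the host UUIDs as a comma-separated list.
    hosts=$(xe host-list --minimal) || return 1
    for host in $(printf '%s' "$hosts" | tr ',' ' '); do
        xe pif-scan host-uuid="$host" || return 1
    done
    xe pif-list params=uuid,device,currently-attached
}
```

Calling rescan_nics after plugging in the card should list the new eth2 entry next to eth0 and eth1, much like the pif-list output above.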