Jaron’s Blog

Automatic fallback to the last bootable state with systemd-boot and ZFS

With rolling-release Linux, breaking software changes are inevitable. Combine that with a filesystem like ZFS, which isn’t part of the mainline kernel, and things will break often. When a breakage renders the system unbootable, it can be a real pain, since distribution install and rescue images often need ZFS support added manually before you can chroot into the broken system. If it happens while I’m travelling, the situation is hopeless until I have physical access again.

I just put together a very clean way to dramatically mitigate the risk of an unbootable system by combining features of systemd-boot and ZFS. Every time the system boots successfully, it automatically clones the state of the root filesystem and backs up the unified kernel image (UKI). This fallback UKI is available from the systemd-boot menu, with its root pointing at the cloned filesystem. As a result, a fallback consisting of the kernel, initramfs, and root filesystem from the previous successful boot is always available if anything goes wrong while booting a new configuration.

The script and systemd unit files from this post are in a git repository, and I have written a PKGBUILD for creating an Arch Linux package to install them. Below, I detail how the system works.

Creating the fallback root filesystem and UKI

At every boot, the system runs the following script with the name of the default ZFS root filesystem (in my case, default) passed to it:

/usr/bin/prepare-lastboot
#!/bin/bash

DEFAULT="$1"
LASTBOOT="lastboot"

EFI_DIR="$(bootctl -p)"
ROOT_FS=$(findmnt -no SOURCE /)
BASE="${ROOT_FS%/*}"

# default UKI name used by kernel-install
UKI_NAME="$(cat /etc/machine-id)-$(uname -r).efi"

if [ "$ROOT_FS" = "$BASE/$DEFAULT" ]; then
  echo "sucessfully booted to $BASE/$DEFAULT, preparing lastboot"

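  # destroy the old lastboot snapshots and all of their clones
  # (on the very first run there is nothing to destroy, so zfs just reports an error)
  zfs destroy -Rv "$BASE/$DEFAULT@$LASTBOOT"

  # take a new recursive lastboot snapshot of the default root and its children
  zfs snap -r "$BASE/$DEFAULT@$LASTBOOT"

  # clone every snapshot into the lastboot tree, parents before children
  for OLDFS in $(zfs list -rHo name -t filesystem "$BASE/$DEFAULT"); do
    NEWFS=$(sed "s|^$BASE/$DEFAULT|$BASE/$LASTBOOT-$DEFAULT|" <<< "$OLDFS")
    echo "cloning $OLDFS to $NEWFS"
    zfs clone -o canmount=noauto -o mountpoint=none "$OLDFS@$LASTBOOT" "$NEWFS"
  done

  # only the top-level clone gets a mountpoint, since it is booted as /
  zfs set mountpoint=/ "$BASE/$LASTBOOT-$DEFAULT"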

  # backup the current UKI
  echo "copying $UKI_NAME to $LASTBOOT-$DEFAULT.efi"
  cp "$EFI_DIR/EFI/Linux/$UKI_NAME" "$EFI_DIR/EFI/Linux/$LASTBOOT-$DEFAULT.efi"
else
  echo "not booted to $BASE/$DEFAULT, taking no action"
fi

If the filesystem mounted at / is not the one named by the argument, the script does nothing. This prevents the fallback from backing itself up when I have booted into it.
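The guard comes down to two parameter expansions: ${ROOT_FS%/*} strips the last component to get the parent dataset, and the test succeeds only if the root is exactly $BASE/$DEFAULT. Here is that decision in isolation, as a small sketch with a hypothetical should_prepare function that takes the root dataset as an argument instead of calling findmnt, so it can be tried on any machine:

```shell
#!/bin/bash
# Hypothetical helper: decide whether prepare-lastboot should act.
# $1: dataset mounted at / (what `findmnt -no SOURCE /` reports)
# $2: name of the default root filesystem, e.g. "default"
should_prepare() {
  local root_fs="$1" default="$2"
  local base="${root_fs%/*}"   # zroot/root/default -> zroot/root
  [ "$root_fs" = "$base/$default" ]
}

for root in zroot/root/default zroot/root/lastboot-default; do
  if should_prepare "$root" default; then
    echo "$root: prepare lastboot"
  else
    echo "$root: take no action"
  fi
done
```

Run as is, it reports that zroot/root/default would be prepared and zroot/root/lastboot-default left alone: stripping lastboot-default still yields zroot/root, and zroot/root/default is not what is mounted.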

If the indicated filesystem is mounted to root, the script does several things:

  1. It deletes old snapshots called @lastboot and their descendant clones.
  2. It creates a recursive @lastboot snapshot of the default root filesystem and clones it to a lastboot-default filesystem.
  3. Finally, it copies the UKI responsible for the last boot to lastboot-default.efi.
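To see the three steps without touching a pool, the following dry-run sketch prints the commands instead of running them. The plan_lastboot function is hypothetical: it reads dataset names on standard input (as zfs list -rHo name -t filesystem would print them) and, in place of the original sed call, renames each dataset with the ${var#prefix} expansion.

```shell
#!/bin/bash
# Hypothetical dry run: print the commands prepare-lastboot would execute.
# $1: parent dataset (e.g. zroot/root), $2: default name, $3: lastboot name
# stdin: dataset names, one per line, parents before children
plan_lastboot() {
  local base="$1" default="$2" lastboot="$3"
  local src="$base/$default" dst="$base/$lastboot-$default" oldfs

  echo "zfs destroy -Rv $src@$lastboot"   # step 1: old snapshots and their clones
  echo "zfs snapshot -r $src@$lastboot"   # step 2: new recursive snapshot...
  while IFS= read -r oldfs; do
    # ...cloned dataset by dataset, swapping the $src prefix for $dst
    echo "zfs clone -o canmount=noauto -o mountpoint=none $oldfs@$lastboot $dst${oldfs#"$src"}"
  done
  echo "zfs set mountpoint=/ $dst"
  echo "cp <current UKI> $lastboot-$default.efi"   # step 3: back up the UKI
}

printf '%s\n' zroot/root/default zroot/root/default/var |
  plan_lastboot zroot/root default lastboot
```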

What does this look like in practice? On my laptop, my default root filesystems look like

% zfs list -r zroot/root/default
NAME                                USED  AVAIL  REFER  MOUNTPOINT
zroot/root/default                 40.0G  67.0G  11.0G  /
zroot/root/default/var              653M  67.0G    98K  none
zroot/root/default/var/lib          653M  67.0G    98K  none
zroot/root/default/var/lib/pacman   653M  67.0G   130M  /var/lib/pacman

with /var/lib/pacman kept under the root ZFS tree because I want every rollback of the root filesystem to come with a matching rollback of the package manager database. The script creates the snapshots and clones

% zfs list -rt all | grep lastboot
zroot/root/default@lastboot                 2.40M      -  11.0G  -
zroot/root/default/var@lastboot                0B      -    98K  -
zroot/root/default/var/lib@lastboot            0B      -    98K  -
zroot/root/default/var/lib/pacman@lastboot     0B      -   130M  -
zroot/root/lastboot-default                   16K  67.0G  11.0G  /
zroot/root/lastboot-default/var                8K  67.0G    98K  none
zroot/root/lastboot-default/var/lib            8K  67.0G    98K  none
zroot/root/lastboot-default/var/lib/pacman     8K  67.0G   130M  none

after destroying any @lastboot snapshots and clones left by a previous invocation. Finally, it copies my existing UKI from /efi/EFI/Linux/316fc9cb38d6469ea9acc93638ffdfa6-6.5.9-arch2-1.efi to /efi/EFI/Linux/lastboot-default.efi. This assumes that the UKI was installed to the EFI system partition under the standard naming scheme of the kernel-install utility: the machine ID, a hyphen, the kernel version, and .efi.
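If the UKI is not where that naming scheme says it should be, the cp in the script simply fails. A slightly more defensive variant could check first, as in this sketch; the backup_uki function and its arguments are hypothetical, split out so the logic can be exercised against a scratch directory rather than a real ESP.

```shell
#!/bin/bash
# Hypothetical defensive version of the UKI backup step.
# $1: ESP mount point (what `bootctl -p` prints)
# $2: entry token (the machine ID), $3: kernel version (`uname -r`)
# $4: destination file name, e.g. lastboot-default.efi
backup_uki() {
  local linux_dir="$1/EFI/Linux"
  local src="$linux_dir/$2-$3.efi"
  if [ ! -f "$src" ]; then
    echo "no UKI at $src, leaving the existing fallback alone" >&2
    return 1
  fi
  cp "$src" "$linux_dir/$4"
}
```

On a real system it would be called as backup_uki "$(bootctl -p)" "$(cat /etc/machine-id)" "$(uname -r)" lastboot-default.efi. In the full script such a check would arguably belong before the zfs destroy, so that a missing UKI never leaves a fresh root clone paired with a stale fallback UKI.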

Configuring the fallback UKI

Because it is a copy of the system’s regular UKI, the lastboot-default.efi UKI has within it a kernel command line that explicitly selects the default root for booting:

# lsinitrd /efi/EFI/Linux/lastboot-default.efi | head -n 7
objcopy: /dev/null: file truncated
initrd in UEFI: /efi/EFI/Linux/lastboot-default.efi: 28M
OS Release: Arch Linux (arch-rolling)
Kernel Version: 6.5.9-arch2-1 (linux@archlinux) #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000
Command line:
root=zfs:zroot/root/default
systemd.gpt_auto=no
panic=3

For the lastboot UKI to use the lastboot root filesystem instead, the kernel command line needs to be amended. This could have been a chore: disassembling, editing, reassembling, and re-signing the UKI. However, systemd offers an extremely easy solution!

The systemd UKI generator ukify can also build add-ons: PE binaries that carry auxiliary data, such as an extra kernel command line. For a UKI called foo.efi on the EFI system partition, systemd-stub (the UEFI stub inside the UKI, which systemd-boot launches) looks for such add-ons at foo.efi.extra.d/*.addon.efi. For add-ons that contain a kernel command line, it appends that command line to the one embedded in the UKI, allowing the original root directive to be overridden. Moreover, these files can be signed and integrated into the Secure Boot chain of your system.
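The lookup is purely name-based, which is what lets a UKI pick up its add-on just by being called lastboot-default.efi. The sketch below is only a model of that behaviour, not the real logic (which lives in systemd-stub and the initramfs). The hypothetical find_addons lists the add-ons that would match a given UKI path, and effective_root shows the override under an assumption, consistent with what is described here, that the last root= on the combined command line wins.

```shell
#!/bin/bash
# Hypothetical model of the add-on lookup: for /path/foo.efi, the add-ons
# are whatever matches /path/foo.efi.extra.d/*.addon.efi.
find_addons() {
  local uki="$1" f
  for f in "$uki.extra.d"/*.addon.efi; do
    if [ -e "$f" ]; then   # an unmatched glob stays literal; skip it
      printf '%s\n' "$f"
    fi
  done
}

# Assumption: when root= appears more than once, the last occurrence wins.
effective_root() {
  local -a args
  local arg root=""
  read -ra args <<< "$1"   # split the command line on whitespace, no globbing
  for arg in "${args[@]}"; do
    case "$arg" in
      root=*) root="${arg#root=}" ;;
    esac
  done
  printf '%s\n' "$root"
}

# the UKI's own command line, followed by the one from the add-on
effective_root "root=zfs:zroot/root/default systemd.gpt_auto=no panic=3 root=zfs:zroot/root/lastboot-default"
```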

So I create an add-on carrying the alternate lastboot root:

/usr/lib/systemd/ukify build --cmdline "root=zfs:zroot/root/lastboot-default" \
    --output=/efi/EFI/Linux/lastboot-default.efi.extra.d/cmdline.addon.efi

and then sign it with sbctl, a utility for managing Secure Boot. This only needs to be done once, since the same command line add-on applies to every new lastboot UKI. In the end, the contents of /efi/EFI/Linux are

# tree /efi/EFI/Linux
/efi/EFI/Linux
├── 316fc9cb38d6469ea9acc93638ffdfa6-6.5.9-arch2-1.efi
├── lastboot-default.efi
└── lastboot-default.efi.extra.d
    └── cmdline.addon.efi

2 directories, 3 files

This is very convenient: once such an add-on is placed on the EFI system partition and signed, the scheme of simply copying and renaming the old UKI just works. The new lastboot UKI is signed because the old UKI was signed, and the now-incorrect root directive in its kernel command line is quietly overridden by systemd-stub, because the filename of the lastboot UKI matches the name of the extra.d directory containing the add-on.
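Put together, the one-time add-on setup might look like the following sketch. Two details are my assumptions rather than anything stated above: that ukify does not create the extra.d directory itself (hence the mkdir -p), and that you want sbctl to remember the file via sign -s so it is re-signed along with everything else it tracks. The setup_addon function and the DRY_RUN switch are hypothetical; with DRY_RUN set, the commands are only printed (without their original quoting), which is handy for checking paths before running it as root.

```shell
#!/bin/bash
# Hypothetical one-time setup of the lastboot command line add-on.
# With DRY_RUN set, commands are printed instead of executed.
run() {
  if [ -n "$DRY_RUN" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: ESP mount point (default /efi), $2: command line for the add-on
setup_addon() {
  local esp="${1:-/efi}"
  local cmdline="${2:-root=zfs:zroot/root/lastboot-default}"
  local dir="$esp/EFI/Linux/lastboot-default.efi.extra.d"
  run mkdir -p "$dir"   # assumption: ukify will not create it for us
  run /usr/lib/systemd/ukify build --cmdline "$cmdline" \
      --output="$dir/cmdline.addon.efi"
  run sbctl sign -s "$dir/cmdline.addon.efi"   # -s: keep it in sbctl's database
}
```

DRY_RUN=1 setup_addon /efi previews the three commands; running setup_addon /efi as root performs them.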

Making it run after each successful boot

To run this script after every successful boot, I use a systemd timer that fires a fixed time after boot (in this case, 30 seconds). Don’t do anything hasty right after booting: if you bring the system into an unbootable state within those 30 seconds, the lastboot fallback will be broken too!

The systemd service and timer files are

/usr/lib/systemd/system/prepare-lastboot@.service
[Unit]
Description = Prepares lastboot filesystem of %i for fallback

[Service]
Type = oneshot
ExecStart = /usr/bin/prepare-lastboot "%i"

/usr/lib/systemd/system/prepare-lastboot@.timer
[Unit]
Description = Run prepare-lastboot after a successful boot of %i

[Timer]
OnBootSec = 30

[Install]
WantedBy = timers.target

which I enable by running

# systemctl enable prepare-lastboot@default.timer
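If 30 seconds feels like too tight a window, the delay can be changed with a drop-in instead of editing the packaged timer. One detail is easy to miss: OnBootSec= is list-valued, so the drop-in has to reset it with an empty assignment first; otherwise the new delay is added alongside the packaged 30 seconds rather than replacing it. A sketch of a hypothetical helper that writes such a drop-in (two minutes is only an example value):

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that replaces the timer's delay.
# $1: unit directory (normally /etc/systemd/system)
# $2: template instance (e.g. default), $3: new delay (e.g. 2min)
write_timer_override() {
  local dir="$1/prepare-lastboot@$2.timer.d"
  mkdir -p "$dir" || return
  cat > "$dir/override.conf" <<EOF
[Timer]
# an empty assignment clears the OnBootSec= from the packaged unit
OnBootSec=
OnBootSec=$3
EOF
}
```

As root, write_timer_override /etc/systemd/system default 2min followed by systemctl daemon-reload applies it; systemctl edit prepare-lastboot@default.timer does the same interactively.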

The action as it appears in the system journal:

% journalctl -u prepare-lastboot@default.service -b0
Oct 27 12:08:01 WOPR prepare-lastboot[1708]: sucessfully booted to zroot/root/default, preparing lastboot
Oct 27 12:08:00 WOPR systemd[1]: Starting Prepares lastboot filesystem of default for fallback...
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/default@lastboot
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/default/var@lastboot
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/default/var/lib@lastboot
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/default/var/lib/pacman@lastboot
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will reclaim 797K
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/lastboot-default/var/lib/pacman
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/lastboot-default/var/lib
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/lastboot-default/var
Oct 27 12:08:01 WOPR prepare-lastboot[1721]: will destroy zroot/root/lastboot-default
Oct 27 12:08:01 WOPR prepare-lastboot[1708]: cloning zroot/root/default to zroot/root/lastboot-default
Oct 27 12:08:01 WOPR prepare-lastboot[1708]: cloning zroot/root/default/var to zroot/root/lastboot-default/var
Oct 27 12:08:01 WOPR prepare-lastboot[1708]: cloning zroot/root/default/var/lib to zroot/root/lastboot-default/var/lib
Oct 27 12:08:01 WOPR prepare-lastboot[1708]: cloning zroot/root/default/var/lib/pacman to zroot/root/lastboot-default/var/lib/pacman
Oct 27 12:08:02 WOPR prepare-lastboot[1708]: copying 316fc9cb38d6469ea9acc93638ffdfa6-6.5.9-arch2-1.efi to lastboot-default.efi
Oct 27 12:08:02 WOPR systemd[1]: prepare-lastboot@default.service: Deactivated successfully.
Oct 27 12:08:02 WOPR systemd[1]: Finished Prepares lastboot filesystem of default for fallback.
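To double-check the result outside the journal, each dataset in the lastboot tree should report the matching @lastboot snapshot as its ZFS origin property. A hedged sketch: the hypothetical check_origins reads the tab-separated output of zfs list -H -o name,origin on standard input and flags any clone whose origin is not what the script should have produced.

```shell
#!/bin/bash
# Hypothetical checker: every dataset under the lastboot tree should be a
# clone of the matching @lastboot snapshot in the default tree.
# $1: default root, $2: lastboot root, $3: snapshot name
# stdin: "name<TAB>origin" lines from `zfs list -H -o name,origin -r "$2"`
check_origins() {
  local src="$1" dst="$2" snap="$3" name origin expected status=0
  while IFS=$'\t' read -r name origin; do
    expected="$src${name#"$dst"}@$snap"
    if [ "$origin" != "$expected" ]; then
      echo "unexpected origin for $name: $origin (wanted $expected)"
      status=1
    fi
  done
  return "$status"
}
```

For example: zfs list -H -o name,origin -r zroot/root/lastboot-default | check_origins zroot/root/default zroot/root/lastboot-default lastboot && echo OK.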

And it’s as easy as that! After the script runs for the first time, a new entry for the fallback lastboot-default UKI appears in the boot menu. That fallback and its root filesystem are refreshed every time the system successfully boots the default filesystem, and are left alone otherwise. Because the lastboot UKI and its filesystem always come from the same boot, the kernel in the UKI matches the modules installed on that filesystem, so the pair stays bootable. And since both the UKI and the add-on are signed, the boot chain remains compatible with Secure Boot and can’t be tampered with offline.

All in all, a very nice, clean solution: I don’t have to think about anything, and if something goes badly wrong I can select the alternate boot entry and have it come up instantly. Moreover, the same should hold remotely: if boot counting is enabled for the default entry, systemd-boot will automatically fall back on the lastboot entry once the default entry runs out of tries.
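That automatic fallback relies on systemd’s boot counting (Automatic Boot Assessment), which has to be enabled separately; as I understand it, kernel-install reads a number of tries from /etc/kernel/tries and then installs the UKI with a counter in its file name. The counter takes the form +LEFT or +LEFT-DONE just before .efi, and an entry with no tries left is treated as bad and sorted after the good ones. Note that while a counter is present (until the boot is marked good), the file name no longer matches the script’s UKI_NAME. A hypothetical sketch that decodes such names:

```shell
#!/bin/bash
# Hypothetical decoder for boot-counter file names, for example
#   abc-6.5.9-arch2-1+2-1.efi  (2 tries left, 1 done)
#   abc-6.5.9-arch2-1+0-3.efi  (no tries left: "bad")
# $1: file name; prints "<name without counter> <tries left, or - if none>"
parse_counter() {
  local file="${1%.efi}" left="-"
  local re='^(.*)\+([0-9]+)(-[0-9]+)?$'
  if [[ $file =~ $re ]]; then
    file="${BASH_REMATCH[1]}"
    left="${BASH_REMATCH[2]}"
  fi
  printf '%s.efi %s\n' "$file" "$left"
}

for f in abc-6.5.9-arch2-1+3.efi abc-6.5.9-arch2-1+0-3.efi abc-6.5.9-arch2-1.efi; do
  parse_counter "$f"
done
```

The three example names decode to 3 tries left, 0 tries left (bad), and no counter at all, each with the bare name abc-6.5.9-arch2-1.efi.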