With rolling-release Linux, breaking software changes are inevitable. When combined with a filesystem like ZFS that isn’t supported by the kernel, things will break often. When something breaks that renders the system unbootable it can be a real pain, since distribution install and rescue images often need ZFS support added manually before you can chroot into the broken system. When the breakage happens while I’m travelling, the situation is hopeless until I have physical access again.
I just put together a very clean way to dramatically mitigate the risk of an unbootable system, combining features of systemd-boot and ZFS. Every time the system boots successfully, it automatically clones the state of the root filesystem and backs up the UKI (unified kernel image). This fallback UKI is available from the systemd-boot menu with its root pointing at the cloned filesystem. Therefore, a fallback consisting of the kernel, initramfs, and root filesystem from the previous successful boot is always available if anything goes wrong while booting a new configuration.
The script and systemd unit files from this post are in a git repository, and I have written a PKGBUILD for creating an Arch Linux package to install them. Below, I detail how the system works.
At every boot, the system runs the following script with the name of the default ZFS root filesystem (in my case, default) passed to it:
/usr/bin/prepare-lastboot
If the name of the filesystem mounted to root is not the same as the passed argument, the script does nothing. This is to prevent the fallback from backing up itself.
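That guard can be sketched in a few lines of shell. This is an illustration, not the actual prepare-lastboot script: the function name root_matches and the zroot/ROOT dataset layout are assumptions, and in a real script the mounted dataset would typically come from something like findmnt -n -o SOURCE /.

```shell
#!/bin/sh
# Sketch of the "is the default root mounted?" guard.
# root_matches and the zroot/ROOT layout are hypothetical names.

# Succeeds only when the last component of the mounted dataset
# equals the expected root name.
root_matches() {
    expected="$1"   # e.g. default
    mounted="$2"    # e.g. zroot/ROOT/default
    [ "${mounted##*/}" = "$expected" ]
}

# Example with a fixed value; a real script might instead use
#   mounted="$(findmnt -n -o SOURCE /)"
if root_matches default zroot/ROOT/lastboot-default; then
    echo "default root is mounted: refresh the fallback"
else
    echo "booted from another root: do nothing"
fi
```

Comparing only the last path component keeps the check independent of the pool name, at the cost of assuming that no other dataset shares the same final component.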
If the indicated filesystem is mounted to root, the script does several things. First, it destroys any existing snapshots named @lastboot and their descendant clones. Next, it creates a new @lastboot snapshot of the default root filesystem and clones it to a lastboot-default filesystem. Finally, it copies the current UKI to lastboot-default.efi.
What does this look like in practice? On my laptop, my default root filesystems look like
with /var/lib/pacman saved under the root ZFS tree because I want rollbacks of the root filesystem to have compatible rollbacks of the package manager database. The script creates the snapshots and clones after destroying any @lastboot snapshots and clones from a previous invocation. Finally, it copies my existing UKI from /efi/EFI/Linux/316fc9cb38d6469ea9acc93638ffdfa6-6.5.9-arch2-1.efi to /efi/EFI/Linux/lastboot-default.efi. This assumes that the UKI has been installed to the UEFI partition using the standard naming scheme of the kernel-install utility.
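The three steps can be sketched as a dry run that only prints the commands it would execute. This is a simplified illustration under stated assumptions, not the script from the repository: the zroot/ROOT parent dataset is hypothetical, the function names are made up, and child datasets such as the one holding /var/lib/pacman are not handled (each would need its own snapshot and clone).

```shell
#!/bin/sh
# Dry-run sketch of the three steps: it only prints commands.
# The zroot/ROOT parent dataset is a hypothetical layout.

# UKI path under kernel-install's naming scheme:
#   <ESP>/EFI/Linux/<entry token>-<kernel version>.efi
uki_path() {
    esp="$1"; token="$2"; kver="$3"
    printf '%s/EFI/Linux/%s-%s.efi\n' "$esp" "$token" "$kver"
}

plan_lastboot() {
    parent="$1"; name="$2"; esp="$3"; token="$4"; kver="$5"
    ds="${parent}/${name}"
    # 1. Drop the old snapshot and every clone that depends on it.
    echo "zfs destroy -R ${ds}@lastboot"
    # 2. Snapshot the running root and clone the snapshot.
    echo "zfs snapshot ${ds}@lastboot"
    echo "zfs clone ${ds}@lastboot ${parent}/lastboot-${name}"
    # 3. Copy the current UKI under the fallback name.
    echo "cp $(uki_path "$esp" "$token" "$kver") ${esp}/EFI/Linux/lastboot-${name}.efi"
}

plan_lastboot zroot/ROOT default /efi \
    316fc9cb38d6469ea9acc93638ffdfa6 6.5.9-arch2-1
```

Printing the plan first makes it easy to check the dataset and file names before anything destructive runs. On a first invocation the destroy step would have nothing to remove, so a real script needs to tolerate a missing snapshot.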
Because it is a copy of the system’s regular UKI, the lastboot-default.efi UKI has within it a kernel command line that explicitly selects the default root for booting:
In order for the lastboot UKI to use the lastboot root filesystem instead, the kernel command line needs to be amended. This might have been a chore: disassembling, editing, reassembling, and re-signing the UKI. However, systemd-boot offers an extremely easy solution!
The systemd UKI generator ukify can create PE binaries that carry auxiliary data, such as an additional kernel command line. For a UKI file in the UEFI partition called foo.efi, the systemd stub embedded in the UKI (systemd-stub, which takes over once systemd-boot launches the image) looks for such add-ons at paths matching foo.efi.extra.d/*.addon.efi. For add-ons that contain a kernel command line, the stub appends this command line to that of the original UKI file, allowing the original root directive to be overridden. Moreover, these files can be signed and integrated into the secure boot stack of your system.
So, I create a file with the alternate lastboot root with
and then sign it with the sbctl utility for managing secure boot. This only needs to be done once, since the same command line add-on is able to modify every new lastboot UKI. In the end, the contents of /efi/EFI/Linux are
This is very convenient: once such an add-on is placed in the UEFI partition and signed, our scheme of simply copying and renaming the old UKI just works. The new lastboot UKI is signed because the old UKI was signed, and the incorrect root directive in the kernel command line is quietly overridden at boot because the filename of the lastboot UKI matches the extra.d directory containing the add-on.
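For illustration, building and signing such an add-on might look like the sketch below. It assumes a ukify recent enough to emit an add-on when no kernel is given, with ukify and sbctl both on the PATH. The function names and the add-on name cmdline are hypothetical, and the root= value has to be supplied in whatever syntax your initramfs expects.

```shell
#!/bin/sh
# Sketch: build and sign a kernel command line add-on for a UKI.
# Function names and the add-on name "cmdline" are illustrative.

# Add-on location for a given UKI:
#   foo.efi -> foo.efi.extra.d/<name>.addon.efi
addon_path() {
    uki="$1"; name="$2"
    printf '%s.extra.d/%s.addon.efi\n' "$uki" "$name"
}

# Not invoked here: needs ukify, sbctl and write access to the ESP.
make_cmdline_addon() {
    uki="$1"; cmdline="$2"
    out="$(addon_path "$uki" cmdline)"
    mkdir -p "$(dirname "$out")"
    # Assumption: with no kernel given, ukify builds an add-on PE binary.
    ukify build --cmdline="$cmdline" --output="$out"
    # Sign the add-on and save it in sbctl's list of files to re-sign.
    sbctl sign -s "$out"
}

# Show where the add-on for the fallback UKI would live.
addon_path /efi/EFI/Linux/lastboot-default.efi cmdline
```

A call such as make_cmdline_addon /efi/EFI/Linux/lastboot-default.efi 'root=...' would then create the add-on once. Later copies of the UKI reuse it because only the UKI file name determines which extra.d directory is consulted.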
To make this script run after every successful boot, I use a systemd timer that runs a fixed time after boot (in this case, 30 seconds). Don’t do anything hasty after booting: you don’t want to bring your system into an unbootable state within those 30 seconds, or the lastboot fallback will also be broken!
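A timer of this kind is short. The sketch below prints a minimal template timer unit with the delay as a parameter. It is an approximation for illustration rather than the unit shipped in the repository, and the Description text is made up. A template timer named prepare-lastboot@.timer activates the service instance of the same name by default, so no explicit Unit= line is needed.

```shell
#!/bin/sh
# Sketch: print a template timer unit that fires once, a fixed delay
# after boot. The Description text is made up for illustration.
print_timer_unit() {
    delay="$1"   # e.g. 30s
    cat <<EOF
[Unit]
Description=Refresh the lastboot fallback for %i

[Timer]
OnBootSec=${delay}

[Install]
WantedBy=timers.target
EOF
}

print_timer_unit 30s
```

OnBootSec= counts from when the machine booted, so a longer delay is a stricter definition of a "successful" boot, at the price of a longer window in which changes are not yet covered by the fallback.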
The systemd service and timer files are
/usr/lib/systemd/system/prepare-lastboot@.service
/usr/lib/systemd/system/prepare-lastboot@.timer
which I enable by running
The action as it appears in the system journal:
And it’s as easy as that! After running the script for the first time, there is a new entry in the bootloader associated with the fallback lastboot-default UKI file. That fallback and its associated fallback root filesystem are updated every time the system successfully boots the default filesystem, and won’t update otherwise. At any time, the lastboot UKI and its filesystem will contain the same kernel version and so are compatible for boot, and the whole thing is compatible with secure boot and therefore can’t be tampered with offline.
All in all, a very nice, clean solution: I don’t have to think about anything, and if something goes badly wrong I can select the alternate boot entry and have it come up instantly. Moreover, the same should hold remotely: once the default entry runs out of boot tries, systemd-boot will automatically fall back on the lastboot entry.
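The automatic fallback relies on systemd-boot's boot counting, which records the remaining attempts in the entry's file name (for example foo+3.efi, then foo+2-1.efi after one attempt). As a small sketch, assuming boot counting is enabled for the default entry, the remaining tries can be read back from a file name like this. The function name tries_left and the foo names are illustrative.

```shell
#!/bin/sh
# Sketch: read the "tries left" counter from a boot-counted file name.
# Naming convention: <name>+<left>.efi or <name>+<left>-<done>.efi

tries_left() {
    base="${1%.efi}"
    case "$base" in
        *+*)
            counter="${base##*+}"           # "3" or "2-1"
            printf '%s\n' "${counter%%-*}"  # keep the part before "-"
            ;;
        *)
            return 1                        # no counter in this name
            ;;
    esac
}

tries_left foo+3.efi
tries_left foo+2-1.efi
tries_left lastboot-default.efi || echo "no counter"
```

An entry without a counter, such as the copied lastboot-default.efi, is not subject to counting at all.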