Prototype Shepherd PID 1 boot on FreeBSD

2026-04-02 13:29:17 +02:00
parent c62c89b078
commit f5ffd111ee
5 changed files with 655 additions and 14 deletions


@@ -0,0 +1,178 @@
# Post-Phase-10: local Shepherd-as-PID-1 boot prototype on FreeBSD
Date: 2026-04-02
## Goal
Begin the next post-Phase-10 runtime-integration pass by exploring two related directions:
- reduce reliance on the `freebsd-init + rc.d + shepherd` bridge, and
- compare Fruix's boot path with how Guix boots Shepherd as PID 1.
The concrete goal for this subphase is not a full real-VM migration of the main boot track. It is a first validated local prototype in which the generated FreeBSD image boots with Shepherd as PID 1 under QEMU/UEFI.
## Comparison with Guix
Guix's system model treats Shepherd as a first-class root service.
In `~/repos/guix/gnu/services/shepherd.scm`, the key pattern is:
- `shepherd-root-service-type` extends the boot service graph
- `shepherd-boot-gexp` ultimately does an `execl` of Shepherd as PID 1
- higher-level system services extend that root Shepherd instance declaratively
In other words, Guix does not merely start Shepherd from a late init script; it composes the system boot graph around Shepherd directly.
Fruix is not at that level of native service composition yet, but this subphase adopts the same basic architectural direction:
- boot into a generated Shepherd config directly,
- let Shepherd own the service graph from PID 1,
- and keep the imperative compatibility/bootstrap logic as small as possible.
## Implementation
### 1. Added an explicit init-mode to the FreeBSD operating-system model
`modules/fruix/system/freebsd.scm` now has an `init-mode` field on the declarative operating-system record.
Supported values are currently:
- `freebsd-init+rc.d-shepherd`
- `shepherd-pid1`
The existing boot path remains the default.
### 2. Added a generated PID 1 launcher for the `shepherd-pid1` mode
For `shepherd-pid1`, the generated system now contains:
- `boot/fruix-pid1`
and the generated loader configuration adds:
- `init_exec="/run/current-system/boot/fruix-pid1"`
That means FreeBSD `init(8)` `exec`s the generated Fruix launcher as its very first action, so the launcher replaces `init(8)` as PID 1.
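As a rough illustration of how the loader configuration might differ between the two modes, the sketch below emits the extra `init_exec` line only for `shepherd-pid1`. The function name `emit_loader_init_lines` is hypothetical and not taken from the Fruix sources; only the `init_exec` value and the two mode names come from this note.

```shell
#!/bin/sh
# Hypothetical helper: print the loader.conf lines that depend on init-mode.
# Only the init_exec value and the mode names are taken from the note above.
emit_loader_init_lines() {
    mode="$1"
    case "$mode" in
        shepherd-pid1)
            # init(8) reads init_exec from the kernel environment and
            # executes the named file in place of itself.
            printf '%s\n' 'init_exec="/run/current-system/boot/fruix-pid1"'
            ;;
        freebsd-init+rc.d-shepherd)
            # Default path: no extra line, init(8) runs /etc/rc as usual.
            ;;
        *)
            echo "unsupported init-mode: $mode" >&2
            return 1
            ;;
    esac
}
```

Rejecting unknown modes at generation time, rather than silently falling back to the default, keeps a typo in the operating-system declaration from producing an image that boots through the wrong path.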
The launcher currently performs the minimum bootstrap steps needed before turning control over to Shepherd:
- remount `/` read-write on this very-early path
- mount declared non-root filesystems such as:
  - `devfs` on `/dev`
  - `tmpfs` on `/tmp`
- set the hostname
- run `/run/current-system/activate`
- export the Guile/Shepherd runtime environment
- `exec` Shepherd directly
Because Shepherd is a Guile script, the actual PID 1 process image is the Guile interpreter running Shepherd. The important validation point is that Shepherd's own pidfile records PID 1 and the service socket is owned by that process.
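The bootstrap steps listed above can be sketched roughly as follows. This is not the generated `boot/fruix-pid1` itself: the hostname, the Guile site directory, the Shepherd binary path, and the config path are all assumed placeholders, and the sketch defaults to a dry run that only prints the commands so it can be read and tried outside a guest.

```shell
#!/bin/sh
# Sketch of a PID 1 launcher. Values marked "assumed" are placeholders
# for illustration, not paths taken from the Fruix sources.
: "${DRY_RUN:=1}"                     # 1 = only print the commands
: "${FRUIX_HOSTNAME:=fruix-freebsd}"  # assumed
: "${FRUIX_GUILE_SITE:=/run/current-system/profile/share/guile/site/3.0}"  # assumed
: "${SHEPHERD_BIN:=/run/current-system/profile/bin/shepherd}"              # assumed
: "${SHEPHERD_CONFIG:=/run/current-system/shepherd/init.scm}"              # assumed

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

fruix_pid1_main() {
    run mount -u -o rw /                 # remount root read-write
    run mount -t devfs devfs /dev        # declared non-root filesystems
    run mount -t tmpfs tmpfs /tmp
    run hostname "$FRUIX_HOSTNAME"
    run /run/current-system/activate
    # Runtime environment for Guile; the real launcher may export more.
    export GUILE_LOAD_PATH="$FRUIX_GUILE_SITE"
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "exec $SHEPHERD_BIN --config=$SHEPHERD_CONFIG"
    else
        # exec keeps the PID, so Shepherd inherits PID 1 from the launcher.
        exec "$SHEPHERD_BIN" --config="$SHEPHERD_CONFIG"
    fi
}

fruix_pid1_main
```

The final `exec` is the step that matters for the PID 1 claim: the shell does not fork, so the Guile process running Shepherd takes over the launcher's process ID.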
### 3. Generated a different Shepherd config for PID 1 mode
For `shepherd-pid1`, the generated `shepherd/init.scm` now includes the minimal helper procedures it needs inline, rather than importing the repo-side `(fruix shepherd freebsd)` module at runtime.
This avoids depending on checkout-only Scheme modules being present in the guest.
The PID 1 config currently starts a minimal service graph:
- `fruix-logger`
- `netif` through FreeBSD `service(8)`
- `sshd` through FreeBSD `service(8)`
- `fruix-ready`
So this prototype still uses some FreeBSD rc scripts as service implementations, but now under Shepherd control rather than under `/etc/rc` as the primary boot manager.
### 4. Made activation more store-friendly for this early-boot path
The generated activation script now treats:
- `cap_mkdb /etc/login.conf`
- `pwd_mkdb -p /etc/master.passwd`
as best-effort operations.
That matters because Fruix currently symlinks these files from the immutable system closure, and on the very early PID 1 path they should not be allowed to abort the whole boot.
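One way to express "best-effort" in a shell activation script is a small wrapper that logs a failure and then carries on. The helper name `best_effort` and the warning text are assumptions for illustration; the note does not say how the generated script implements this.

```shell
#!/bin/sh
# Run a command, but never let its failure abort the caller.
# Safe under `set -e` because the failing command sits on the left of `||`.
best_effort() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "activate: warning: '$*' exited with status $status (ignored)" >&2
    fi
    return 0
}
```

In an activation script this would wrap the two database rebuilds, for example `best_effort cap_mkdb /etc/login.conf` and `best_effort pwd_mkdb -p /etc/master.passwd`, while other steps continue to fail hard.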
## Validation
### New PID 1 template
Added:
- `tests/system/phase11-shepherd-pid1-operating-system.scm.in`
This declares the same minimal FreeBSD Fruix guest shape as the current Phase 9 system, but with:
- `#:init-mode 'shepherd-pid1`
### New local QEMU validation harness
Added:
- `tests/system/run-phase11-shepherd-pid1-qemu.sh`
This harness:
- builds the image through the canonical `fruix system image` path
- boots it locally with QEMU/TCG + UEFI
- injects the root SSH key
- waits for the ready marker over forwarded SSH
- verifies that Shepherd is running and that Shepherd's pidfile says PID 1
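The waiting and PID checks in a harness like this usually reduce to a retry loop plus a pidfile comparison. The sketch below shows both under assumed names (`wait_until`, `pidfile_is_pid1`); it is not the contents of `run-phase11-shepherd-pid1-qemu.sh`, and the marker and pidfile paths are not given in this note, so none are shown.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY CMD...: retry CMD until it succeeds.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
    attempts="$1"; delay="$2"; shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# pidfile_is_pid1 FILE: succeed only if FILE is readable and holds "1".
pidfile_is_pid1() {
    [ -r "$1" ] && [ "$(tr -d '[:space:]' < "$1")" = "1" ]
}
```

In a real harness the probe passed to `wait_until` would be an `ssh` command against the forwarded port, and `pidfile_is_pid1` would run on a copy of the guest's Shepherd pidfile fetched the same way.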
### Successful run
Passing validation run:
- `PASS phase11-shepherd-pid1-qemu`
- workdir: `/tmp/pid1-qemu6-1775128407`
Key validated results:
```text
ready_marker=ready
run_current_system_target=/frx/store/8b44506c37da85cebf265c813ed3a9d2a42408b077ac85854e7d6209d2f910ec-fruix-system-fruix-freebsd
shepherd_pid=1
shepherd_socket=present
shepherd_status=running
sshd_status=running
pid1_command=[guile]
boot_backend=qemu-uefi-tcg
init_mode=shepherd-pid1
```
The important detail is:
- `shepherd_pid=1`
which shows that the running Shepherd instance in the guest is the system's PID 1 process.
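Since the validated results are plain `key=value` lines, a pass/fail check over them can be quite small. The following is a minimal sketch with hypothetical helper names (`result_value`, `expect_result`) and assumes the results have been saved to a file; the harness may check them differently.

```shell
#!/bin/sh
# result_value KEY FILE: print the value of the first KEY=... line in FILE.
result_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# expect_result KEY WANT FILE: fail with a message if the value differs.
expect_result() {
    got="$(result_value "$1" "$3")"
    if [ "$got" != "$2" ]; then
        echo "FAIL: $1=$got (expected $2)" >&2
        return 1
    fi
}
```

A check such as `expect_result shepherd_pid 1 results.txt` (file name assumed) would then fail loudly if Shepherd ever came up under a different PID.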
## Assessment
This is a meaningful architectural step beyond the earlier `rc.d` bridge milestone.
Fruix now has a validated local boot path where:
- the generated image boots on FreeBSD,
- the generated launcher becomes PID 1 via `init_exec`,
- Shepherd itself owns PID 1,
- networking and SSH come up under Shepherd-managed service ordering,
- and the ready marker still appears.
## Remaining limitations
This is still a prototype, not yet the replacement for the main boot path.
Notable current limitations:
- the PID 1 path still relies on a small generated shell launcher before entering Shepherd
- some early boot/runtime actions are still expressed imperatively there
- the Guile/Shepherd local-runtime compatibility-prefix shims are not eliminated yet; they remain part of activation for the currently locally built runtime artifacts
- this subphase validated the path locally under QEMU/TCG, not yet on the real XCP-ng VM
## Recommended next step
Use this validated local PID 1 prototype as the base for the next subphase:
1. try the `shepherd-pid1` image on the real XCP-ng VM
2. if that succeeds, decide whether `shepherd-pid1` should become a selectable supported boot mode rather than just a prototype
3. continue reducing the remaining compatibility-prefix shims by moving the Guile/Shepherd runtime artifacts toward a more native store-aware arrangement