Prototype Shepherd PID 1 boot on FreeBSD
178
docs/reports/postphase10-shepherd-pid1-qemu-freebsd.md
Normal file
@@ -0,0 +1,178 @@
# Post-Phase-10: local Shepherd-as-PID-1 boot prototype on FreeBSD

Date: 2026-04-02

## Goal

Begin the next post-Phase-10 runtime-integration pass by exploring two related directions:

- reduce reliance on the `freebsd-init + rc.d + shepherd` bridge, and
- compare Fruix's boot path with how Guix boots Shepherd as PID 1.

The concrete goal for this subphase was not yet a full real-VM migration of the main boot track, but a first validated local prototype where the generated FreeBSD image boots with Shepherd as PID 1 under QEMU/UEFI.

## Comparison with Guix

Guix's system model treats Shepherd as a first-class root service.

In `~/repos/guix/gnu/services/shepherd.scm`, the key pattern is:

- `shepherd-root-service-type` extends the boot service graph
- `shepherd-boot-gexp` ultimately does an `execl` of Shepherd as PID 1
- higher-level system services extend that root Shepherd instance declaratively

In other words, Guix does not merely start Shepherd from a late init script; it composes the system boot graph around Shepherd directly.

Fruix is not at that level of native service composition yet, but this subphase adopts the same basic architectural direction:

- boot into a generated Shepherd config directly,
- let Shepherd own the service graph from PID 1,
- and keep the imperative compatibility/bootstrap logic as small as possible.

## Implementation

### 1. Added an explicit init-mode to the FreeBSD operating-system model

`modules/fruix/system/freebsd.scm` now has an `init-mode` field on the declarative operating-system record.

Supported values are currently:

- `freebsd-init+rc.d-shepherd`
- `shepherd-pid1`

The existing boot path remains the default.

### 2. Added a generated PID 1 launcher for the `shepherd-pid1` mode

For `shepherd-pid1`, the generated system now contains:

- `boot/fruix-pid1`

and the generated loader configuration adds:

- `init_exec="/run/current-system/boot/fruix-pid1"`

That means FreeBSD `init(8)` directly `exec`s the generated Fruix launcher as its very first action, replacing itself as PID 1.

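As a rough illustration of how the two `init-mode` values could translate into loader configuration, here is a minimal sketch. The function name `loader_conf_for_init_mode` and its error handling are hypothetical and are not taken from `modules/fruix/system/freebsd.scm`; only the `init_exec` line itself comes from this report.

```sh
#!/bin/sh
# Sketch only: map a Fruix init-mode to the loader.conf lines it implies.
# loader_conf_for_init_mode is a hypothetical name, not the real generator.

loader_conf_for_init_mode() {
    case "$1" in
        freebsd-init+rc.d-shepherd)
            # Default path: stock init(8) runs /etc/rc, nothing extra needed.
            ;;
        shepherd-pid1)
            # init(8) looks up init_exec in the kernel environment and
            # executes that file instead of continuing its normal startup.
            printf 'init_exec="%s"\n' "/run/current-system/boot/fruix-pid1"
            ;;
        *)
            echo "unsupported init-mode: $1" >&2
            return 1
            ;;
    esac
}
```

On a booted guest, `kenv init_exec` should print the configured path if the loader variable was picked up, which can be a quick sanity check before looking at PID 1 itself.
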
The launcher currently performs the minimum bootstrap steps needed before turning control over to Shepherd:

- remount `/` read-write on this very-early path
- mount declared non-root filesystems such as:
  - `devfs` on `/dev`
  - `tmpfs` on `/tmp`
- set the hostname
- run `/run/current-system/activate`
- export the Guile/Shepherd runtime environment
- `exec` Shepherd directly

Because Shepherd is a Guile script, the actual PID 1 process image is the Guile interpreter running Shepherd. The important validation point is that Shepherd's own pidfile records PID 1 and the service socket is owned by that process.

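The launcher steps listed above can be sketched in shell roughly as follows. This is not the generated `boot/fruix-pid1`: the hostname default, the `GUILE_LOAD_PATH` value, the Shepherd binary and config paths, and the `DRY_RUN` switch are all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch of a PID 1 launcher following the steps above. Hostname,
# environment values and Shepherd paths are placeholders, and DRY_RUN
# is a hypothetical aid for inspecting the sequence off-target.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pid1_bootstrap() {
    run mount -u -o rw /                 # remount root read-write
    run mount -t devfs devfs /dev        # declared non-root filesystems
    run mount -t tmpfs tmpfs /tmp
    run hostname "${FRUIX_HOSTNAME:-fruix-freebsd}"
    run /run/current-system/activate

    # Runtime environment for Guile/Shepherd (placeholder value).
    GUILE_LOAD_PATH="${GUILE_LOAD_PATH:-/run/current-system/share/guile/site}"
    export GUILE_LOAD_PATH

    SHEPHERD="${SHEPHERD:-/run/current-system/bin/shepherd}"
    SHEPHERD_CONFIG="${SHEPHERD_CONFIG:-/run/current-system/shepherd/init.scm}"

    # exec keeps the process ID, so the launcher process becomes Shepherd.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ exec $SHEPHERD --config $SHEPHERD_CONFIG"
    else
        exec "$SHEPHERD" --config "$SHEPHERD_CONFIG"
    fi
}
```

A real launcher would end by calling `pid1_bootstrap`; running `DRY_RUN=1 pid1_bootstrap` on a development machine only prints the sequence. The final `exec` is the step that matters for the PID 1 claim, since it replaces the shell rather than forking a child.
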
### 3. Generated a different Shepherd config for PID 1 mode

For `shepherd-pid1`, the generated `shepherd/init.scm` now includes the minimal helper procedures it needs inline, rather than importing the repo-side `(fruix shepherd freebsd)` module at runtime.

This avoids depending on checkout-only Scheme modules being present in the guest.

The PID 1 config currently starts a minimal service graph:

- `fruix-logger`
- `netif` through FreeBSD `service(8)`
- `sshd` through FreeBSD `service(8)`
- `fruix-ready`

So this prototype still uses some FreeBSD rc scripts as service implementations, but now under Shepherd control rather than under `/etc/rc` as the primary boot manager.

### 4. Made activation more store-friendly for this early-boot path

The generated activation script now treats:

- `cap_mkdb /etc/login.conf`
- `pwd_mkdb -p /etc/master.passwd`

as best-effort operations.

That matters because Fruix currently symlinks these files from the immutable system closure, and on the very early PID 1 path a failure of either command should not be allowed to abort the whole boot.

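One common way to express "best-effort" in a shell activation script is a small wrapper that logs a failure and carries on. The sketch below assumes a helper named `best_effort`, which is a hypothetical name and not necessarily what the generated script uses.

```sh
#!/bin/sh
# Sketch: run an activation step without letting its failure abort boot.
# best_effort is a hypothetical helper name.

best_effort() {
    # Testing the command inside "if" keeps "set -e" from aborting on failure.
    if ! "$@"; then
        echo "activate: warning: '$*' failed; continuing" >&2
    fi
    return 0
}

# Intended use inside activation (commented out so the sketch can be
# sourced safely on a non-FreeBSD machine):
# best_effort cap_mkdb /etc/login.conf
# best_effort pwd_mkdb -p /etc/master.passwd
```

The warning goes to stderr so that a failed database rebuild still shows up in the boot log instead of disappearing silently.
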
## Validation

### New PID 1 template

Added:

- `tests/system/phase11-shepherd-pid1-operating-system.scm.in`

This declares the same minimal FreeBSD Fruix guest shape as the current Phase 9 system, but with:

- `#:init-mode 'shepherd-pid1`

### New local QEMU validation harness

Added:

- `tests/system/run-phase11-shepherd-pid1-qemu.sh`

This harness:

- builds the image through the canonical `fruix system image` path
- boots it locally with QEMU/TCG + UEFI
- injects the root SSH key
- waits for the ready marker over forwarded SSH
- verifies that Shepherd is running and that Shepherd's pidfile says PID 1

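The last two harness steps, waiting for the ready marker and checking the pidfile, could look roughly like the sketch below. `wait_for` and `pidfile_says_pid1` are hypothetical helper names and are not copied from `run-phase11-shepherd-pid1-qemu.sh`.

```sh
#!/bin/sh
# Sketch of two harness checks; both helper names are hypothetical.

# Retry a command up to $1 times, sleeping $2 seconds between attempts.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Succeed only if the pidfile is readable and contains exactly "1".
pidfile_says_pid1() {
    [ -r "$1" ] && [ "$(tr -d '[:space:]' < "$1")" = "1" ]
}
```

In a harness, the polled command would typically be an `ssh` invocation against the forwarded port, for example `wait_for 60 5 ssh -p "$SSH_PORT" root@127.0.0.1 test -e "$READY_MARKER"`, where `SSH_PORT` and `READY_MARKER` are placeholders for values this report does not spell out.
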
### Successful run

Passing validation run:

- `PASS phase11-shepherd-pid1-qemu`
- workdir: `/tmp/pid1-qemu6-1775128407`

Key validated results:

```text
ready_marker=ready
run_current_system_target=/frx/store/8b44506c37da85cebf265c813ed3a9d2a42408b077ac85854e7d6209d2f910ec-fruix-system-fruix-freebsd
shepherd_pid=1
shepherd_socket=present
shepherd_status=running
sshd_status=running
pid1_command=[guile]
boot_backend=qemu-uefi-tcg
init_mode=shepherd-pid1
```

The important detail is:

- `shepherd_pid=1`

which shows that the running Shepherd instance in the guest is the system's PID 1 process.

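Results in this `key=value` form are easy to assert on mechanically. The sketch below assumes they have been saved to a file; `result_value` is a hypothetical helper, and the file name used in the usage note is likewise assumed.

```sh
#!/bin/sh
# Sketch: read one key from key=value results like those shown above.
# result_value is a hypothetical helper name.

result_value() {
    # $1 = key, $2 = results file; prints the text after the first "=".
    sed -n "s/^$1=//p" "$2" | head -n 1
}
```

A harness could then fail fast with a check such as `[ "$(result_value shepherd_pid results.txt)" = "1" ] || exit 1`, where `results.txt` stands in for wherever the run stores its metadata.
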
## Assessment

This is a meaningful architectural step beyond the earlier `rc.d` bridge milestone.

Fruix now has a validated local boot path where:

- the generated image boots on FreeBSD,
- the generated launcher becomes PID 1 via `init_exec`,
- Shepherd itself owns PID 1,
- networking and SSH come up under Shepherd-managed service ordering,
- and the ready marker still appears.

## Remaining limitations

This is still a prototype, not yet the replacement for the main boot path.

Notable current limitations:

- the PID 1 path still relies on a small generated shell launcher before entering Shepherd
- some early boot/runtime actions are still expressed imperatively there
- the Guile/Shepherd local-runtime compatibility-prefix shims are not eliminated yet; they remain part of activation for the currently locally built runtime artifacts
- this subphase validated the path locally under QEMU/TCG, not yet on the real XCP-ng VM

## Recommended next step

Use this validated local PID 1 prototype as the base for the next subphase:

1. try the `shepherd-pid1` image on the real XCP-ng VM
2. if that succeeds, decide whether `shepherd-pid1` should become a selectable supported boot mode rather than just a prototype
3. continue reducing the remaining compatibility-prefix shims by moving the Guile/Shepherd runtime artifacts toward a more native store-aware arrangement