Prototype Shepherd PID 1 boot on FreeBSD

2026-04-02 13:29:17 +02:00
parent c62c89b078
commit f5ffd111ee
5 changed files with 655 additions and 14 deletions

@@ -2421,3 +2421,68 @@ Next recommended step:
1. begin the next post-Phase-10 cleanup/polish pass outside the plan milestones
2. prioritize replacing the current Guile / Shepherd compatibility-prefix shims with a more native store-path-aware runtime arrangement
3. consider adding richer deploy/vm-oriented `fruix` commands beyond the now-canonical `system build/rootfs/image` path
## 2026-04-02 — Post-Phase-10: local Shepherd-as-PID-1 prototype booted on FreeBSD
Completed work:
- began the next post-Phase-10 runtime-integration pass by exploring a Shepherd-as-PID-1 boot mode for Fruix on FreeBSD
- compared the approach with Guix's root Shepherd design in:
  - `~/repos/guix/gnu/services/shepherd.scm`
- wrote the subphase report:
  - `docs/reports/postphase10-shepherd-pid1-qemu-freebsd.md`
- extended the declarative FreeBSD operating-system model in:
  - `modules/fruix/system/freebsd.scm`
- added an `init-mode` field with:
  - `freebsd-init+rc.d-shepherd`
  - `shepherd-pid1`
- when `init-mode` is `shepherd-pid1`, the generated loader configuration now sets:
  - `init_exec="/run/current-system/boot/fruix-pid1"`
- generated systems in PID 1 mode now include:
  - `boot/fruix-pid1`
- the generated activation script now treats `cap_mkdb` / `pwd_mkdb` as best-effort so immutable store-backed config files do not abort this early boot path
- added a dedicated Shepherd-PID-1 operating-system template:
  - `tests/system/phase11-shepherd-pid1-operating-system.scm.in`
- added a dedicated local QEMU/UEFI validation harness:
  - `tests/system/run-phase11-shepherd-pid1-qemu.sh`
Important findings:
- FreeBSD's `init(8)` already has a suitable handoff mechanism for this experiment via:
  - `init_exec`
- compared with Guix, the current Fruix implementation is still much more imperative, but it now follows the same broad direction:
  - boot into Shepherd directly as PID 1 rather than merely starting Shepherd late from rc.d
- the first PID 1 attempt failed because the generated Shepherd config imported a repo-side module, `(fruix shepherd freebsd)`, that was not present inside the guest runtime; the fix was to inline the small helper procedures needed by the generated config itself
- the early PID 1 path also exposed that store-backed `/etc/login.conf` and `/etc/master.passwd` updates must be best-effort rather than fatal on this bootstrap path
- for the current locally built runtime artifacts, the compatibility-prefix shims are still needed; this subphase did not eliminate them yet, but it did remove the larger `rc.d` boot-manager dependency from the local prototype path
Validation:
- `tests/system/run-phase11-shepherd-pid1-qemu.sh` now passes
- passing run workdir:
  - `/tmp/pid1-qemu6-1775128407`
- validated local guest state included:
  - `ready_marker=ready`
  - `shepherd_pid=1`
  - `shepherd_socket=present`
  - `shepherd_status=running`
  - `sshd_status=running`
  - `boot_backend=qemu-uefi-tcg`
  - `init_mode=shepherd-pid1`
Current assessment:
- Fruix now has a working local FreeBSD prototype where Shepherd itself is PID 1
- this is not yet the new mainline boot path, but it proves that the project can move beyond the earlier `freebsd-init+rc.d-shepherd` bridge architecture
- the PID 1 process image appears as Guile because Shepherd is launched as a Guile script, but the decisive validation point is that `/var/run/shepherd.pid` contains `1`
- this subphase was validated locally under QEMU/TCG + UEFI; the next meaningful test is the real XCP-ng VM
Next recommended step:
1. try the `shepherd-pid1` image on the real XCP-ng VM
2. if it boots there too, decide whether to keep `shepherd-pid1` as an experimental selectable boot mode or advance it further toward the main Fruix boot path
3. continue reducing the remaining Guile / Shepherd compatibility-prefix shims now that the broader `rc.d` boot-manager dependency has been locally bypassed

@@ -0,0 +1,178 @@
# Post-Phase-10: local Shepherd-as-PID-1 boot prototype on FreeBSD
Date: 2026-04-02
## Goal
Begin the next post-Phase-10 runtime-integration pass by exploring two related directions:
- reduce reliance on the `freebsd-init + rc.d + shepherd` bridge, and
- compare Fruix's boot path with how Guix boots Shepherd as PID 1.
The concrete goal for this subphase was not yet a full real-VM migration of the main boot track, but a first validated local prototype where the generated FreeBSD image boots with Shepherd as PID 1 under QEMU/UEFI.
## Comparison with Guix
Guix's system model treats Shepherd as a first-class root service.
In `~/repos/guix/gnu/services/shepherd.scm`, the key pattern is:
- `shepherd-root-service-type` extends the boot service graph
- `shepherd-boot-gexp` ultimately does an `execl` of Shepherd as PID 1
- higher-level system services extend that root Shepherd instance declaratively
In other words, Guix does not merely start Shepherd from a late init script; it composes the system boot graph around Shepherd directly.
Fruix is not at that level of native service composition yet, but this subphase adopts the same basic architectural direction:
- boot into a generated Shepherd config directly,
- let Shepherd own the service graph from PID 1,
- and keep the imperative compatibility/bootstrap logic as small as possible.
## Implementation
### 1. Added an explicit init-mode to the FreeBSD operating-system model
`modules/fruix/system/freebsd.scm` now has an `init-mode` field on the declarative operating-system record.
Supported values are currently:
- `freebsd-init+rc.d-shepherd`
- `shepherd-pid1`
The existing boot path remains the default.
### 2. Added a generated PID 1 launcher for the `shepherd-pid1` mode
For `shepherd-pid1`, the generated system now contains:

- `boot/fruix-pid1`

and the generated loader configuration adds:

- `init_exec="/run/current-system/boot/fruix-pid1"`

That means FreeBSD `init(8)` directly `exec`s the generated Fruix launcher as its very first action, replacing itself as PID 1.
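
The mode-dependent loader setting can be sketched in shell as follows. This is only an illustration of the selection logic, not Fruix's actual generator, which is Scheme code in `modules/fruix/system/freebsd.scm`. The function name `emit_init_exec` is hypothetical; only the two mode names, the default, and the `init_exec` value are taken from this report.

```sh
# Hypothetical helper: print the extra loader.conf line for a given init-mode.
# Prints nothing for the default mode and rejects unknown modes.
emit_init_exec() {
    mode="${1:-freebsd-init+rc.d-shepherd}"
    case "$mode" in
        freebsd-init+rc.d-shepherd)
            # Default path: stock init(8) stays PID 1, so no override is emitted.
            ;;
        shepherd-pid1)
            printf '%s\n' 'init_exec="/run/current-system/boot/fruix-pid1"'
            ;;
        *)
            echo "unsupported init-mode: $mode" >&2
            return 1
            ;;
    esac
}
```

Calling `emit_init_exec shepherd-pid1` prints the single `init_exec` line, while the default mode contributes nothing to the loader configuration.
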
The launcher currently performs the minimum bootstrap steps needed before turning control over to Shepherd:
- remount `/` read-write on this very-early path
- mount declared non-root filesystems such as:
  - `devfs` on `/dev`
  - `tmpfs` on `/tmp`
- set the hostname
- run `/run/current-system/activate`
- export the Guile/Shepherd runtime environment
- `exec` Shepherd directly
Because Shepherd is a Guile script, the actual PID 1 process image is the Guile interpreter running Shepherd. The important validation point is that Shepherd's own pidfile records PID 1 and that its service socket is present.
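
The ordering of the launcher steps above can be sketched as a small shell script. This is a sketch under stated assumptions, not the generated `boot/fruix-pid1` itself: the `step` and `pid1_launch` names, the `DRY_RUN` switch, the hostname value, and the Shepherd config path are all hypothetical, and the runtime environment exports are left as a comment because this report does not list them.

```sh
# Sketch of the launcher's step ordering. With DRY_RUN=1 (the default here)
# each step is printed instead of executed, so it is safe on any host.
# HOSTNAME_VALUE and SHEPHERD_CONFIG are assumed placeholder values.
: "${DRY_RUN:=1}"
: "${HOSTNAME_VALUE:=fruix-freebsd}"
: "${SHEPHERD_CONFIG:=/run/current-system/shepherd/init.scm}"

step() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pid1_launch() {
    step mount -u -o rw /              # remount root read-write
    step mount -t devfs devfs /dev     # declared non-root filesystems
    step mount -t tmpfs tmpfs /tmp
    step hostname "$HOSTNAME_VALUE"
    step /run/current-system/activate
    # The real launcher also exports the Guile/Shepherd runtime environment
    # at this point; those variables are not specified in this report.
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "exec shepherd --config=$SHEPHERD_CONFIG"
    else
        # exec replaces the shell, so Shepherd inherits PID 1.
        exec shepherd --config="$SHEPHERD_CONFIG"
    fi
}
```

The final `exec` is the step that matters: without it the shell would remain PID 1 and Shepherd would run as a child process.
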
### 3. Generated a different Shepherd config for PID 1 mode
For `shepherd-pid1`, the generated `shepherd/init.scm` now includes the minimal helper procedures it needs inline, rather than importing the repo-side `(fruix shepherd freebsd)` module at runtime.
This avoids depending on checkout-only Scheme modules being present in the guest.
The PID 1 config currently starts a minimal service graph:
- `fruix-logger`
- `netif` through FreeBSD `service(8)`
- `sshd` through FreeBSD `service(8)`
- `fruix-ready`
So this prototype still uses some FreeBSD rc scripts as service implementations, but now under Shepherd control rather than under `/etc/rc` as the primary boot manager.
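
One way to picture how an rc script can serve as a service implementation is a mapping from Shepherd-style actions onto `service(8)` invocations. The sketch below is an assumption, not the generated config (which is Scheme): the function name `rc_action_command` is hypothetical, and the use of the `one*` command variants, which run an rc script regardless of its `rc.conf` enable flag, is a guess rather than something this report states.

```sh
# Hypothetical mapping from a start/stop/status action to a service(8) command.
# Prints the command instead of running it, so it works on any host.
rc_action_command() {
    name="$1"
    action="$2"
    case "$action" in
        start)  echo "service $name onestart" ;;
        stop)   echo "service $name onestop" ;;
        status) echo "service $name onestatus" ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
}
```

For example, `rc_action_command sshd start` prints `service sshd onestart`.
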
### 4. Made activation more store-friendly for this early-boot path
The generated activation script now treats:

- `cap_mkdb /etc/login.conf`
- `pwd_mkdb -p /etc/master.passwd`

as best-effort operations.

That matters because Fruix currently symlinks these files from the immutable system closure, and on the very early PID 1 path a failure of either command should not be allowed to abort the whole boot.
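
The best-effort pattern can be sketched as a small wrapper. This is a minimal illustration, not the generated activation script; the `best_effort` name and the warning text are hypothetical.

```sh
# Run a command, but downgrade a failure to a warning instead of aborting.
best_effort() {
    if ! "$@"; then
        echo "warning: '$*' failed; continuing" >&2
    fi
    return 0
}

# In an activation script the two commands would then be invoked as:
#   best_effort cap_mkdb /etc/login.conf
#   best_effort pwd_mkdb -p /etc/master.passwd
```

Because the wrapper always returns 0, it stays safe even if the surrounding script runs under `set -e`.
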
## Validation
### New PID 1 template
Added:
- `tests/system/phase11-shepherd-pid1-operating-system.scm.in`
This declares the same minimal FreeBSD Fruix guest shape as the current Phase 9 system, but with:
- `#:init-mode 'shepherd-pid1`
### New local QEMU validation harness
Added:
- `tests/system/run-phase11-shepherd-pid1-qemu.sh`
This harness:
- builds the image through the canonical `fruix system image` path
- boots it locally with QEMU/TCG + UEFI
- injects the root SSH key
- waits for the ready marker over forwarded SSH
- verifies that Shepherd is running and that Shepherd's pidfile says PID 1
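
The wait-and-verify part of such a harness can be sketched as follows. This is not the content of `run-phase11-shepherd-pid1-qemu.sh`; the helper names, the forwarded port, the marker path, and the retry budget in the usage comment are assumptions. Only `/var/run/shepherd.pid` comes from this report's progress notes.

```sh
# Retry a probe command until it succeeds or the attempt budget runs out.
# usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Succeed only if the given pidfile contains exactly "1".
pidfile_is_pid1() {
    [ "$(tr -d '[:space:]' < "$1")" = "1" ]
}

# Hypothetical usage against a guest with SSH forwarded to local port 2222:
#   wait_until 60 5 ssh -p 2222 root@127.0.0.1 test -e /var/run/fruix-ready
#   ssh -p 2222 root@127.0.0.1 cat /var/run/shepherd.pid > shepherd.pid
#   pidfile_is_pid1 shepherd.pid
```

Bounding the retries keeps a guest that never boots from hanging the harness indefinitely.
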
### Successful run
Passing validation run:
- `PASS phase11-shepherd-pid1-qemu`
- workdir: `/tmp/pid1-qemu6-1775128407`
Key validated results:
```text
ready_marker=ready
run_current_system_target=/frx/store/8b44506c37da85cebf265c813ed3a9d2a42408b077ac85854e7d6209d2f910ec-fruix-system-fruix-freebsd
shepherd_pid=1
shepherd_socket=present
shepherd_status=running
sshd_status=running
pid1_command=[guile]
boot_backend=qemu-uefi-tcg
init_mode=shepherd-pid1
```
The important detail is `shepherd_pid=1`, which shows that the running Shepherd instance in the guest is the system's PID 1 process.
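
Checking a key=value results listing like the one above can be automated with a small helper. This is a sketch only; the `expect_result` name and the idea of saving the results to a file are assumptions about how one might script the check.

```sh
# Look up KEY in a key=value results file and compare it to EXPECTED.
# usage: expect_result FILE KEY EXPECTED
expect_result() {
    actual="$(sed -n "s/^$2=//p" "$1" | head -n 1)"
    if [ "$actual" != "$3" ]; then
        echo "FAIL $2: expected '$3', got '$actual'" >&2
        return 1
    fi
}
```

With the results saved to a file such as `results.txt`, `expect_result results.txt shepherd_pid 1` succeeds for the run shown above and fails with a diagnostic if the pidfile ever reports a different PID.
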
## Assessment
This is a meaningful architectural step beyond the earlier `rc.d` bridge milestone.
Fruix now has a validated local boot path where:
- the generated image boots on FreeBSD,
- the generated launcher becomes PID 1 via `init_exec`,
- Shepherd itself owns PID 1,
- networking and SSH come up under Shepherd-managed service ordering,
- and the ready marker still appears.
## Remaining limitations
This is still a prototype, not yet the replacement for the main boot path.
Notable current limitations:
- the PID 1 path still relies on a small generated shell launcher before entering Shepherd
- some early boot/runtime actions are still expressed imperatively there
- the Guile/Shepherd local-runtime compatibility-prefix shims are not eliminated yet; they remain part of activation for the currently locally built runtime artifacts
- this subphase validated the path locally under QEMU/TCG, not yet on the real XCP-ng VM
## Recommended next step
Use this validated local PID 1 prototype as the base for the next subphase:
1. try the `shepherd-pid1` image on the real XCP-ng VM
2. if that succeeds, decide whether `shepherd-pid1` should become a selectable supported boot mode rather than just a prototype
3. continue reducing the remaining compatibility-prefix shims by moving the Guile/Shepherd runtime artifacts toward a more native store-aware arrangement