Validate Shepherd PID 1 boot on XCP-ng
@@ -2486,3 +2486,58 @@ Next recommended step:
1. try the `shepherd-pid1` image on the real XCP-ng VM
2. if it boots there too, decide whether to keep `shepherd-pid1` as an experimental selectable boot mode or advance it further toward the main Fruix boot path
3. continue reducing the remaining Guile / Shepherd compatibility-prefix shims now that the broader `rc.d` boot-manager dependency has been locally bypassed

## 2026-04-02 — Post-Phase-10: Shepherd-as-PID-1 boot also passed on the real XCP-ng VM

Completed work:

- took the locally validated `shepherd-pid1` boot mode and tested it on the real XCP-ng deployment path
- wrote the follow-up report:
  - `docs/reports/postphase10-shepherd-pid1-xcpng-freebsd.md`
- expanded the Shepherd-PID-1 operating-system template so the generated guest remains compatible with both local virtio and the real Xen NIC path:
  - `tests/system/phase11-shepherd-pid1-operating-system.scm.in`
  - now includes:
    - `ifconfig_xn0=SYNCDHCP`
    - `ifconfig_em0=SYNCDHCP`
    - `ifconfig_vtnet0=SYNCDHCP`
- added a dedicated real-VM Shepherd-PID-1 deployment/validation harness:
  - `tests/system/run-phase11-shepherd-pid1-xcpng.sh`

Validation:

- `tests/system/run-phase11-shepherd-pid1-xcpng.sh` now passes on the operator-approved VM and existing VDI:
  - VM `90490f2e-e8fc-4b7a-388e-5c26f0157289`
  - VDI `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`
- passing run workdir:
  - `/tmp/pid1-xcpng-1775129768`
- passing real-guest metadata confirmed:
  - `ready_marker=ready`
  - `run_current_system_target=/frx/store/2940c952e9d35e47f98fe62f296be2b6ab4fceb3eee8248d6a7823decd42a305-fruix-system-fruix-freebsd`
  - `pid1_command=[guile]`
  - `shepherd_pid=1`
  - `shepherd_socket=present`
  - `shepherd_status=running`
  - `sshd_status=running`
  - `init_mode=shepherd-pid1`

Important findings:

- the local QEMU PID 1 prototype was not an emulator-only artifact; the same general boot design also works on the real XCP-ng/Xen guest
- as expected for a Guile-script entry point, the PID 1 process image shows up as Guile, but the meaningful architectural check is that:
  - `/var/run/shepherd.pid` contains `1`
- this means Fruix has now validated two distinct real-VM boot architectures on FreeBSD:
  - `freebsd-init+rc.d-shepherd`
  - `shepherd-pid1`
- however, this still does not remove the current Guile / Shepherd compatibility-prefix shims; those remain a separate runtime-artifact issue rather than an init-manager issue

Current assessment:

- Shepherd-as-PID-1 is no longer merely a local prototype; it is validated on the real XCP-ng VM as well
- this significantly strengthens the path toward a more Guix-like Fruix system architecture on FreeBSD
- the main remaining native-runtime gap is now the baked-prefix / compatibility-shim problem, not whether Fruix can boot with Shepherd as PID 1

Next recommended step:

1. focus directly on eliminating the remaining Guile / Shepherd compatibility-prefix shims from the guest runtime
2. preserve `shepherd-pid1` as an experimental selectable boot mode while that cleanup proceeds
3. once the runtime-prefix issue is reduced, reassess whether `shepherd-pid1` should replace the older `freebsd-init+rc.d-shepherd` path as the preferred Fruix boot architecture

114
docs/reports/postphase10-shepherd-pid1-xcpng-freebsd.md
Normal file
@@ -0,0 +1,114 @@
# Post-Phase-10: Shepherd-as-PID-1 boot validated on the real XCP-ng FreeBSD VM

Date: 2026-04-02

## Goal

Take the locally validated Shepherd-as-PID-1 Fruix boot prototype and test it on the real operator-approved XCP-ng VM.

The target objects were the same as in the constrained deployment path used for Phase 9:

- VM: `90490f2e-e8fc-4b7a-388e-5c26f0157289`
- VDI: `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`

The concrete goal for this subphase was to confirm that the new `shepherd-pid1` init mode was not merely a local QEMU curiosity, but could also:

- boot on the real Xen guest,
- reach DHCP and SSH,
- keep Shepherd running as PID 1,
- and still reach the Fruix ready marker.

## Result

The real XCP-ng boot succeeded.

A new deployment/validation harness was added:

- `tests/system/run-phase11-shepherd-pid1-xcpng.sh`

This harness reuses the existing real-VM deployment method:

- build a full-size image matching the existing VDI
- convert it to dynamic VHD
- overwrite the existing VDI
- boot the real VM
- rediscover the guest by MAC/IP
- validate the booted guest over SSH

The new Shepherd-PID-1 image passes that full path.
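
The "rediscover the guest by MAC/IP" step can be sketched in shell. The report does not say how the harness performs it, so the following is only one plausible approach, assuming the guest's MAC address is known and that an `arp -an` style listing on a host in the same network already contains an entry for it. The function name `ip_for_mac` and the sample MAC addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 address that an `arp -an` style
# listing (read from stdin) associates with a given MAC address.
# Expects lines shaped like:
#   ? (192.168.213.62) at aa:bb:cc:dd:ee:ff on em0 ...
ip_for_mac() {
    # Normalise the requested MAC to lower case before comparing.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '
        tolower($4) == mac {
            gsub(/[()]/, "", $2)   # strip the parentheses around the IP
            print $2
            exit
        }'
}

# Example with canned input (both MAC addresses are made up):
printf '%s\n' \
    '? (192.168.213.1) at 00:11:22:33:44:55 on em0 expires in 1200 seconds [ethernet]' \
    '? (192.168.213.62) at aa:bb:cc:dd:ee:ff on em0 expires in 1187 seconds [ethernet]' |
    ip_for_mac 'AA:BB:CC:DD:EE:FF'
```

In a real harness the input would more likely come from a live `arp -an` call inside a retry loop, since the guest only appears in the table once it has answered on the network.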
## Validation

Passing real-VM run:

- `PASS phase11-shepherd-pid1-xcpng`
- workdir: `/tmp/pid1-xcpng-1775129768`

Validated metadata from the real guest:

```text
ready_marker=ready
run_current_system_target=/frx/store/2940c952e9d35e47f98fe62f296be2b6ab4fceb3eee8248d6a7823decd42a305-fruix-system-fruix-freebsd
pid1_command=[guile]
shepherd_pid=1
shepherd_socket=present
shepherd_status=running
sshd_status=running
guest_ip=192.168.213.62
boot_backend=xcp-ng-xo-cli
init_mode=shepherd-pid1
```

The key architectural confirmation is:

- `shepherd_pid=1`

That shows the running Shepherd instance in the real guest is PID 1.

As in the local QEMU prototype, the process image is Guile because Shepherd is launched as a Guile script; however, the service manager itself is the PID 1 process according to Shepherd's own pidfile and control socket state.
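
The metadata above lends itself to a mechanical check. The following is a minimal sketch, not the harness's actual code: it assumes the metadata has been saved as `key=value` lines in a file, and the function names `meta_get` and `check_pid1_metadata` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a file of
# key=value lines (first match only).
# $1 = metadata file, $2 = key
meta_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Hypothetical check: succeed only if the fields that characterise the
# Shepherd-as-PID-1 boot carry the values listed in the report.
check_pid1_metadata() {
    file=$1
    status=0
    for expect in \
        ready_marker=ready \
        shepherd_pid=1 \
        shepherd_socket=present \
        shepherd_status=running \
        sshd_status=running \
        init_mode=shepherd-pid1
    do
        key=${expect%%=*}     # text before the first "="
        want=${expect#*=}     # text after the first "="
        got=$(meta_get "$file" "$key")
        if [ "$got" != "$want" ]; then
            echo "FAIL $key: expected '$want', got '$got'" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_pid1_metadata metadata.txt && echo PASS` would then report every mismatching field on stderr instead of stopping at the first one. `pid1_command` is deliberately left out of the required set, since the report notes that it shows Guile rather than Shepherd.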
## What changed to make the real VM pass

The most important refinement after the first local PID 1 work was making the generated activation path more tolerant of immutable store-backed configuration files during very early boot.

Specifically, the generated activation script now treats these as best-effort:

- `cap_mkdb /etc/login.conf`
- `pwd_mkdb -p /etc/master.passwd`

That matters because on the PID 1 path these commands run earlier in boot and should not abort the system if the current `/etc` representation is not suitable for in-place database regeneration.

The Shepherd-PID-1 operating-system template was also expanded to keep the NIC configuration broad enough for both local virtio and the real Xen path:

- `ifconfig_xn0=SYNCDHCP`
- `ifconfig_em0=SYNCDHCP`
- `ifconfig_vtnet0=SYNCDHCP`
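
The best-effort treatment of `cap_mkdb` and `pwd_mkdb` described in this section can be illustrated with a small shell wrapper. The generated activation script itself is not shown in this report, so this is only a sketch of the pattern; the names `best_effort` and `regenerate_login_databases` are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, log a warning if it fails,
# and always return success so that early boot is not aborted.
best_effort() {
    if ! "$@"; then
        echo "warning: '$*' failed; continuing" >&2
    fi
    return 0
}

# Hypothetical activation step using the two commands named in the report.
# Defined here but not invoked, so sourcing this sketch has no side effects.
regenerate_login_databases() {
    best_effort cap_mkdb /etc/login.conf
    best_effort pwd_mkdb -p /etc/master.passwd
}
```

With this shape, a caller running under `set -e` keeps going even when `/etc/login.conf` or `/etc/master.passwd` cannot be rebuilt in place, while the warning still leaves a trace of what failed.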
## Assessment

This is a stronger result than the earlier local-only prototype.

Fruix now has a FreeBSD boot mode, validated on a real deployment, in which:

- FreeBSD `init(8)` hands off immediately via `init_exec`
- the generated Fruix launcher performs the minimal bootstrap
- Shepherd becomes PID 1
- networking and SSH still work on the real XCP-ng VM
- and the system still reaches the Fruix ready marker

That means the project has now validated both of these boot architectures on the real VM:

1. `freebsd-init+rc.d-shepherd`
2. `shepherd-pid1`
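
The `init_exec` handoff mentioned in this assessment relies on a documented FreeBSD mechanism: when the kernel environment variable `init_exec` names a file on the root file system, `init(8)` executes that file as its first action, so the file replaces `init` as PID 1. One common way to set such a variable is `/boot/loader.conf`. Whether Fruix sets it there is an assumption, and the launcher path below is a placeholder rather than the real path.

```shell
# /boot/loader.conf fragment (sketch; the path is a hypothetical placeholder)
init_exec="/path/to/fruix-shepherd-launcher"
```

On a running guest, `kenv init_exec` prints the value the kernel environment actually carries, which is a quick way to confirm which boot mode was requested.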
## Remaining limitations

This does not yet eliminate the current locally built Guile/Shepherd compatibility-prefix shims.

Those shims are still needed because the locally staged runtime artifacts continue to embed historical build prefixes. The current result proves that the broader init/boot-manager dependency can be removed, but it does not yet fully solve the store-native runtime-prefix problem.

## Conclusion

The Shepherd-as-PID-1 Fruix boot mode now works not only under local QEMU/UEFI, but also on the real operator-approved XCP-ng VM.

This substantially strengthens the case that Fruix can move beyond the transitional `rc.d` bridge design and toward a more Guix-like PID-1-centered system architecture on FreeBSD.