Phase 9 completion: Fruix FreeBSD reached the ready marker on XCP-ng
Date: 2026-04-02
Goal
Complete the first minimal Phase 9 boot milestone on the active FreeBSD track.
Because local bhyve is unavailable in this Xen environment, the active validation target remained the operator-approved XCP-ng VM and its existing VDI:
- VM: 90490f2e-e8fc-4b7a-388e-5c26f0157289
- VDI: 0f1f90d3-48ca-4fa2-91d8-fc6339b95743
The required Phase 9 outcomes for this completion step were:
- boot the generated Fruix image on the real VM,
- reach the generated ready marker,
- keep Shepherd running in the guest,
- keep SSH available for operator access,
- and validate the declared system closure from inside the guest.
Result
This phase now succeeds on the active XCP-ng path.
tests/system/run-phase9-xcpng-boot.sh now passes end-to-end and verifies:
- boot on the real XCP-ng VM,
- DHCP on the guest NIC,
- root SSH access via the injected key,
- /run/current-system pointing at the generated Fruix closure under /frx/store,
- the ready marker at /var/lib/fruix/ready,
- fruix-shepherd running,
- sshd running,
- and a minimal operator-facing home directory for the declared operator account.
Successful run metadata:
- workdir: /tmp/phase9-xcpng-pass-1775113189
- guest IP: 192.168.213.62
- closure path: /frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd
- image path: /frx/store/73f5757f8b58cf15fd97fc9a9704664d4b1d390d547fffff68c129a85d6cc368-fruix-bhyve-image-fruix-freebsd/disk.img
Representative successful metadata values from the passing XCP-ng run:
ready_marker=ready
run_current_system_target=/frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd
shepherd_status=running
sshd_status=running
operator_home_listing=/home/operator
uname_output=FreeBSD 15.0-STABLE
logger_log=fruix-shepherd-started
Root causes resolved
The remaining Phase 9 blocker turned out not to be a single Guile bug, but a chain of runtime integration gaps.
1. Guile needed a usable UTF-8 locale in the guest
With the minimal image as originally staged, guest Guile started in a plain C locale and crashed very early in locale-string conversion paths.
The fix for the current prototype track was to stage the minimal locale data needed for C.UTF-8:
/usr/share/locale/C.UTF-8/LC_CTYPE
and to start Shepherd with:
LANG=C.UTF-8
LC_ALL=C.UTF-8
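As a minimal sketch of this step in shell, the function below checks that the staged locale file exists before exporting the two variables. The function name setup_shepherd_locale and its optional root-prefix argument are illustrative assumptions, not names taken from the generated rc script.

```sh
#!/bin/sh
# Sketch only: export a UTF-8 locale for Shepherd if the staged data exists.
# setup_shepherd_locale and its root-prefix argument are hypothetical.
setup_shepherd_locale() {
    root="${1:-}"    # optional root prefix, e.g. a chroot under test
    ctype="$root/usr/share/locale/C.UTF-8/LC_CTYPE"
    if [ ! -f "$ctype" ]; then
        echo "missing locale data: $ctype" >&2
        return 1
    fi
    LANG=C.UTF-8
    LC_ALL=C.UTF-8
    export LANG LC_ALL
}
```

Failing early when the locale file is missing makes a staging regression visible in the rc output instead of surfacing later as a Guile crash.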
2. The copied Guile / Shepherd runtimes still contained baked-in source-prefix paths
The locally built Guile, guile-gnutls/fibers, and Shepherd artifacts were originally installed under temporary validation prefixes such as:
/tmp/guile-freebsd-validate-install
/tmp/guile-gnutls-freebsd-validate-install
/tmp/shepherd-freebsd-validate-install
Even after those trees were copied into /frx/store, a number of runtime references still pointed back to the original prefixes:
- Guile system load paths compiled into libguile
- Shepherd launcher scripts
- Fibers and GnuTLS Scheme modules
- Shepherd configuration module paths
The immediate prototype fix was to make activation recreate compatibility symlinks at those original prefixes, each pointing at the corresponding store item in the guest.
This keeps the running system store-backed while unblocking the existing locally built Guile/Shepherd artifacts.
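The symlink step could look roughly like the following shell sketch. The helper name link_compat_prefix is hypothetical, and the store item path passed to it would come from the generated closure; nothing here is taken from the actual activation script.

```sh
#!/bin/sh
# Sketch only: recreate a historical build prefix as a symlink to a store item.
# link_compat_prefix is a hypothetical helper name.
link_compat_prefix() {
    prefix="$1"       # historical prefix, e.g. /tmp/guile-freebsd-validate-install
    store_item="$2"   # actual directory under /frx/store
    if [ ! -d "$store_item" ]; then
        echo "store item not found: $store_item" >&2
        return 1
    fi
    # Never clobber a real file or directory that happens to sit at the prefix.
    if [ -e "$prefix" ] && [ ! -L "$prefix" ]; then
        echo "refusing to replace non-symlink: $prefix" >&2
        return 1
    fi
    mkdir -p "$(dirname "$prefix")"
    # -n: do not follow an existing symlink at the prefix; -f: replace it.
    ln -sfn "$store_item" "$prefix"
}
```

Because the link is replaced rather than followed, running the helper on every activation is idempotent.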
3. The Shepherd process needed to be detached correctly from rc startup
Starting Shepherd with a simple shell background & was not sufficient on the real boot path. The process could exit when the invoking shell/session disappeared, which made the ready marker appear transiently while Shepherd itself did not remain up.
The fix was to launch Shepherd through FreeBSD daemon(8):
/usr/sbin/daemon -c -f -p "$pidfile" -o /var/log/shepherd-bootstrap.out ...
This gave the guest a stable long-lived Shepherd daemon process and made onestatus/socket checks reliable.
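A hedged sketch of how an rc script might wrap that invocation is shown below. The function name start_shepherd_detached and the DAEMON_BIN override are assumptions for illustration. The Shepherd command line itself is elided in this report, so the sketch simply forwards whatever command the caller supplies.

```sh
#!/bin/sh
# Sketch only: launch a command through FreeBSD daemon(8) so it survives the
# exit of the invoking rc shell. start_shepherd_detached and DAEMON_BIN are
# hypothetical names; the command to run is supplied by the caller.
start_shepherd_detached() {
    pidfile="$1"
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: start_shepherd_detached PIDFILE COMMAND [ARGS...]" >&2
        return 2
    fi
    # -c: chdir to /        -f: redirect stdio to /dev/null
    # -p: write child PID   -o: append child output to the given file
    "${DAEMON_BIN:-/usr/sbin/daemon}" -c -f -p "$pidfile" \
        -o /var/log/shepherd-bootstrap.out "$@"
}
```

With the child PID recorded in the pidfile, a later status check can test the process directly rather than inferring liveness from the ready marker.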
4. The initial Shepherd config used helper APIs that were not actually present in the guest runtime
The generated Shepherd config originally used:
- mkdir-p
- call-with-output-file with #:append
Those choices were too optimistic for the minimal Scheme environment being staged.
The fix was to replace them with simpler portable logic:
- a local recursive directory-creation helper based on mkdir
- explicit append-mode logging via open-file "a"
5. The XCP-ng harness itself had an SSH-key bug
The first end-to-end rerun of tests/system/run-phase9-xcpng-boot.sh failed because it used the public key file as the SSH identity file.
The harness now distinguishes:
- ROOT_AUTHORIZED_KEY_FILE for guest key injection
- ROOT_SSH_PRIVATE_KEY_FILE for the host-side SSH login
with the private key defaulting to:
~/.ssh/id_ed25519
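The distinction can be sketched in shell as follows. The two variable names and the private-key default come from the harness description above; the public-key default and the helper functions is_public_key_file and check_identity_file are assumptions added for illustration.

```sh
#!/bin/sh
# Sketch only. The ".pub" default for the injected key and both helper
# functions are hypothetical; they are not taken from the real harness.
ROOT_SSH_PRIVATE_KEY_FILE="${ROOT_SSH_PRIVATE_KEY_FILE:-$HOME/.ssh/id_ed25519}"
ROOT_AUTHORIZED_KEY_FILE="${ROOT_AUTHORIZED_KEY_FILE:-$ROOT_SSH_PRIVATE_KEY_FILE.pub}"

# Return 0 if the file's first line looks like an OpenSSH public key.
is_public_key_file() {
    head -n 1 "$1" 2>/dev/null |
        grep -Eq '^(ssh-|ecdsa-|sk-)[^ ]* [A-Za-z0-9+/=]+'
}

# Refuse to use an unreadable file or a public key as the SSH identity.
check_identity_file() {
    if [ ! -r "$1" ]; then
        echo "identity file not readable: $1" >&2
        return 1
    fi
    if is_public_key_file "$1"; then
        echo "identity file is a public key: $1" >&2
        return 1
    fi
}
```

Calling check_identity_file "$ROOT_SSH_PRIVATE_KEY_FILE" before the first login attempt would turn the original mistake into an immediate, readable error.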
Code-level changes that closed the blocker
modules/fruix/packages/freebsd.scm
Extended the minimal runtime again to support the final ready-state path:
- staged /usr/sbin/daemon
- staged /usr/share/locale/C.UTF-8/LC_CTYPE
modules/fruix/system/freebsd.scm
Completed the Guile/Shepherd guest runtime integration by:
- generating activation that recreates compatibility symlinks from the historical build prefixes to the real /frx/store items
- exporting locale and Guile runtime path variables in the Shepherd rc script:
  - LANG
  - LC_ALL
  - GUILE_SYSTEM_PATH
  - GUILE_SYSTEM_COMPILED_PATH
  - GUILE_SYSTEM_EXTENSIONS_PATH
  - the existing site path variables
- starting Shepherd through daemon(8) instead of a fragile shell background job
- fixing the generated Shepherd config so the logger and ready-marker services work in the guest
- exposing the staged locale data into the rootfs via /usr/share/locale
tests/system/run-phase9-xcpng-boot.sh
Improved the real XCP-ng harness by:
- separating the injected public key from the SSH private key actually used for login
- preserving the successful passing metadata for the full ready-marker path
Validation details
Local reproduction and verification
Before re-testing on XCP-ng, the failure was reproduced and fixed in two faster environments:
- a host-side chroot into the generated image root partition
- local QEMU/TCG boots with UEFI and SSH forwarding
That produced a much tighter debug loop for:
- locale staging,
- baked-prefix compatibility,
- and Shepherd daemon lifetime.
Real XCP-ng validation
The final proof remained the real VM.
The passing XCP-ng run verified all of the following from the booted guest over SSH:
- cat /var/lib/fruix/ready returns ready
- /usr/local/etc/rc.d/fruix-shepherd onestatus succeeds
- service sshd onestatus succeeds
- readlink /run/current-system matches the generated Fruix closure
- /home/operator exists
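These checks could be scripted from the host roughly as in the sketch below. The guest commands are the ones listed above; the function names run_in_guest and verify_guest, the GUEST_IP variable, and the exact ssh invocation are assumptions and may differ from what the real harness does.

```sh
#!/bin/sh
# Sketch only: run_in_guest, verify_guest, and GUEST_IP are hypothetical names.

# Run one command string in the guest as root over SSH.
run_in_guest() {
    ssh -i "${ROOT_SSH_PRIVATE_KEY_FILE:-$HOME/.ssh/id_ed25519}" \
        "root@$GUEST_IP" "$1"
}

# Return 0 only if every ready-state check passes.
verify_guest() {
    expected_closure="$1"
    [ "$(run_in_guest 'cat /var/lib/fruix/ready')" = "ready" ] || return 1
    run_in_guest '/usr/local/etc/rc.d/fruix-shepherd onestatus' >/dev/null || return 1
    run_in_guest 'service sshd onestatus' >/dev/null || return 1
    [ "$(run_in_guest 'readlink /run/current-system')" = "$expected_closure" ] || return 1
    run_in_guest 'test -d /home/operator' || return 1
}
```

A caller would set GUEST_IP to the discovered DHCP address and pass the closure path under /frx/store as the only argument.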
Assessment against Phase 9 goals
9.1 deterministic ready state
Satisfied on the active XCP-ng track.
The guest now boots to a deterministic ready marker:
/var/lib/fruix/ready
9.2 in-guest Shepherd and core-service validation
Satisfied on the active XCP-ng track.
The guest now validates:
- Shepherd active
- generated configuration in effect
- system closure mounted through /run/current-system
- sshd available for remote operator access
9.3 minimal operator usability
Satisfied on the active XCP-ng track.
A human operator can now:
- discover the DHCP address,
- log in over SSH with the injected root key,
- inspect /run/current-system,
- inspect the ready marker,
- and inspect Shepherd/log state in the guest.
Conclusion
Phase 9 is complete for the current FreeBSD prototype track, using the active XCP-ng path in place of local bhyve, which is unavailable in this environment.
The Fruix image now boots as a real FreeBSD VM, reaches the generated ready state, runs Shepherd successfully, and supports a minimal operator workflow over SSH.