Phase 9 completion: Fruix FreeBSD reached the ready marker on XCP-ng

Date: 2026-04-02

Goal

Complete the first minimal Phase 9 boot milestone on the active FreeBSD track.

Because local bhyve is unavailable in this Xen environment, the active validation target remained the operator-approved XCP-ng VM and its existing VDI:

  • VM: 90490f2e-e8fc-4b7a-388e-5c26f0157289
  • VDI: 0f1f90d3-48ca-4fa2-91d8-fc6339b95743

The required Phase 9 outcomes for this completion step were:

  • boot the generated Fruix image on the real VM,
  • reach the generated ready marker,
  • keep Shepherd running in the guest,
  • keep SSH available for operator access,
  • and validate the declared system closure from inside the guest.

Result

This phase now succeeds on the active XCP-ng path.

tests/system/run-phase9-xcpng-boot.sh now passes end-to-end and verifies:

  • boot on the real XCP-ng VM,
  • DHCP on the guest NIC,
  • root SSH access via the injected key,
  • /run/current-system pointing at the generated Fruix closure under /frx/store,
  • the ready marker at /var/lib/fruix/ready,
  • fruix-shepherd running,
  • sshd running,
  • and a minimal operator-facing home directory for the declared operator account.

Successful run metadata:

  • workdir: /tmp/phase9-xcpng-pass-1775113189
  • guest IP: 192.168.213.62
  • closure path:
    • /frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd
  • image path:
    • /frx/store/73f5757f8b58cf15fd97fc9a9704664d4b1d390d547fffff68c129a85d6cc368-fruix-bhyve-image-fruix-freebsd/disk.img

Representative metadata values from the passing XCP-ng run:

ready_marker=ready
run_current_system_target=/frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd
shepherd_status=running
sshd_status=running
operator_home_listing=/home/operator
uname_output=FreeBSD 15.0-STABLE
logger_log=fruix-shepherd-started
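Values in this form are easy to check mechanically. The following is a minimal sketch, assuming the harness writes them to a plain key=value file; the function names `metadata_get` and `metadata_expect` are hypothetical and are not taken from the harness.

```sh
#!/bin/sh
# Sketch only: helper names and the metadata file layout are assumptions.

# Print the value stored for KEY in a key=value file (empty if absent).
# $1 = metadata file, $2 = key (a plain identifier such as ready_marker)
metadata_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Fail with a message on stderr unless KEY has the expected value.
# $1 = metadata file, $2 = key, $3 = expected value
metadata_expect() {
    actual=$(metadata_get "$1" "$2")
    if [ "$actual" != "$3" ]; then
        echo "FAIL: $2='$actual' (expected '$3')" >&2
        return 1
    fi
}
```

A caller could then run `metadata_expect "$file" ready_marker ready` and `metadata_expect "$file" shepherd_status running` against whatever file the run left in its workdir.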

Root causes resolved

The remaining Phase 9 blocker turned out to be not a single Guile bug but a chain of five separate gaps: four in guest runtime integration and one in the test harness.

1. Guile needed a usable UTF-8 locale in the guest

With the minimal image as originally staged, guest Guile started in a plain C locale and crashed very early in locale-string conversion paths.

The fix for the current prototype track was to stage the minimal locale data needed for C.UTF-8:

  • /usr/share/locale/C.UTF-8/LC_CTYPE

and to start Shepherd with:

  • LANG=C.UTF-8
  • LC_ALL=C.UTF-8
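The locale fix has two halves: the data file must be staged, and the variables must be set for the Shepherd process. A minimal sketch of both checks follows; the helper names `have_c_utf8_locale` and `with_c_utf8_locale` are hypothetical and do not come from the generated rc script.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the Fruix rc script.

# Succeed if the staged C.UTF-8 character-type data exists under ROOT.
# $1 = root directory to inspect (empty or omitted means the running system)
have_c_utf8_locale() {
    [ -f "${1:-}/usr/share/locale/C.UTF-8/LC_CTYPE" ]
}

# Run a command with the two locale variables named in the report.
with_c_utf8_locale() {
    env LANG=C.UTF-8 LC_ALL=C.UTF-8 "$@"
}
```

Checking for the file before launch turns a missing locale into an explicit startup error instead of an early Guile crash.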

2. The copied Guile / Shepherd runtimes still contained baked-in source-prefix paths

The locally built Guile, guile-gnutls/fibers, and Shepherd artifacts were originally installed under temporary validation prefixes such as:

  • /tmp/guile-freebsd-validate-install
  • /tmp/guile-gnutls-freebsd-validate-install
  • /tmp/shepherd-freebsd-validate-install

Even after those trees were copied into /frx/store, a number of runtime references still pointed back to the original prefixes:

  • Guile system load paths compiled into libguile
  • Shepherd launcher scripts
  • Fibers and GnuTLS Scheme modules
  • Shepherd configuration module paths

The immediate prototype fix was to make activation recreate compatibility symlinks at those original prefixes, each pointing at the corresponding store item in the guest.

This keeps the running system store-backed while unblocking the existing locally built Guile/Shepherd artifacts.
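The symlink step can be sketched as a small idempotent helper. This is an illustration under assumptions, not the generated activation code: `link_compat_prefix` is a hypothetical name, and the store item path is supplied by the caller because the report does not list the Guile or Shepherd store paths.

```sh
#!/bin/sh
# Sketch only: link_compat_prefix is a hypothetical helper.

# Point OLD_PREFIX (a historical build prefix) at STORE_ITEM.
# $1 = old prefix, e.g. /tmp/guile-freebsd-validate-install
# $2 = directory of the real store item under /frx/store
link_compat_prefix() {
    old_prefix=$1
    store_item=$2
    if [ ! -d "$store_item" ]; then
        echo "link_compat_prefix: missing store item: $store_item" >&2
        return 1
    fi
    # Refuse to shadow a real file or directory left at the old prefix.
    if [ -e "$old_prefix" ] && [ ! -L "$old_prefix" ]; then
        echo "link_compat_prefix: $old_prefix exists and is not a symlink" >&2
        return 1
    fi
    mkdir -p "$(dirname "$old_prefix")" || return 1
    # -n keeps ln from descending into an existing symlink to a directory,
    # so repeated activations replace the link instead of nesting a new one.
    ln -sfn "$store_item" "$old_prefix"
}
```

Because the link is replaced rather than nested, running activation on every boot is safe.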

3. The Shepherd process needed to be detached correctly from rc startup

Starting Shepherd as a plain shell background job (&) was not sufficient on the real boot path. The process could exit when the invoking shell or session disappeared, so the ready marker appeared transiently while Shepherd itself did not stay up.

The fix was to launch Shepherd through FreeBSD daemon(8):

  • /usr/sbin/daemon -c -f -p "$pidfile" -o /var/log/shepherd-bootstrap.out ...

This gave the guest a stable long-lived Shepherd daemon process and made onestatus/socket checks reliable.
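A minimal sketch of that launch pattern is shown below. The Shepherd command line is elided in the report, so the sketch takes it from the caller instead of guessing; `start_detached`, `pidfile_is_alive`, and the `DAEMON` override are hypothetical names introduced for illustration.

```sh
#!/bin/sh
# Sketch only: function names and the DAEMON override are assumptions.

# Launch a command detached from the invoking shell through daemon(8).
# $1 = pidfile, $2 = output log, remaining arguments = command to run
start_detached() {
    pidfile=$1
    logfile=$2
    shift 2
    # -c: change the working directory to /
    # -f: redirect standard input, output and error to /dev/null
    # -p: write the child's PID to the pidfile
    # -o: append the child's stdout and stderr to the log file
    "${DAEMON:-/usr/sbin/daemon}" -c -f -p "$pidfile" -o "$logfile" "$@"
}

# Succeed if the pidfile names a process that still exists.
pidfile_is_alive() {
    [ -r "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}
```

Setting `DAEMON=echo` prints the command line that would be executed, which is convenient when debugging the rc script outside the guest.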

4. The initial Shepherd config used helper APIs that were not actually present in the guest runtime

The generated Shepherd config originally used:

  • mkdir-p
  • call-with-output-file with #:append

Neither was available in the minimal Scheme environment staged in the guest.

The fix was to replace them with simpler portable logic:

  • a local recursive directory-creation helper based on mkdir
  • explicit append-mode logging via open-file "a"

5. The XCP-ng harness itself had an SSH-key bug

The first end-to-end rerun of tests/system/run-phase9-xcpng-boot.sh failed because it used the public key file as the SSH identity file.

The harness now distinguishes:

  • ROOT_AUTHORIZED_KEY_FILE for guest key injection
  • ROOT_SSH_PRIVATE_KEY_FILE for the host-side SSH login

with the private key defaulting to:

  • ~/.ssh/id_ed25519
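The split between the two variables can be sketched as follows. The private-key default comes from the report; the public-key default (`.pub` next to the private key) and the `is_private_key_file` guard are assumptions added for illustration and may not match the harness.

```sh
#!/bin/sh
# Sketch only: the .pub default and is_private_key_file are assumptions.

# Host-side identity used for the SSH login (default taken from the report).
ROOT_SSH_PRIVATE_KEY_FILE=${ROOT_SSH_PRIVATE_KEY_FILE:-$HOME/.ssh/id_ed25519}
# Public key injected into the guest (default assumed for this sketch).
ROOT_AUTHORIZED_KEY_FILE=${ROOT_AUTHORIZED_KEY_FILE:-${ROOT_SSH_PRIVATE_KEY_FILE}.pub}

# Succeed if the first line of FILE is a PEM-style private key header.
# A public key file starts with its key type (for example ssh-ed25519),
# so this catches the mix-up described above.
is_private_key_file() {
    head -n 1 "$1" 2>/dev/null | grep -q '^-----BEGIN .*PRIVATE KEY-----'
}
```

Calling `is_private_key_file "$ROOT_SSH_PRIVATE_KEY_FILE"` before the first login attempt would surface a swapped key file immediately rather than as an authentication failure.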

Code-level changes that closed the blocker

modules/fruix/packages/freebsd.scm

Extended the minimal runtime again to support the final ready-state path:

  • staged /usr/sbin/daemon
  • staged /usr/share/locale/C.UTF-8/LC_CTYPE

modules/fruix/system/freebsd.scm

Completed the Guile/Shepherd guest runtime integration by:

  • generating activation that recreates compatibility symlinks from the historical build prefixes to the real /frx/store items
  • exporting locale and Guile runtime path variables in the Shepherd rc script:
    • LANG
    • LC_ALL
    • GUILE_SYSTEM_PATH
    • GUILE_SYSTEM_COMPILED_PATH
    • GUILE_SYSTEM_EXTENSIONS_PATH
    • the existing site path variables
  • starting Shepherd through daemon(8) instead of a fragile shell background job
  • fixing the generated Shepherd config so the logger and ready-marker services work in the guest
  • exposing the staged locale data into the rootfs via /usr/share/locale

tests/system/run-phase9-xcpng-boot.sh

Improved the real XCP-ng harness by:

  • separating the injected public key from the SSH private key actually used for login
  • preserving the metadata from the passing run for the full ready-marker path

Validation details

Local reproduction and verification

Before re-testing on XCP-ng, the failure was reproduced and fixed in two faster environments:

  1. a host-side chroot into the generated image root partition
  2. local QEMU/TCG boots with UEFI and SSH forwarding

That produced a much tighter debug loop for:

  • locale staging,
  • baked-prefix compatibility,
  • and Shepherd daemon lifetime.

Real XCP-ng validation

The final proof remained the real VM.

The passing XCP-ng run verified all of the following from the booted guest over SSH:

  • cat /var/lib/fruix/ready returns ready
  • /usr/local/etc/rc.d/fruix-shepherd onestatus succeeds
  • service sshd onestatus succeeds
  • readlink /run/current-system matches the generated Fruix closure
  • /home/operator exists
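The five checks above can be sketched as a single function. This is not the harness code: `GUEST_IP`, `run_in_guest`, `fail`, and `verify_guest` are hypothetical names, and the SSH options shown are an assumption about how a non-interactive login would be made.

```sh
#!/bin/sh
# Sketch only: GUEST_IP, run_in_guest, fail and verify_guest are hypothetical.
ROOT_SSH_PRIVATE_KEY_FILE=${ROOT_SSH_PRIVATE_KEY_FILE:-$HOME/.ssh/id_ed25519}

# Run a command in the guest as root; BatchMode avoids password prompts.
run_in_guest() {
    ssh -i "$ROOT_SSH_PRIVATE_KEY_FILE" -o BatchMode=yes \
        "root@${GUEST_IP:?GUEST_IP is not set}" "$@"
}

fail() {
    echo "FAIL: $*" >&2
    return 1
}

# $1 = expected closure path under /frx/store
verify_guest() {
    expected_closure=$1
    [ "$(run_in_guest cat /var/lib/fruix/ready)" = "ready" ] \
        || { fail "ready marker"; return 1; }
    run_in_guest /usr/local/etc/rc.d/fruix-shepherd onestatus >/dev/null \
        || { fail "fruix-shepherd onestatus"; return 1; }
    run_in_guest service sshd onestatus >/dev/null \
        || { fail "sshd onestatus"; return 1; }
    [ "$(run_in_guest readlink /run/current-system)" = "$expected_closure" ] \
        || { fail "/run/current-system target"; return 1; }
    run_in_guest test -d /home/operator \
        || { fail "/home/operator missing"; return 1; }
}
```

Keeping the transport in `run_in_guest` means the same checks can be pointed at a chroot or a QEMU guest by redefining that one function.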

Assessment against Phase 9 goals

9.1 deterministic ready state

Satisfied on the active XCP-ng track.

The guest now boots to a deterministic ready marker:

  • /var/lib/fruix/ready

9.2 in-guest Shepherd and core-service validation

Satisfied on the active XCP-ng track.

The guest now validates:

  • Shepherd active
  • generated configuration in effect
  • system closure linked through /run/current-system
  • sshd available for remote operator access

9.3 minimal operator usability

Satisfied on the active XCP-ng track.

A human operator can now:

  • discover the DHCP address,
  • log in over SSH with the injected root key,
  • inspect /run/current-system,
  • inspect the ready marker,
  • and inspect Shepherd/log state in the guest.

Conclusion

Phase 9 is complete for the current FreeBSD prototype track, using the active XCP-ng path as the replacement for the unavailable local bhyve.

The Fruix image now boots as a real FreeBSD VM, reaches the generated ready state, runs Shepherd successfully, and supports a minimal operator workflow over SSH.