Complete Phase 9 Fruix boot on XCP-ng

2026-04-02 09:11:50 +02:00
parent 4b69118d06
commit 43677ffd78
5 changed files with 376 additions and 16 deletions

@@ -2217,3 +2217,77 @@ Next recommended step:
- deterministic boot readiness
- in-guest Shepherd validation
- minimal operator usability
## 2026-04-02 — Phase 9 completed on the active XCP-ng FreeBSD track
Completed work:
- resolved the in-guest Guile/Shepherd blocker that remained after the earlier DHCP+SSH checkpoint
- wrote the completion report:
- `docs/reports/phase9-xcpng-ready-boot-freebsd.md`
- extended the staged runtime again in:
- `modules/fruix/packages/freebsd.scm`
- added `/usr/sbin/daemon`
- added `/usr/share/locale/C.UTF-8/LC_CTYPE`
- completed the guest runtime integration in:
- `modules/fruix/system/freebsd.scm`
- activation now recreates compatibility symlinks at the temporary install prefixes of the locally built Guile / guile-extra / Shepherd artifacts and points them at the real `/frx/store` items in the guest
- the rootfs now exposes `/usr/share/locale`
- the generated Shepherd config no longer relies on missing `mkdir-p` or unsupported `call-with-output-file #:append` behavior
- the Shepherd rc script now exports `LANG=C.UTF-8` / `LC_ALL=C.UTF-8`
- the Shepherd rc script now exports explicit Guile system/site path variables
- Shepherd is now started through FreeBSD `daemon(8)` so it remains alive after rc/session teardown
- corrected the XCP-ng harness in:
- `tests/system/run-phase9-xcpng-boot.sh`
- it now uses a distinct SSH private key file for login instead of incorrectly trying to authenticate with the public key file
Important findings:
- the original guest Guile failure had multiple layers:
  - missing UTF-8 locale data in the image
  - baked-in temporary install-prefix references inside the copied Guile / guile-extra / Shepherd artifacts
  - Shepherd process lifetime issues caused by a fragile shell-background startup path
- reproducing the problem in a host-side chroot into the generated image root partition made the final debugging loop much tighter than repeated full VM imports alone
- after locale staging and compatibility-prefix recreation, Guile and Shepherd became runnable in the guest, but Shepherd still exited too early on the real boot path until it was launched via `daemon(8)`
- after those fixes, the full ready-marker path became reliable enough for end-to-end XCP-ng validation
Final validation:
- `tests/system/run-phase9-xcpng-boot.sh` now passes end-to-end against:
  - VM `90490f2e-e8fc-4b7a-388e-5c26f0157289`
  - VDI `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`
- passing run workdir:
  - `/tmp/phase9-xcpng-pass-1775113189`
- passing real-guest metadata confirmed:
  - `ready_marker=ready`
  - `shepherd_status=running`
  - `sshd_status=running`
  - `run_current_system_target=/frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd`
  - `operator_home_listing=/home/operator`
  - `logger_log=fruix-shepherd-started`
Current assessment:
- Phase 9 is now complete for the active FreeBSD prototype track, using the XCP-ng replacement path adopted for this environment
- the generated Fruix image now reaches all currently required first-boot milestones on the real VM:
  - kernel boot: yes
  - root mount: yes
  - DHCP: yes
  - SSH: yes
  - Shepherd: yes
  - ready marker: yes
  - minimal operator usability: yes
- this establishes the first real Fruix-on-FreeBSD VM that:
  - boots from the declaratively generated image,
  - reaches the generated ready state,
  - keeps Shepherd running,
  - and remains inspectable over SSH as a minimally usable system
Next recommended step:
1. begin the next post-Phase-9 cleanup/native-integration pass from `docs/PLAN_2.md` Optional Phase 10
2. prioritize replacing the current compatibility shims for locally built Guile / Shepherd prefixes with a more native store-path-aware Fruix runtime arrangement
3. clean up remaining non-fatal boot noise observed during Phase 9, such as:
   - login-class warnings around `daemon`
   - the `gpart: Unknown command: show` rc noise
   - residual syslog/cron/runtime polish issues where they still matter

@@ -0,0 +1,227 @@
# Phase 9 completion: Fruix FreeBSD reached the ready marker on XCP-ng
Date: 2026-04-02
## Goal
Complete the first minimal Phase 9 boot milestone on the active FreeBSD track.
Because local bhyve is unavailable in this Xen environment, the active validation target remained the operator-approved XCP-ng VM and its existing VDI:
- VM: `90490f2e-e8fc-4b7a-388e-5c26f0157289`
- VDI: `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`
The required Phase 9 outcomes for this completion step were:
- boot the generated Fruix image on the real VM,
- reach the generated ready marker,
- keep Shepherd running in the guest,
- keep SSH available for operator access,
- and validate the declared system closure from inside the guest.
## Result
This phase now succeeds on the active XCP-ng path.
`tests/system/run-phase9-xcpng-boot.sh` now passes end-to-end and verifies:
- boot on the real XCP-ng VM,
- DHCP on the guest NIC,
- root SSH access via the injected key,
- `/run/current-system` pointing at the generated Fruix closure under `/frx/store`,
- the ready marker at `/var/lib/fruix/ready`,
- `fruix-shepherd` running,
- `sshd` running,
- and a minimal operator-facing home directory for the declared `operator` account.
Successful run metadata:
- workdir: `/tmp/phase9-xcpng-pass-1775113189`
- guest IP: `192.168.213.62`
- closure path:
  - `/frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd`
- image path:
  - `/frx/store/73f5757f8b58cf15fd97fc9a9704664d4b1d390d547fffff68c129a85d6cc368-fruix-bhyve-image-fruix-freebsd/disk.img`
Representative successful metadata values from the passing XCP-ng run:
```text
ready_marker=ready
run_current_system_target=/frx/store/0fe459ea22156510e64cea794b7a001151b59625bd5f12a488d6851e1c6d2198-fruix-system-fruix-freebsd
shepherd_status=running
sshd_status=running
operator_home_listing=/home/operator
uname_output=FreeBSD 15.0-STABLE
logger_log=fruix-shepherd-started
```
## Root causes resolved
The remaining Phase 9 blocker turned out not to be a single Guile bug, but a chain of runtime integration gaps.
### 1. Guile needed a usable UTF-8 locale in the guest
With the minimal image as originally staged, guest Guile started in a plain `C` locale and crashed very early in locale-string conversion paths.
The fix for the current prototype track was to stage the minimal locale data needed for `C.UTF-8`:
- `/usr/share/locale/C.UTF-8/LC_CTYPE`
and to start Shepherd with:
- `LANG=C.UTF-8`
- `LC_ALL=C.UTF-8`
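The locale requirement above can be guarded explicitly. The following is a minimal sketch, assuming a POSIX shell; the function name `use_c_utf8_locale` and its optional locale-root argument are illustrative and are not part of the Fruix rc script:

```sh
#!/bin/sh
# Hypothetical helper (not from the Fruix sources): export the C.UTF-8
# locale only when the staged LC_CTYPE data is actually present.
use_c_utf8_locale() {
    locale_root=${1:-/usr/share/locale}
    if [ ! -f "$locale_root/C.UTF-8/LC_CTYPE" ]; then
        echo "missing locale data: $locale_root/C.UTF-8/LC_CTYPE" >&2
        return 1
    fi
    LANG=C.UTF-8
    LC_ALL=C.UTF-8
    export LANG LC_ALL
}
```

Failing early with a clear message is easier to diagnose than a Guile crash in a locale conversion path.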
### 2. The copied Guile / Shepherd runtimes still contained baked-in source-prefix paths
The locally built Guile, guile-gnutls/fibers, and Shepherd artifacts were originally installed under temporary validation prefixes such as:
- `/tmp/guile-freebsd-validate-install`
- `/tmp/guile-gnutls-freebsd-validate-install`
- `/tmp/shepherd-freebsd-validate-install`
Even after those trees were copied into `/frx/store`, a number of runtime references still pointed back to the original prefixes:
- Guile system load paths compiled into `libguile`
- Shepherd launcher scripts
- Fibers and GnuTLS Scheme modules
- Shepherd configuration module paths
The immediate prototype fix was to make activation recreate compatibility symlinks at those original prefixes, each pointing at the corresponding store item in the guest.
This keeps the running system store-backed while unblocking the existing locally built Guile/Shepherd artifacts.
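The generated activation script emits an `rm -rf` followed by `ln -s` for each prefix. As a rough sketch of the same idea in a reusable form, assuming a POSIX shell (the function name `link_compat_prefix` and the existence check are additions for illustration, not part of the generated script):

```sh
#!/bin/sh
# Hypothetical helper: point a historical build prefix at a store item.
#   $1 = prefix to recreate, e.g. /tmp/guile-freebsd-validate-install
#   $2 = target directory, e.g. the matching item under /frx/store
link_compat_prefix() {
    prefix=$1
    target=$2
    # Assumption: refuse to create a dangling link if the target is absent.
    if [ ! -e "$target" ]; then
        echo "missing store item: $target" >&2
        return 1
    fi
    # Remove any stale directory or symlink first, then link.
    rm -rf "$prefix"
    ln -s "$target" "$prefix"
}
```

Because the stale entry is removed before linking, the helper can be re-run on every activation without accumulating nested links.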
### 3. The Shepherd process needed to be detached correctly from rc startup
Starting Shepherd with a simple shell background `&` was not sufficient on the real boot path. The process could exit when the invoking shell/session disappeared, which made the ready marker appear transiently while Shepherd itself did not remain up.
The fix was to launch Shepherd through FreeBSD `daemon(8)`:
- `/usr/sbin/daemon -c -f -p "$pidfile" -o /var/log/shepherd-bootstrap.out ...`
This gave the guest a stable long-lived Shepherd daemon process and made `onestatus`/socket checks reliable.
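After handing Shepherd to `daemon(8)`, the rc script in this commit polls for the pidfile and the control socket before reporting success. A minimal sketch of that wait loop, assuming a POSIX shell; the function name `wait_for_shepherd` and the configurable retry count are illustrative (the generated script hard-codes ten attempts):

```sh
#!/bin/sh
# Hypothetical helper: wait until both the pidfile and the socket exist.
#   $1 = pidfile path, $2 = socket path, $3 = attempts (default 10)
wait_for_shepherd() {
    pidfile=$1
    socket=$2
    tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        # -f: regular file exists; -S: path exists and is a socket.
        [ -f "$pidfile" ] && [ -S "$socket" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Checking the socket as well as the pidfile matters here: a pidfile alone can exist briefly even when the process is about to exit, which is the transient-ready symptom described above.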
### 4. The initial Shepherd config used helper APIs that were not actually present in the guest runtime
The generated Shepherd config originally used:
- `mkdir-p`
- `call-with-output-file` with `#:append`
Those choices were too optimistic for the minimal Scheme environment being staged.
The fix was to replace them with simpler portable logic:
- a local recursive directory-creation helper based on `mkdir`
- explicit append-mode logging via `open-file "a"`
### 5. The XCP-ng harness itself had an SSH-key bug
The first end-to-end rerun of `tests/system/run-phase9-xcpng-boot.sh` failed because it used the public key file as the SSH identity file.
The harness now distinguishes:
- `ROOT_AUTHORIZED_KEY_FILE` for guest key injection
- `ROOT_SSH_PRIVATE_KEY_FILE` for the host-side SSH login
with the private key defaulting to:
- `~/.ssh/id_ed25519`
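A cheap guard against this class of mistake is to inspect the first line of the identity file before calling `ssh`. The sketch below assumes a POSIX shell and PEM-style or OpenSSH-format private keys, whose first line has the form `-----BEGIN ... PRIVATE KEY-----`; the function name `is_private_key_file` is hypothetical and the harness itself only checks that the file exists:

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the file looks like a private key.
# A public key file starts with the key type (e.g. "ssh-ed25519") instead.
is_private_key_file() {
    [ -f "$1" ] || return 1
    first_line=
    # read returns non-zero at EOF without a newline but still fills the variable.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        "-----BEGIN "*"PRIVATE KEY-----") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_private_key_file "$ROOT_SSH_PRIVATE_KEY_FILE" || exit 1` would have rejected the `.pub` file that caused the first failed rerun.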
## Code-level changes that closed the blocker
### `modules/fruix/packages/freebsd.scm`
Extended the minimal runtime again to support the final ready-state path:
- staged `/usr/sbin/daemon`
- staged `/usr/share/locale/C.UTF-8/LC_CTYPE`
### `modules/fruix/system/freebsd.scm`
Completed the Guile/Shepherd guest runtime integration by:
- generating activation that recreates compatibility symlinks from the historical build prefixes to the real `/frx/store` items
- exporting locale and Guile runtime path variables in the Shepherd rc script:
- `LANG`
- `LC_ALL`
- `GUILE_SYSTEM_PATH`
- `GUILE_SYSTEM_COMPILED_PATH`
- `GUILE_SYSTEM_EXTENSIONS_PATH`
- the existing site path variables
- starting Shepherd through `daemon(8)` instead of a fragile shell background job
- fixing the generated Shepherd config so the logger and ready-marker services work in the guest
- exposing the staged locale data into the rootfs via `/usr/share/locale`
### `tests/system/run-phase9-xcpng-boot.sh`
Improved the real XCP-ng harness by:
- separating the injected public key from the SSH private key actually used for login
- preserving the metadata from the passing run for the full ready-marker path
## Validation details
### Local reproduction and verification
Before re-testing on XCP-ng, the failure was reproduced and fixed in two faster environments:
1. a host-side chroot into the generated image root partition
2. local QEMU/TCG boots with UEFI and SSH forwarding
That produced a much tighter debug loop for:
- locale staging,
- baked-prefix compatibility,
- and Shepherd daemon lifetime.
### Real XCP-ng validation
The final proof remained the real VM.
The passing XCP-ng run verified all of the following from the booted guest over SSH:
- `cat /var/lib/fruix/ready` returns `ready`
- `/usr/local/etc/rc.d/fruix-shepherd onestatus` succeeds
- `service sshd onestatus` succeeds
- `readlink /run/current-system` matches the generated Fruix closure
- `/home/operator` exists
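The first four checks can be gathered into `key=value` lines like the metadata shown earlier. This is only a sketch, assuming a POSIX shell: the function name `collect_guest_checks` is hypothetical, the `stopped` value is an assumption (the report only shows `running`), and the arguments stand in for whatever runs a command in the guest, such as the harness's `ssh_guest` wrapper:

```sh
#!/bin/sh
# Hypothetical helper: "$@" is a command prefix that executes its arguments
# in the guest (for example: collect_guest_checks ssh_guest).
collect_guest_checks() {
    printf 'ready_marker=%s\n' "$("$@" cat /var/lib/fruix/ready)"
    printf 'run_current_system_target=%s\n' "$("$@" readlink /run/current-system)"
    if "$@" /usr/local/etc/rc.d/fruix-shepherd onestatus >/dev/null 2>&1; then
        echo 'shepherd_status=running'
    else
        echo 'shepherd_status=stopped'   # assumed label for the failure case
    fi
    if "$@" service sshd onestatus >/dev/null 2>&1; then
        echo 'sshd_status=running'
    else
        echo 'sshd_status=stopped'       # assumed label for the failure case
    fi
}
```

Passing the runner as arguments keeps the checks identical whether they run over SSH, in a chroot, or against a local stub.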
## Assessment against Phase 9 goals
### 9.1 deterministic ready state
Satisfied on the active XCP-ng track.
The guest now boots to a deterministic ready marker:
- `/var/lib/fruix/ready`
### 9.2 in-guest Shepherd and core-service validation
Satisfied on the active XCP-ng track.
The guest now validates:
- Shepherd active
- generated configuration in effect
- system closure reachable through the `/run/current-system` symlink
- `sshd` available for remote operator access
### 9.3 minimal operator usability
Satisfied on the active XCP-ng track.
A human operator can now:
- discover the DHCP address,
- log in over SSH with the injected root key,
- inspect `/run/current-system`,
- inspect the ready marker,
- and inspect Shepherd/log state in the guest.
## Conclusion
Phase 9 is complete for the current FreeBSD prototype track, using the active XCP-ng replacement path in place of unavailable local bhyve.
The Fruix image now boots as a real FreeBSD VM, reaches the generated ready state, runs Shepherd successfully, and supports a minimal operator workflow over SSH.

@@ -268,6 +268,7 @@ experiments."
 #:license 'bsd-2
 #:install-plan
 '((file "/sbin/adjkerntz" "sbin/adjkerntz")
+(file "/usr/sbin/daemon" "usr/sbin/daemon")
 (file "/sbin/devd" "sbin/devd")
 (file "/sbin/devmatch" "sbin/devmatch")
 (file "/sbin/dmesg" "sbin/dmesg")
@@ -300,7 +301,8 @@ experiments."
 (file "/sbin/devfs" "sbin/devfs")
 (file "/bin/freebsd-version" "bin/freebsd-version")
 (file "/bin/hostname" "bin/hostname")
-(file "/bin/kenv" "bin/kenv"))))
+(file "/bin/kenv" "bin/kenv")
+(file "/usr/share/locale/C.UTF-8/LC_CTYPE" "usr/share/locale/C.UTF-8/LC_CTYPE"))))
 (define freebsd-networking
 (freebsd-package

@@ -543,7 +543,7 @@
 "PidFile /var/run/sshd.pid\n"
 "UseDNS no\n"))
-(define (render-activation-script os)
+(define* (render-activation-script os #:key guile-store guile-extra-store shepherd-store)
 (let* ((users (operating-system-users os))
 (groups (operating-system-groups os))
 (home-setup
@@ -563,6 +563,23 @@
 uid gid home)))))
 users)
 ""))
+(compat-prefixes
+(string-append
+(if guile-store
+(string-append
+"rm -rf /tmp/guile-freebsd-validate-install\n"
+"ln -s " guile-store " /tmp/guile-freebsd-validate-install\n")
+"")
+(if guile-extra-store
+(string-append
+"rm -rf /tmp/guile-gnutls-freebsd-validate-install\n"
+"ln -s " guile-extra-store " /tmp/guile-gnutls-freebsd-validate-install\n")
+"")
+(if shepherd-store
+(string-append
+"rm -rf /tmp/shepherd-freebsd-validate-install\n"
+"ln -s " shepherd-store " /tmp/shepherd-freebsd-validate-install\n")
+"")))
 (ssh-section
 (string-append
 "mkdir -p /var/empty /etc/ssh /root/.ssh\n"
@@ -581,6 +598,7 @@
 "if [ -x /usr/bin/cap_mkdb ] && [ -f /etc/login.conf ]; then /usr/bin/cap_mkdb /etc/login.conf; fi\n"
 "if [ -x /usr/sbin/pwd_mkdb ] && [ -f /etc/master.passwd ]; then /usr/sbin/pwd_mkdb -p /etc/master.passwd; fi\n"
 home-setup
+compat-prefixes
 ssh-section)))
 (define (render-shepherd-config os)
@@ -589,18 +607,23 @@
 "(use-modules (shepherd service)\n"
 " (ice-9 ftw))\n\n"
 "(define ready-marker \"" ready-marker "\")\n\n"
+"(define (mkdir-p* dir)\n"
+" (unless (or (string=? dir \"\")\n"
+" (string=? dir \"/\")\n"
+" (file-exists? dir))\n"
+" (mkdir-p* (dirname dir))\n"
+" (mkdir dir)))\n\n"
 "(define (ensure-parent-directory file)\n"
-" (mkdir-p (dirname file)))\n\n"
+" (mkdir-p* (dirname file)))\n\n"
 "(register-services\n"
 " (list\n"
 " (service '(fruix-logger)\n"
 " #:documentation \"Append a boot trace line for Fruix.\"\n"
 " #:start (lambda _\n"
 " (ensure-parent-directory \"/var/log/fruix-shepherd.log\")\n"
-" (call-with-output-file \"/var/log/fruix-shepherd.log\"\n"
-" (lambda (port)\n"
-" (display \"fruix-shepherd-started\\n\" port))\n"
-" #:append #t)\n"
+" (let ((port (open-file \"/var/log/fruix-shepherd.log\" \"a\")))\n"
+" (display \"fruix-shepherd-started\\n\" port)\n"
+" (close-port port))\n"
 " #t)\n"
 " #:stop (lambda _ #f)\n"
 " #:respawn? #f)\n"
@@ -639,11 +662,20 @@
 (define (render-rc-script shepherd-store guile-store guile-extra-store)
 (let ((ld-library-path (string-append guile-extra-store "/lib:"
 guile-store "/lib:/usr/local/lib"))
+(guile-system-path
+(string-append guile-store "/share/guile/3.0:"
+guile-store "/share/guile/site/3.0:"
+guile-store "/share/guile/site:"
+guile-store "/share/guile"))
 (guile-load-path (string-append shepherd-store "/share/guile/site/3.0:"
 guile-extra-store "/share/guile/site/3.0"))
+(guile-system-compiled-path
+(string-append guile-store "/lib/guile/3.0/ccache:"
+guile-store "/lib/guile/3.0/site-ccache"))
 (guile-load-compiled-path
 (string-append shepherd-store "/lib/guile/3.0/site-ccache:"
 guile-extra-store "/lib/guile/3.0/site-ccache"))
+(guile-system-extensions-path (string-append guile-store "/lib/guile/3.0/extensions"))
 (guile-extensions-path (string-append guile-extra-store "/lib/guile/3.0/extensions")))
 (string-append
 "#!/bin/sh\n"
@@ -665,11 +697,16 @@
 "status_cmd=fruix_shepherd_status\n\n"
 "fruix_shepherd_start()\n"
 "{\n"
-" env LD_LIBRARY_PATH='" ld-library-path "' \\\n"
+" /usr/sbin/daemon -c -f -p \"$pidfile\" -o /var/log/shepherd-bootstrap.out /usr/bin/env \\\n"
+" LANG='C.UTF-8' LC_ALL='C.UTF-8' \\\n"
+" LD_LIBRARY_PATH='" ld-library-path "' \\\n"
+" GUILE_SYSTEM_PATH='" guile-system-path "' \\\n"
 " GUILE_LOAD_PATH='" guile-load-path "' \\\n"
+" GUILE_SYSTEM_COMPILED_PATH='" guile-system-compiled-path "' \\\n"
 " GUILE_LOAD_COMPILED_PATH='" guile-load-compiled-path "' \\\n"
+" GUILE_SYSTEM_EXTENSIONS_PATH='" guile-system-extensions-path "' \\\n"
 " GUILE_EXTENSIONS_PATH='" guile-extensions-path "' \\\n"
-" " guile-store "/bin/guile --no-auto-compile " shepherd-store "/bin/shepherd -I -s \"$socket\" -c \"$config\" --pid=\"$pidfile\" -l \"$logfile\" >/var/log/shepherd-bootstrap.out 2>&1 &\n"
+" " guile-store "/bin/guile --no-auto-compile " shepherd-store "/bin/shepherd -I -s \"$socket\" -c \"$config\" -l \"$logfile\"\n"
 " for _try in 1 2 3 4 5 6 7 8 9 10; do\n"
 " [ -f \"$pidfile\" ] && [ -S \"$socket\" ] && return 0\n"
 " sleep 1\n"
@@ -678,9 +715,13 @@
 "}\n\n"
 "fruix_shepherd_stop()\n"
 "{\n"
-" env LD_LIBRARY_PATH='" ld-library-path "' \\\n"
+" env LANG='C.UTF-8' LC_ALL='C.UTF-8' \\\n"
+" LD_LIBRARY_PATH='" ld-library-path "' \\\n"
+" GUILE_SYSTEM_PATH='" guile-system-path "' \\\n"
 " GUILE_LOAD_PATH='" guile-load-path "' \\\n"
+" GUILE_SYSTEM_COMPILED_PATH='" guile-system-compiled-path "' \\\n"
 " GUILE_LOAD_COMPILED_PATH='" guile-load-compiled-path "' \\\n"
+" GUILE_SYSTEM_EXTENSIONS_PATH='" guile-system-extensions-path "' \\\n"
 " GUILE_EXTENSIONS_PATH='" guile-extensions-path "' \\\n"
 " " guile-store "/bin/guile --no-auto-compile " shepherd-store "/bin/herd -s \"$socket\" stop root >/dev/null 2>&1 || true\n"
 " for _try in 1 2 3 4 5 6 7 8 9 10; do\n"
@@ -698,7 +739,7 @@
 "load_rc_config $name\n"
 "run_rc_command \"$1\"\n")))
-(define (operating-system-generated-files os)
+(define* (operating-system-generated-files os #:key guile-store guile-extra-store shepherd-store)
 (append
 `(("boot/loader.conf" . ,(render-loader-conf (operating-system-loader-entries os)))
 ("etc/rc.conf" . ,(render-rc.conf os))
@@ -711,7 +752,10 @@
 ("etc/shells" . ,(render-shells os))
 ("etc/motd" . ,(render-motd os))
 ("etc/ttys" . ,(render-ttys))
-("activate" . ,(render-activation-script os))
+("activate" . ,(render-activation-script os
+#:guile-store guile-store
+#:guile-extra-store guile-extra-store
+#:shepherd-store shepherd-store))
 ("shepherd/init.scm" . ,(render-shepherd-config os)))
 (if (sshd-enabled? os)
 `(("etc/ssh/sshd_config" . ,(render-sshd-config os)))
@@ -814,7 +858,10 @@
 #:extra-files (append guile-runtime-extra-files
 guile-extra-runtime-files)))
 (shepherd-store (materialize-prefix shepherd-prefix "fruix-shepherd-runtime" "1.0.9" store-dir))
-(generated-files (append (operating-system-generated-files os)
+(generated-files (append (operating-system-generated-files os
+#:guile-store guile-store
+#:guile-extra-store guile-extra-store
+#:shepherd-store shepherd-store)
 `(("usr/local/etc/rc.d/fruix-activate"
 . ,(render-activation-rc-script))
 ("usr/local/etc/rc.d/fruix-shepherd"
@@ -896,7 +943,7 @@
 (mkdir-p rootfs)
 (for-each (lambda (dir)
 (mkdir-p (string-append rootfs dir)))
-'("/run" "/boot" "/etc" "/etc/ssh" "/usr" "/usr/local" "/usr/local/etc"
+'("/run" "/boot" "/etc" "/etc/ssh" "/usr" "/usr/share" "/usr/local" "/usr/local/etc"
 "/usr/local/etc/rc.d" "/var" "/var/cron" "/var/db" "/var/lib" "/var/lib/fruix"
 "/var/log" "/var/run" "/tmp" "/dev" "/root" "/home"))
 (chmod (string-append rootfs "/tmp") #o1777)
@@ -910,6 +957,9 @@
 (symlink-force (string-append "/run/current-system/profile/usr/" dir)
 (string-append rootfs "/usr/" dir)))
 '("bin" "lib" "sbin" "libexec"))
+(when (file-exists? (string-append closure-path "/profile/usr/share/locale"))
+(symlink-force "/run/current-system/profile/usr/share/locale"
+(string-append rootfs "/usr/share/locale")))
 (for-each (lambda (path)
 (symlink-force (string-append "/run/current-system/profile/etc/" path)
 (string-append rootfs "/etc/" path)))

@@ -5,6 +5,7 @@ repo_root=$(CDPATH= cd -- "$(dirname "$0")/../.." && pwd)
 vm_id=90490f2e-e8fc-4b7a-388e-5c26f0157289
 metadata_target=${METADATA_OUT:-}
 root_authorized_key_file=${ROOT_AUTHORIZED_KEY_FILE:-$HOME/.ssh/id_ed25519.pub}
+root_ssh_private_key_file=${ROOT_SSH_PRIVATE_KEY_FILE:-$HOME/.ssh/id_ed25519}
 requested_disk_capacity=${DISK_CAPACITY:-}
 cleanup=0
@@ -42,6 +43,10 @@ trap cleanup_workdir EXIT INT TERM
 echo "missing root authorized key file: $root_authorized_key_file" >&2
 exit 1
 }
+[ -f "$root_ssh_private_key_file" ] || {
+echo "missing root SSH private key file: $root_ssh_private_key_file" >&2
+exit 1
+}
 root_authorized_key=$(tr -d '\n' < "$root_authorized_key_file")
 # Discover the existing target VDI attached as disk 0 for the operator-provided VM.
@@ -96,7 +101,7 @@ host_ip=$(ifconfig "$host_interface" | awk '/inet /{print $2; exit}')
 subnet_prefix=${host_ip%.*}
 ssh_guest() {
-ssh -i "$root_authorized_key_file" \
+ssh -i "$root_ssh_private_key_file" \
 -o BatchMode=yes \
 -o StrictHostKeyChecking=no \
 -o UserKnownHostsFile=/dev/null \
@@ -118,7 +123,7 @@ for attempt in $(jot 90 1 90); do
 guest_ip=$(arp -an | awk -v mac="$vm_mac" 'tolower($4)==mac {gsub(/[()]/,"",$2); print $2; exit}')
 fi
 if [ -n "$guest_ip" ]; then
-if ssh -i "$root_authorized_key_file" \
+if ssh -i "$root_ssh_private_key_file" \
 -o BatchMode=yes \
 -o StrictHostKeyChecking=no \
 -o UserKnownHostsFile=/dev/null \
@@ -187,6 +192,8 @@ operator_home_listing=$operator_home_listing
 activate_preview=$activate_preview
 boot_backend=xcp-ng-xo-cli
 operator_access=ssh-root-key
+root_authorized_key_file=$root_authorized_key_file
+root_ssh_private_key_file=$root_ssh_private_key_file
 EOF
 if [ -n "$metadata_target" ]; then