# Phase 9 checkpoint: XCP-ng boot reached DHCP and SSH on FreeBSD

Date: 2026-04-02

## Goal

Advance Phase 9 from a static image-generation milestone to a real booted Fruix guest on the active FreeBSD/XCP-ng track, using the operator-approved VM:

- VM: `90490f2e-e8fc-4b7a-388e-5c26f0157289`
- existing target VDI: `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`

The immediate objective for this checkpoint was narrower than full Phase 9 completion:

- boot the generated image under XCP-ng,
- obtain DHCP,
- and reach SSH access with the injected root key.

## Summary

This checkpoint succeeded.

The current Fruix FreeBSD image now:

- boots on the target XCP-ng VM,
- mounts the generated root filesystem,
- completes enough of FreeBSD `rc` startup to configure networking,
- obtains a DHCP lease on the Xen NIC,
- starts `sshd`,
- and accepts root public-key authentication over the network.

Validated guest details from the successful XCP-ng boot:

- guest IP: `192.168.213.62`
- hostname: `fruix-freebsd`
- kernel string:
  - `FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64`

Representative successful SSH validation output:

```text
FreeBSD fruix-freebsd 15.0-STABLE FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64
fruix-freebsd
192.168.213.62
```

Successful XCP-ng work directory:

- `/tmp/phase9-xcpng-ssh-1775097470`

## Important boot/debugging findings

The first decisive breakthrough came from running the generated image locally under QEMU/TCG with serial capture. That made the previously opaque early-boot failure visible.

### 1. The original early boot abort was not an XCP-ng image-format problem anymore

After the earlier switch from raw uploads to dynamic VHD uploads, the remaining boot failure was inside the guest boot process, not in the XO import path.

### 2. FreeBSD `fstab` handling for pseudo-filesystems was wrong

The serial log showed that boot aborted during filesystem checks because the generated `fstab` gave non-zero fsck fields to non-UFS mounts such as `devfs`.

Representative failure:

```text
Starting file system checks:
/dev/gpt/fruix-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/fruix-root: clean, ...
fsck: exec fsck_devfs for devfs in /sbin:/usr/sbin: No such file or directory
Unknown error 1; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
```

The fix was to generate fsck pass fields only for UFS entries and emit `0 0` for pseudo-filesystems.

### 3. The minimal image was still missing many base files and commands expected by `rc`

Once `rc` ran further, QEMU serial logs exposed a long tail of missing runtime pieces that had not been visible from the earlier static validations alone.

Examples included:

- missing base commands:
  - `dd`
  - `expr`
  - `rmdir`
  - `sort`
  - `mktemp`
  - `egrep`
  - `fsync`
  - `kldload`
  - `kldstat`
  - `devfs`
  - `devctl`
  - `newsyslog`
  - `ip6addrctl`
- missing base config files:
  - `/etc/network.subr`
  - `/etc/devd.conf`
  - `/etc/newsyslog.conf`
  - `/etc/syslog.conf`
- missing runtime directories:
  - `/var/db`
  - `/var/cron`
- missing libraries needed by later boot helpers:
  - `libgeom.so.5`
  - `libdevctl.so.5`
  - `libcap_net.so.1`
  - C++ runtime pieces used by `devd`

These were staged into the current FreeBSD package layer and linked into the generated rootfs.

### 4. SSH auth initially failed because the image relied on PAM without a complete PAM runtime/configuration

`sshd` would start, but root public-key authentication still failed. A direct in-guest debug run showed:

```text
PAM: initialisation failed
```

For the minimal Phase 9 guest, the practical fix was to make the generated `sshd_config` use:

- `UsePAM no`

while still keeping key-only login enabled.

That was sufficient to unlock real SSH access on both the local QEMU debug guest and the XCP-ng guest.

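As a rough illustration of the PAM workaround described in finding 4, the sketch below emits a minimal key-only `sshd_config`. The file Fruix actually generates is not shown in this checkpoint: only `UsePAM no` comes from the text above, and the helper name `emit_sshd_config` and the remaining directives are assumptions about what "key-only login" could look like using standard OpenSSH options.

```shell
#!/bin/sh
# Hypothetical helper (not from the Fruix tree): print a minimal sshd_config
# that disables PAM and permits only public-key logins.
emit_sshd_config() {
    cat <<'EOF'
# From the checkpoint text: avoid the incomplete PAM runtime.
UsePAM no
# Assumed directives for key-only root access.
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no
KbdInteractiveAuthentication no
EOF
}
```

The output could be redirected into the staged rootfs, for example to `etc/ssh/sshd_config` under whatever staging directory the image build uses.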
## Current code-level outcomes

The current checkpoint work materially expanded the minimal FreeBSD runtime staged into Fruix images.

Highlights:

- `modules/fruix/packages/freebsd.scm`
  - added dedicated runtime packages for:
    - `freebsd-networking`
    - `freebsd-openssh`
  - expanded staged base runtime coverage substantially for `rc`, networking, and SSH
  - added required config files and shared libraries used during real boot
- `modules/fruix/system/freebsd.scm`
  - added root authorized-key support to the operating-system model
  - generated static account databases and supporting files:
    - `/etc/passwd`
    - `/etc/master.passwd`
    - `/etc/group`
    - `/etc/login.conf`
    - `/etc/ttys`
  - activation now runs:
    - `cap_mkdb`
    - `pwd_mkdb`
  - activation creates required directories and SSH host keys
  - generated `sshd_config` now disables PAM for the current minimal key-only Phase 9 path
  - `fstab` generation now avoids fsck pass numbers for pseudo-filesystems
  - rootfs generation now links the additional `/etc` files needed by real boot
- `tests/system/phase9-minimal-operating-system.scm.in`
  - enables DHCP on the relevant NIC names for the current tracks:
    - `xn0`
    - `em0`
    - `vtnet0`
  - injects the root authorized key
  - includes the SSH/network runtime packages and required system users/groups
- `tests/system/run-phase8-system-image.sh`
  - now accepts `OS_FILE`
  - now accepts/passes `DISK_CAPACITY`
  - serial-console validation was relaxed from an exact loader string to a `comconsole` presence check

## Verified current state

The current validated Phase 9 state is:

- XCP-ng VHD upload path works against the existing VDI
- the guest boots far enough for normal `rc` networking and `sshd`
- DHCP works on the Xen NIC
- SSH key injection works
- root login over SSH works

This means the project has crossed an important Phase 9 boundary:

- the first boot validation no longer depends on local bhyve serial automation,
- and the real XCP-ng target can now be exercised over the network.

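The `fstab` rule listed under the code-level outcomes (fsck pass numbers only for UFS entries, `0 0` for pseudo-filesystems) could be guarded with a small lint over a generated file. This is a sketch only: the function name `lint_fstab` is hypothetical, and it assumes the standard six-column `fstab` layout (device, mount point, type, options, dump, pass).

```shell
#!/bin/sh
# Hypothetical lint (not part of the Fruix tests): report every non-UFS fstab
# entry whose dump or pass field is non-zero. Exits non-zero if any is found.
lint_fstab() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }     # skip comments and blank lines
        $3 != "ufs" && ($5 != 0 || $6 != 0) {    # columns: device mountpoint fstype options dump pass
            print "bad fsck fields: " $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running `lint_fstab` against the `etc/fstab` of a staged rootfs before image assembly would surface the `fsck_devfs` class of failure without needing a boot.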
## Remaining blocker

Phase 9 is not complete yet because the Fruix-specific readiness path still fails.

Current remaining blocker:

- Guile still crashes in the guest
- therefore `fruix-shepherd` does not start
- therefore `/var/lib/fruix/ready` is still absent

Representative guest evidence:

```text
pid 262 (guile), jid 0, uid 0: exited on signal 11 (core dumped)
```

Over SSH on the real XCP-ng guest:

- `sshd` is running
- DHCP is active
- `fruix-shepherd` is stopped
- `/var/lib/fruix/ready` is missing

A retrieved core dump and local `lldb` analysis show the Guile crash occurs extremely early during initialization, in the locale/string conversion path while building Guile load/build info.

This remains the next debugging target.

## Assessment

This checkpoint satisfies a meaningful Phase 9 intermediate milestone on the active FreeBSD/XCP-ng track:

- the generated Fruix image now boots as a network-reachable FreeBSD guest,
- and minimal operator access via SSH is working.

However, the full Fruix boot milestone is still blocked by in-guest Guile/Shepherd failure, so the overall Phase 9 milestone remains open.

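While the Guile crash is being debugged, kernel lines of the form quoted under "Remaining blocker" can be pulled out of a captured console or `dmesg` log with a small filter. The sketch below is illustrative only; the function name `crashed_processes` is hypothetical, and it assumes the message format matches the single sample line in this document.

```shell
#!/bin/sh
# Hypothetical filter (not part of the Fruix tree): read log text on stdin and
# print one "name pid=N signal=S" line per process reported as killed by a signal.
crashed_processes() {
    sed -n 's/^pid \([0-9]*\) (\([^)]*\)).*exited on signal \([0-9]*\).*/\2 pid=\1 signal=\3/p'
}
```

Fed the sample line above, the filter prints `guile pid=262 signal=11`; lines that do not match the pattern are dropped.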