Enable Fruix FreeBSD guest SSH boot on XCP-ng

2026-04-02 07:34:51 +02:00
parent d465264b5e
commit 4b69118d06
9 changed files with 988 additions and 83 deletions

@@ -2154,3 +2154,66 @@ Next recommended step:
- Fruix at the product boundary
- `/frx` as the canonical store root
- stable upstream-derived internal names unless there is strong architectural value in renaming them
## 2026-04-02 — Phase 9 checkpoint: XCP-ng guest reached DHCP and SSH
Completed work:
- added a dedicated Phase 9 XCP-ng operating-system template:
  - `tests/system/phase9-minimal-operating-system.scm.in`
- added an XCP-ng boot/import/validation harness:
  - `tests/system/run-phase9-xcpng-boot.sh`
- extended the staged FreeBSD runtime and system-generation layers so the guest can complete enough of real boot for network access:
  - `modules/fruix/packages/freebsd.scm`
  - `modules/fruix/system/freebsd.scm`
- updated the integrated image-generation path for Phase 9 use cases:
  - `tests/system/materialize-phase8-system-image.scm`
  - `tests/system/run-phase8-system-image.sh`
- wrote the checkpoint report:
  - `docs/reports/phase9-xcpng-ssh-boot-freebsd.md`
Important findings:
- a decisive local QEMU/TCG serial-boot pass exposed the first real early-boot blocker:
  - the generated `fstab` was wrong for pseudo-filesystems, so `rc` tried to fsck `devfs` and aborted boot
- after fixing `fstab`, later serial logs exposed additional FreeBSD base runtime gaps that only appear during real boot, including missing commands, runtime directories, and base config files used by `rc`, DHCP, logging, and service startup
- the staged image now includes the minimum currently known set of FreeBSD runtime pieces needed to:
  - run `rc`
  - obtain DHCP
  - generate SSH host keys
  - start `sshd`
- public-key SSH login initially still failed because the minimal guest did not stage a complete PAM runtime/config path; for the current Phase 9 prototype track, the generated `sshd_config` now uses:
  - `UsePAM no`
- the current XCP-ng validation path succeeded against the operator-approved VM and existing VDI only:
  - VM `90490f2e-e8fc-4b7a-388e-5c26f0157289`
  - VDI `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`
- the successful XCP-ng boot obtained:
  - guest IP `192.168.213.62`
- successful SSH validation on the real guest confirmed:
  - `hostname=fruix-freebsd`
  - `sshd` is reachable with the injected root key
  - networking is configured on the Xen NIC
Current assessment:
- this checkpoint establishes the first real network-reachable Fruix boot on the active FreeBSD/XCP-ng track
- the generated image now boots far enough for DHCP and SSH, which closes the earlier uncertainty about whether the Phase 8 image could become a remotely usable guest at all
- Phase 9 is still not complete because the Fruix-specific readiness path remains blocked:
  - `fruix-shepherd` does not start
  - `/var/lib/fruix/ready` is still missing
  - Guile still crashes in the guest with `signal 11`
- therefore the current state is:
  - kernel boot: yes
  - root mount: yes
  - DHCP: yes
  - SSH: yes
  - Shepherd/ready marker: not yet
Next recommended step:
1. continue the in-guest Guile crash investigation so `fruix-shepherd` can start on the booted guest
2. once Shepherd is stable, rerun `tests/system/run-phase9-xcpng-boot.sh` to validate the full ready-marker path end-to-end
3. then close Phase 9 with updated report/progress entries for:
   - deterministic boot readiness
   - in-guest Shepherd validation
   - minimal operator usability

@@ -0,0 +1,213 @@
# Phase 9 checkpoint: XCP-ng boot reached DHCP and SSH on FreeBSD
Date: 2026-04-02
## Goal
Advance Phase 9 from a static image-generation milestone to a real booted Fruix guest on the active FreeBSD/XCP-ng track, using the operator-approved VM:
- VM: `90490f2e-e8fc-4b7a-388e-5c26f0157289`
- existing target VDI: `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`
The immediate objective for this checkpoint was narrower than full Phase 9 completion:
- boot the generated image under XCP-ng,
- obtain DHCP,
- and reach SSH access with the injected root key.
## Summary
This checkpoint succeeded.
The current Fruix FreeBSD image now:
- boots on the target XCP-ng VM,
- mounts the generated root filesystem,
- completes enough of FreeBSD `rc` startup to configure networking,
- obtains a DHCP lease on the Xen NIC,
- starts `sshd`,
- and accepts root public-key authentication over the network.
Validated guest details from the successful XCP-ng boot:
- guest IP: `192.168.213.62`
- hostname: `fruix-freebsd`
- kernel string:
- `FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64`
Representative successful SSH validation output:
```text
FreeBSD fruix-freebsd 15.0-STABLE FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64
fruix-freebsd
192.168.213.62
```
Successful XCP-ng work directory:
- `/tmp/phase9-xcpng-ssh-1775097470`
## Important boot/debugging findings
The first decisive breakthrough came from running the generated image locally under QEMU/TCG with serial capture. That made the previously opaque early-boot failure visible.
### 1. The original early boot abort was no longer an XCP-ng image-format problem
After the earlier switch from raw uploads to dynamic VHD uploads, the remaining boot failure was inside the guest boot process, not in the XO import path.
### 2. FreeBSD `fstab` handling for pseudo-filesystems was wrong
The serial log showed that boot aborted during filesystem checks because the generated `fstab` assigned non-zero fsck pass numbers (the sixth field) to non-UFS mounts such as `devfs`, so `fsck` tried to run a `fsck_devfs` helper that does not exist.
Representative failure:
```text
Starting file system checks:
/dev/gpt/fruix-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/fruix-root: clean, ...
fsck: exec fsck_devfs for devfs in /sbin:/usr/sbin: No such file or directory
Unknown error 1; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
```
The fix was to generate fsck pass fields only for UFS entries and emit `0 0` for pseudo-filesystems.
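The rule can be sketched in shell as follows. This is only an illustration of the pass-number logic, not the Scheme code in `modules/fruix/system/freebsd.scm`; the function name `fstab_line`, the dump value `1` for UFS, and the pass-2 choice for non-root UFS filesystems are assumptions based on common FreeBSD `fstab` practice.

```shell
#!/bin/sh
# Sketch: emit one fstab line, giving an fsck pass number only to UFS entries.
# Usage: fstab_line DEVICE MOUNTPOINT FSTYPE OPTIONS   (hypothetical helper)
fstab_line() {
    device=$1
    mountpoint=$2
    fstype=$3
    options=$4
    case "$fstype" in
        ufs)
            # Assumption: dump=1; root is checked in pass 1, other UFS in pass 2.
            dump=1
            if [ "$mountpoint" = "/" ]; then pass=1; else pass=2; fi
            ;;
        *)
            # Pseudo-filesystems (devfs, tmpfs, ...) have no fsck helper: emit 0 0.
            dump=0
            pass=0
            ;;
    esac
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$device" "$mountpoint" "$fstype" "$options" "$dump" "$pass"
}

fstab_line /dev/gpt/fruix-root / ufs rw
fstab_line devfs /dev devfs rw
```

With a pass number of `0`, `fsck` skips the entry entirely, so it never looks for a `fsck_devfs` binary.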
### 3. The minimal image was still missing many base files and commands expected by `rc`
Once `rc` ran further, QEMU serial logs exposed a long tail of missing runtime pieces that had not been visible from the earlier static validations alone.
Examples included:
- missing base commands:
  - `dd`
  - `expr`
  - `rmdir`
  - `sort`
  - `mktemp`
  - `egrep`
  - `fsync`
  - `kldload`
  - `kldstat`
  - `devfs`
  - `devctl`
  - `newsyslog`
  - `ip6addrctl`
- missing base config files:
  - `/etc/network.subr`
  - `/etc/devd.conf`
  - `/etc/newsyslog.conf`
  - `/etc/syslog.conf`
- missing runtime directories:
  - `/var/db`
  - `/var/cron`
- missing libraries needed by later boot helpers:
  - `libgeom.so.5`
  - `libdevctl.so.5`
  - `libcap_net.so.1`
  - C++ runtime pieces used by `devd`
These were staged into the current FreeBSD package layer and linked into the generated rootfs.
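Gaps of this kind could in principle be caught before boot by checking the generated rootfs against a list of required paths. The sketch below assumes a hypothetical `missing_paths` helper and a `ROOTFS` variable pointing at the staged tree; neither is part of the existing harness, and the path list only repeats files and directories named above.

```shell
#!/bin/sh
# Sketch: print every required path that is absent under a staged rootfs.
# Usage: missing_paths ROOT PATH...   (hypothetical helper)
missing_paths() {
    root=$1
    shift
    for path in "$@"; do
        # -e follows symlinks, so a dangling link into the store counts as missing.
        [ -e "$root$path" ] || printf '%s\n' "$path"
    done
}

# ROOTFS is an assumed variable naming the staged tree; it defaults to "/".
missing_paths "${ROOTFS:-/}" \
    /etc/network.subr /etc/devd.conf /etc/newsyslog.conf /etc/syslog.conf \
    /var/db /var/cron
```

A static check like this only covers paths that are already known to be needed; the serial-boot pass remains the way to discover new ones.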
### 4. SSH auth initially failed because the image relied on PAM without a complete PAM runtime/configuration
`sshd` would start, but root public-key authentication still failed. A direct in-guest debug run showed:
```text
PAM: initialisation failed
```
For the minimal Phase 9 guest, the practical fix was to make the generated `sshd_config` set `UsePAM no`
That was sufficient to unlock real SSH access on both the local QEMU debug guest and the XCP-ng guest.
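A key-only configuration without PAM might look roughly like the output of the sketch below. Only `UsePAM no` is confirmed by this report; the remaining directives are standard OpenSSH options chosen here as assumptions to illustrate "key-only", and `emit_sshd_config` is a hypothetical helper, not the generator in `modules/fruix/system/freebsd.scm`.

```shell
#!/bin/sh
# Sketch: print a key-only sshd_config that does not depend on PAM.
# Only "UsePAM no" comes from the report; the other directives are assumptions.
emit_sshd_config() {
    cat <<'EOF'
UsePAM no
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no
KbdInteractiveAuthentication no
AuthorizedKeysFile .ssh/authorized_keys
EOF
}

emit_sshd_config
```

If the output is written to a file, `sshd -t -f <file>` can be used to check that the daemon accepts it before the image is built.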
## Current code-level outcomes
The current checkpoint work materially expanded the minimal FreeBSD runtime staged into Fruix images.
Highlights:
- `modules/fruix/packages/freebsd.scm`
  - added dedicated runtime packages for:
    - `freebsd-networking`
    - `freebsd-openssh`
  - expanded staged base runtime coverage substantially for `rc`, networking, and SSH
  - added required config files and shared libraries used during real boot
- `modules/fruix/system/freebsd.scm`
  - added root authorized-key support to the operating-system model
  - generated static account databases and supporting files:
    - `/etc/passwd`
    - `/etc/master.passwd`
    - `/etc/group`
    - `/etc/login.conf`
    - `/etc/ttys`
  - activation now runs:
    - `cap_mkdb`
    - `pwd_mkdb`
  - activation creates required directories and SSH host keys
  - generated `sshd_config` now disables PAM for the current minimal key-only Phase 9 path
  - `fstab` generation now avoids fsck pass numbers for pseudo-filesystems
  - rootfs generation now links the additional `/etc` files needed by real boot
- `tests/system/phase9-minimal-operating-system.scm.in`
  - enables DHCP on the relevant NIC names for the current tracks:
    - `xn0`
    - `em0`
    - `vtnet0`
  - injects the root authorized key
  - includes the SSH/network runtime packages and required system users/groups
- `tests/system/run-phase8-system-image.sh`
  - now accepts `OS_FILE`
  - now accepts/passes `DISK_CAPACITY`
  - serial-console validation was relaxed from an exact loader string to a `comconsole` presence check
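For orientation, enabling DHCP on several candidate NIC names corresponds to `rc.conf` lines of the following shape. How the Phase 9 template expresses this in Scheme is not shown in this report, so the `dhcp_rc_conf` helper below is purely a hypothetical illustration of the resulting FreeBSD configuration.

```shell
#!/bin/sh
# Sketch: print one rc.conf DHCP line per interface name.
# Usage: dhcp_rc_conf NIC...   (hypothetical helper)
dhcp_rc_conf() {
    for nic in "$@"; do
        printf 'ifconfig_%s="DHCP"\n' "$nic"
    done
}

# NIC names taken from the template description above.
dhcp_rc_conf xn0 em0 vtnet0
```

Listing all three names lets the same image obtain a lease on whichever NIC driver the hypervisor presents, since only the interface that actually exists gets configured.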
## Verified current state
The current validated Phase 9 state is:
- XCP-ng VHD upload path works against the existing VDI
- the guest boots far enough for normal `rc` networking and `sshd`
- DHCP works on the Xen NIC
- SSH key injection works
- root login over SSH works
This means the project has crossed an important Phase 9 boundary:
- the first boot validation no longer depends on local bhyve serial automation,
- and the real XCP-ng target can now be exercised over the network.
## Remaining blocker
Phase 9 is not complete yet because the Fruix-specific readiness path still fails.
Current remaining blocker:
- Guile still crashes in the guest
- therefore `fruix-shepherd` does not start
- therefore `/var/lib/fruix/ready` is still absent
Representative guest evidence:
```text
pid 262 (guile), jid 0, uid 0: exited on signal 11 (core dumped)
```
Over SSH on the real XCP-ng guest:
- `sshd` is running
- DHCP is active
- `fruix-shepherd` is stopped
- `/var/lib/fruix/ready` is missing
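Once Shepherd starts, the ready marker could be polled over SSH with a small retry loop such as the sketch below. The `wait_until` helper and the `GUEST_IP` variable are hypothetical; the actual logic in `tests/system/run-phase9-xcpng-boot.sh` is not reproduced here.

```shell
#!/bin/sh
# Sketch: retry a command once per second until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS COMMAND [ARG...]   (hypothetical helper)
wait_until() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        # Run the probe; stop as soon as it reports success.
        "$@" && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Example probe (GUEST_IP is an assumed variable, e.g. the DHCP address above):
#   wait_until 120 ssh "root@$GUEST_IP" test -e /var/lib/fruix/ready
```

In the current state this probe would time out, because `/var/lib/fruix/ready` is never created while Guile crashes.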
A retrieved core dump and local `lldb` analysis show the Guile crash occurs extremely early during initialization, in the locale/string conversion path while building Guile load/build info. This remains the next debugging target.
## Assessment
This checkpoint satisfies a meaningful Phase 9 intermediate milestone on the active FreeBSD/XCP-ng track:
- the generated Fruix image now boots as a network-reachable FreeBSD guest,
- and minimal operator access via SSH is working.
However, the full Fruix boot milestone is still blocked by in-guest Guile/Shepherd failure, so the overall Phase 9 milestone remains open.