# Phase 9 checkpoint: XCP-ng boot reached DHCP and SSH on FreeBSD

Date: 2026-04-02

## Goal

Advance Phase 9 from a static image-generation milestone to a real booted Fruix guest on the active FreeBSD/XCP-ng track, using the operator-approved VM:

- VM: `90490f2e-e8fc-4b7a-388e-5c26f0157289`
- existing target VDI: `0f1f90d3-48ca-4fa2-91d8-fc6339b95743`

The immediate objective for this checkpoint was narrower than full Phase 9 completion:

- boot the generated image under XCP-ng,
- obtain DHCP,
- and reach SSH access with the injected root key.

## Summary

This checkpoint succeeded.

The current Fruix FreeBSD image now:

- boots on the target XCP-ng VM,
- mounts the generated root filesystem,
- completes enough of FreeBSD `rc` startup to configure networking,
- obtains a DHCP lease on the Xen NIC,
- starts `sshd`,
- and accepts root public-key authentication over the network.

Validated guest details from the successful XCP-ng boot:

- guest IP: `192.168.213.62`
- hostname: `fruix-freebsd`
- kernel string:
  - `FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64`

Representative successful SSH validation output:

```text
FreeBSD fruix-freebsd 15.0-STABLE FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64
fruix-freebsd
192.168.213.62
```

Successful XCP-ng work directory:

- `/tmp/phase9-xcpng-ssh-1775097470`

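The three-line SSH validation output in this summary lends itself to an automated check. The following is a minimal sketch only: the function name `check_probe_output` and the probe command in the trailing comment are assumptions for illustration, since the exact command used in the original run is not recorded here.

```shell
#!/bin/sh
# Hypothetical helper: validate the three-line probe output
# (uname line, hostname, IPv4 address) read from stdin.
# Usage: check_probe_output EXPECTED_HOSTNAME < probe.txt
check_probe_output() {
    expected_host=$1

    # Read the three lines in order; tolerate a missing final newline.
    IFS= read -r uname_line || return 1
    IFS= read -r host_line || return 1
    IFS= read -r ip_line || [ -n "$ip_line" ] || return 1

    # The kernel line must identify a FreeBSD amd64 guest.
    case $uname_line in
        FreeBSD\ *amd64) ;;
        *) echo "unexpected uname: $uname_line" >&2; return 1 ;;
    esac

    # The hostname must match exactly.
    [ "$host_line" = "$expected_host" ] || {
        echo "unexpected hostname: $host_line" >&2
        return 1
    }

    # The address must look like a dotted-quad IPv4 address.
    printf '%s\n' "$ip_line" |
        grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || {
        echo "unexpected address: $ip_line" >&2
        return 1
    }
}

# Example (assumed probe command, not taken from the original run):
#   ssh root@192.168.213.62 \
#       'uname -a; hostname; ifconfig xn0 | awk "/inet /{print \$2}"' \
#       | check_probe_output fruix-freebsd
```

The check is deliberately loose about the kernel revision so that it keeps passing when the `stable/15` snapshot changes.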
## Important boot/debugging findings

The first decisive breakthrough came from running the generated image locally under QEMU/TCG with serial capture. That made the previously opaque early-boot failure visible.

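A local QEMU/TCG run with serial capture could be set up roughly as follows. This is a dry-run sketch that only prints a command line: the function name, memory size, port forward, NIC model, and the choice of a raw disk image are all assumptions, and firmware options are omitted because this checkpoint does not say how the image was booted locally.

```shell
#!/bin/sh
# Hypothetical helper: print a QEMU command line that boots a disk
# image under TCG and writes the guest serial console to a log file.
# All option values are illustrative assumptions.
# Usage: qemu_serial_cmd IMAGE SERIAL_LOG
qemu_serial_cmd() {
    image=$1
    serial_log=$2
    # printf repeats the format once per argument, so each option is
    # followed by a single space.
    printf '%s ' \
        qemu-system-x86_64 \
        -accel tcg \
        -m 2048 \
        -display none \
        -drive "file=$image,format=raw" \
        -serial "file:$serial_log" \
        -netdev user,id=net0,hostfwd=tcp:127.0.0.1:2222-:22 \
        -device e1000,netdev=net0
    printf '\n'
}

qemu_serial_cmd disk.img serial.log
```

An `e1000` device appears as `em0` in a FreeBSD guest, which matches one of the NIC names enabled in the Phase 9 test configuration; whether the original debug run used that device model is not stated. The printed command is only safe to paste for paths without spaces.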
### 1. The remaining early boot abort was no longer an XCP-ng image-format problem

After the earlier switch from raw uploads to dynamic VHD uploads, the remaining boot failure was inside the guest boot process, not in the XO import path.

### 2. FreeBSD `fstab` handling for pseudo-filesystems was wrong

The serial log showed that boot aborted during filesystem checks because the generated `fstab` gave non-zero fsck fields to non-UFS mounts such as `devfs`.

Representative failure:

```text
Starting file system checks:
/dev/gpt/fruix-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/fruix-root: clean, ...
fsck: exec fsck_devfs for devfs in /sbin:/usr/sbin: No such file or directory
Unknown error 1; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
```

The fix was to generate fsck pass fields only for UFS entries and emit `0 0` for pseudo-filesystems.

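The rule behind that fix can be sketched in a few lines of shell. This is not the Fruix generator itself (that lives in the Scheme modules); the function name `fstab_line` and the `1 1` / `2 2` values for UFS entries are assumptions that follow the usual FreeBSD convention.

```shell
#!/bin/sh
# Hypothetical helper: print one fstab line, giving dump/pass numbers
# only to UFS entries. Every other filesystem type gets "0 0", so
# fsck never looks for a helper such as fsck_devfs.
# Usage: fstab_line DEVICE MOUNTPOINT FSTYPE OPTIONS
fstab_line() {
    device=$1
    mountpoint=$2
    fstype=$3
    options=$4
    case $fstype in
        ufs)
            # Assumed convention: root is checked in pass 1,
            # other UFS filesystems in pass 2.
            if [ "$mountpoint" = "/" ]; then
                dump=1 pass=1
            else
                dump=2 pass=2
            fi
            ;;
        *)
            dump=0 pass=0
            ;;
    esac
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$device" "$mountpoint" "$fstype" "$options" "$dump" "$pass"
}

fstab_line /dev/gpt/fruix-root / ufs rw
fstab_line devfs /dev devfs rw
fstab_line tmpfs /tmp tmpfs rw
```

Keying the decision on the filesystem type rather than on a list of known pseudo-filesystems means any new non-UFS mount is safe by default.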
### 3. The minimal image was still missing many base files and commands expected by `rc`

Once `rc` ran further, QEMU serial logs exposed a long tail of missing runtime pieces that had not been visible from the earlier static validations alone.

Examples included:

- missing base commands:
  - `dd`
  - `expr`
  - `rmdir`
  - `sort`
  - `mktemp`
  - `egrep`
  - `fsync`
  - `kldload`
  - `kldstat`
  - `devfs`
  - `devctl`
  - `newsyslog`
  - `ip6addrctl`
- missing base config files:
  - `/etc/network.subr`
  - `/etc/devd.conf`
  - `/etc/newsyslog.conf`
  - `/etc/syslog.conf`
- missing runtime directories:
  - `/var/db`
  - `/var/cron`
- missing libraries needed by later boot helpers:
  - `libgeom.so.5`
  - `libdevctl.so.5`
  - `libcap_net.so.1`
  - C++ runtime pieces used by `devd`

These were staged into the current FreeBSD package layer and linked into the generated rootfs.

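Gaps like these can be caught before boot with a static presence check against the staged root filesystem. The sketch below is an assumption about how such a check might look, not part of the existing test scripts; `missing_in_rootfs` is a made-up name, and the example only uses paths that this checkpoint names explicitly.

```shell
#!/bin/sh
# Hypothetical helper: report which expected paths are missing from a
# staged root filesystem. Returns non-zero if anything is missing.
# Usage: missing_in_rootfs ROOTFS PATH...
missing_in_rootfs() {
    rootfs=$1
    shift
    status=0
    for path in "$@"; do
        # -e follows symlinks, so a dangling link counts as missing.
        # Note: absolute symlinks inside the rootfs resolve against
        # the host here, which can produce false reports.
        if [ ! -e "$rootfs$path" ]; then
            printf 'missing: %s\n' "$path"
            status=1
        fi
    done
    return "$status"
}

# Example with paths named in this checkpoint (ROOTFS is a placeholder):
#   missing_in_rootfs "$ROOTFS" /etc/network.subr /etc/devd.conf \
#       /etc/newsyslog.conf /etc/syslog.conf /var/db /var/cron
```

A check like this only proves that files exist; it would not have caught the missing shared libraries, which need a dependency walk or a real boot.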
### 4. SSH auth initially failed because the image relied on PAM without a complete PAM runtime/configuration

`sshd` would start, but root public-key authentication still failed. A direct in-guest debug run showed:

```text
PAM: initialisation failed
```

For the minimal Phase 9 guest, the practical fix was to make the generated `sshd_config` use:

- `UsePAM no`

while still keeping key-only login enabled.

That was sufficient to unlock real SSH access on both the local QEMU debug guest and the XCP-ng guest.

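A key-only configuration without PAM might look like the fragment printed below. Only `UsePAM no` is taken from this checkpoint; the remaining directives are standard OpenSSH options picked here as assumptions about what "key-only login" means, and the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal key-only sshd_config fragment.
# Only "UsePAM no" comes from this checkpoint; the other directives
# are standard OpenSSH options chosen as assumptions.
sshd_config_fragment() {
    cat <<'EOF'
UsePAM no
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no
KbdInteractiveAuthentication no
EOF
}

sshd_config_fragment
```

Once written to a file, such a fragment can be syntax-checked with `sshd -t -f FILE` before it is baked into an image.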
## Current code-level outcomes

The current checkpoint work materially expanded the minimal FreeBSD runtime staged into Fruix images.

Highlights:

- `modules/fruix/packages/freebsd.scm`
  - added dedicated runtime packages for:
    - `freebsd-networking`
    - `freebsd-openssh`
  - expanded staged base runtime coverage substantially for `rc`, networking, and SSH
  - added required config files and shared libraries used during real boot
- `modules/fruix/system/freebsd.scm`
  - added root authorized-key support to the operating-system model
  - generated static account databases and supporting files:
    - `/etc/passwd`
    - `/etc/master.passwd`
    - `/etc/group`
    - `/etc/login.conf`
    - `/etc/ttys`
  - activation now runs:
    - `cap_mkdb`
    - `pwd_mkdb`
  - activation creates required directories and SSH host keys
  - generated `sshd_config` now disables PAM for the current minimal key-only Phase 9 path
  - `fstab` generation now avoids fsck pass numbers for pseudo-filesystems
  - rootfs generation now links the additional `/etc` files needed by real boot
- `tests/system/phase9-minimal-operating-system.scm.in`
  - enables DHCP on the relevant NIC names for the current tracks:
    - `xn0`
    - `em0`
    - `vtnet0`
  - injects the root authorized key
  - includes the SSH/network runtime packages and required system users/groups
- `tests/system/run-phase8-system-image.sh`
  - now accepts `OS_FILE`
  - now accepts/passes `DISK_CAPACITY`
  - serial-console validation was relaxed from an exact loader string to a `comconsole` presence check

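Two of the outcomes listed above are easy to illustrate in shell. The sketch assumes that per-interface DHCP ends up as `ifconfig_<nic>="DHCP"` lines in `rc.conf` (standard FreeBSD syntax, but how Fruix renders it is not shown in this checkpoint) and that the relaxed serial-console check is a plain substring match; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print rc.conf lines enabling DHCP on each
# named interface, using the standard ifconfig_<nic>="DHCP" form.
# Usage: rc_conf_dhcp NIC...
rc_conf_dhcp() {
    for nic in "$@"; do
        printf 'ifconfig_%s="DHCP"\n' "$nic"
    done
}

# Hypothetical helper: succeed if a loader configuration file mentions
# comconsole anywhere, rather than matching one exact line.
# Usage: has_comconsole FILE
has_comconsole() {
    grep -q 'comconsole' "$1"
}

rc_conf_dhcp xn0 em0 vtnet0
```

Listing all three NIC names lets one image work across tracks, since interfaces that do not exist on a given hypervisor are simply never configured.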
## Verified current state

The current validated Phase 9 state is:

- XCP-ng VHD upload path works against the existing VDI
- the guest boots far enough for normal `rc` networking and `sshd`
- DHCP works on the Xen NIC
- SSH key injection works
- root login over SSH works

This means the project has crossed an important Phase 9 boundary:

- the first boot validation no longer depends on local bhyve serial automation,
- and the real XCP-ng target can now be exercised over the network.

## Remaining blocker

Phase 9 is not complete yet because the Fruix-specific readiness path still fails.

Current remaining blocker:

- Guile still crashes in the guest
- therefore `fruix-shepherd` does not start
- therefore `/var/lib/fruix/ready` is still absent

Representative guest evidence:

```text
pid 262 (guile), jid 0, uid 0: exited on signal 11 (core dumped)
```

Over SSH on the real XCP-ng guest:

- `sshd` is running
- DHCP is active
- `fruix-shepherd` is stopped
- `/var/lib/fruix/ready` is missing

A retrieved core dump and local `lldb` analysis show the Guile crash occurs extremely early during initialization, in the locale/string conversion path while building Guile load/build info. This remains the next debugging target.

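Until the Guile crash is fixed, both symptoms can be tracked mechanically. The sketch below is an assumption about how that might be scripted, with hypothetical function names: one helper extracts crashing programs from kernel messages in the format quoted above, and the other tests for the readiness marker.

```shell
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and print
# "PROGRAM SIGNAL" for every process that exited on a signal.
crashed_processes() {
    sed -n 's/^pid [0-9][0-9]* (\([^)]*\)), .*: exited on signal \([0-9][0-9]*\).*$/\1 \2/p'
}

# Hypothetical helper: succeed only if the readiness marker exists
# under the given root ("/" on a live guest).
# Usage: fruix_ready ROOT
fruix_ready() {
    [ -e "${1%/}/var/lib/fruix/ready" ]
}

# Example on the guest:
#   dmesg | crashed_processes
#   fruix_ready / || echo "fruix not ready"
```

For the core itself, `lldb -c CORE_FILE GUILE_BINARY` followed by `bt` is the usual starting point; both paths are placeholders here.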
## Assessment

This checkpoint satisfies a meaningful Phase 9 intermediate milestone on the active FreeBSD/XCP-ng track:

- the generated Fruix image now boots as a network-reachable FreeBSD guest,
- and minimal operator access via SSH is working.

However, the full Fruix boot milestone is still blocked by in-guest Guile/Shepherd failure, so the overall Phase 9 milestone remains open.