Phase 9 checkpoint: XCP-ng boot reached DHCP and SSH on FreeBSD

Date: 2026-04-02

Goal

Advance Phase 9 from a static image-generation milestone to a real booted Fruix guest on the active FreeBSD/XCP-ng track, using the operator-approved VM:

  • VM: 90490f2e-e8fc-4b7a-388e-5c26f0157289
  • existing target VDI: 0f1f90d3-48ca-4fa2-91d8-fc6339b95743

The immediate objective for this checkpoint was narrower than full Phase 9 completion:

  • boot the generated image under XCP-ng,
  • obtain DHCP,
  • and reach SSH access with the injected root key.

Summary

This checkpoint succeeded.

The current Fruix FreeBSD image now:

  • boots on the target XCP-ng VM,
  • mounts the generated root filesystem,
  • completes enough of FreeBSD rc startup to configure networking,
  • obtains a DHCP lease on the Xen NIC,
  • starts sshd,
  • and accepts root public-key authentication over the network.

Validated guest details from the successful XCP-ng boot:

  • guest IP: 192.168.213.62
  • hostname: fruix-freebsd
  • kernel string:
    • FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64

Representative successful SSH validation output:

FreeBSD fruix-freebsd 15.0-STABLE FreeBSD 15.0-STABLE stable/15-n282801-29dce45d8c50 GENERIC amd64
fruix-freebsd
192.168.213.62

Successful XCP-ng work directory:

  • /tmp/phase9-xcpng-ssh-1775097470

Important boot/debugging findings

The first decisive breakthrough came from running the generated image locally under QEMU/TCG with serial capture. That made the previously opaque early-boot failure visible.

1. The original early boot abort was no longer an XCP-ng image-format problem

After the earlier switch from raw uploads to dynamic VHD uploads, the remaining boot failure was inside the guest boot process, not in the XO import path.

2. FreeBSD fstab handling for pseudo-filesystems was wrong

The serial log showed that boot aborted during filesystem checks: the generated fstab assigned non-zero fsck pass numbers to non-UFS mounts such as devfs, so fsck tried to run an fsck_devfs helper that does not exist.

Representative failure:

Starting file system checks:
/dev/gpt/fruix-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/fruix-root: clean, ...
fsck: exec fsck_devfs for devfs in /sbin:/usr/sbin: No such file or directory
Unknown error 1; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!

The fix was to generate non-zero fsck pass fields only for UFS entries and to emit 0 0 in the dump and pass columns for pseudo-filesystems.
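
The rule can be illustrated with a minimal Python sketch. This is not the Fruix generator, which lives in modules/fruix/system/freebsd.scm; the Mount type, the FSCK_CAPABLE set, and the choice of dump value 1 for UFS are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumption for this sketch: only UFS has an fsck_<type> helper in the image.
FSCK_CAPABLE = {"ufs"}


@dataclass
class Mount:
    """Hypothetical description of one fstab entry."""
    device: str
    mount_point: str
    fstype: str
    options: str = "rw"


def fstab_line(m: Mount) -> str:
    """Render one fstab line; pseudo-filesystems always get '0 0'."""
    if m.fstype in FSCK_CAPABLE:
        dump = 1
        # The root filesystem is checked first; other UFS mounts afterwards.
        passno = 1 if m.mount_point == "/" else 2
    else:
        # No fsck helper exists for devfs and similar types, so skip them.
        dump, passno = 0, 0
    fields = [m.device, m.mount_point, m.fstype, m.options, str(dump), str(passno)]
    return "\t".join(fields)


def render_fstab(mounts) -> str:
    """Render a whole fstab from a sequence of Mount entries."""
    return "\n".join(fstab_line(m) for m in mounts) + "\n"
```

With these assumptions, an entry such as Mount("devfs", "/dev", "devfs") ends in 0 0, so fsck never looks for an fsck_devfs helper, while the /dev/gpt/fruix-root entry keeps a non-zero pass number.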

3. The minimal image was still missing many base files and commands expected by rc

Once rc ran further, QEMU serial logs exposed a long tail of missing runtime pieces that had not been visible from the earlier static validations alone.

Examples included:

  • missing base commands:
    • dd
    • expr
    • rmdir
    • sort
    • mktemp
    • egrep
    • fsync
    • kldload
    • kldstat
    • devfs
    • devctl
    • newsyslog
    • ip6addrctl
  • missing base config files:
    • /etc/network.subr
    • /etc/devd.conf
    • /etc/newsyslog.conf
    • /etc/syslog.conf
  • missing runtime directories:
    • /var/db
    • /var/cron
  • missing libraries needed by later boot helpers:
    • libgeom.so.5
    • libdevctl.so.5
    • libcap_net.so.1
    • C++ runtime pieces used by devd

These were staged into the current FreeBSD package layer and linked into the generated rootfs.
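
Gaps of this kind can in principle be caught before boot by scanning the staged rootfs. The following Python sketch shows one possible check, assuming a rootfs directory on the build host; the function name, the search directories, and the shortened lists are hypothetical and are not part of the Fruix test suite.

```python
import os

# Subset of the files and directories named above, relative to the rootfs.
REQUIRED_PATHS = [
    "etc/network.subr",
    "etc/devd.conf",
    "etc/newsyslog.conf",
    "etc/syslog.conf",
    "var/db",
    "var/cron",
]

# Subset of the commands named above.
REQUIRED_COMMANDS = ["dd", "expr", "rmdir", "sort", "mktemp", "kldload", "devctl"]

# Assumed command locations inside the image.
SEARCH_DIRS = ["bin", "sbin", "usr/bin", "usr/sbin"]


def missing_runtime_pieces(rootfs: str) -> list:
    """Return required paths and commands that are absent from rootfs.

    lexists() is used so that symlinks count as present even when their
    targets only resolve inside the guest, not on the build host.
    """
    missing = []
    for rel in REQUIRED_PATHS:
        if not os.path.lexists(os.path.join(rootfs, rel)):
            missing.append("/" + rel)
    for cmd in REQUIRED_COMMANDS:
        found = any(
            os.path.lexists(os.path.join(rootfs, d, cmd)) for d in SEARCH_DIRS
        )
        if not found:
            missing.append(cmd)
    return missing
```

A check like this only confirms that names exist; it does not prove that a command can load its shared libraries, which is why the serial-console boot under QEMU remained the decisive test.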

4. SSH auth initially failed because the image relied on PAM without a complete PAM runtime/configuration

sshd would start, but root public-key authentication still failed. A direct in-guest debug run showed:

PAM: initialisation failed

For the minimal Phase 9 guest, the practical fix was to make the generated sshd_config use:

  • UsePAM no

while still keeping key-only login enabled.

That was sufficient to unlock real SSH access on both the local QEMU debug guest and the XCP-ng guest.
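
As a rough illustration of the resulting configuration, the Python sketch below renders a key-only sshd_config with PAM disabled. Only UsePAM no and key-only login come from this report; the remaining option choices are assumptions, picked from standard OpenSSH settings, and the function itself is hypothetical.

```python
def render_sshd_config(use_pam: bool = False) -> str:
    """Render a minimal key-only sshd_config.

    Every option except UsePAM is an assumption about what "key-only
    login" implies; the report does not list the generated file.
    """
    settings = [
        ("UsePAM", "yes" if use_pam else "no"),
        # Root may log in, but only with a public key.
        ("PermitRootLogin", "prohibit-password"),
        ("PubkeyAuthentication", "yes"),
        ("PasswordAuthentication", "no"),
        ("KbdInteractiveAuthentication", "no"),
        ("AuthorizedKeysFile", ".ssh/authorized_keys"),
    ]
    return "".join(f"{key} {value}\n" for key, value in settings)
```

Keeping password and keyboard-interactive authentication off means that disabling PAM does not widen the login surface: the injected root key stays the only way in.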

Current code-level outcomes

The current checkpoint work materially expanded the minimal FreeBSD runtime staged into Fruix images.

Highlights:

  • modules/fruix/packages/freebsd.scm
    • added dedicated runtime packages for:
      • freebsd-networking
      • freebsd-openssh
    • expanded staged base runtime coverage substantially for rc, networking, and SSH
    • added required config files and shared libraries used during real boot
  • modules/fruix/system/freebsd.scm
    • added root authorized-key support to the operating-system model
    • generated static account databases and supporting files:
      • /etc/passwd
      • /etc/master.passwd
      • /etc/group
      • /etc/login.conf
      • /etc/ttys
    • activation now runs:
      • cap_mkdb
      • pwd_mkdb
    • activation creates required directories and SSH host keys
    • generated sshd_config now disables PAM for the current minimal key-only Phase 9 path
    • fstab generation now avoids fsck pass numbers for pseudo-filesystems
    • rootfs generation now links the additional /etc files needed by real boot
  • tests/system/phase9-minimal-operating-system.scm.in
    • enables DHCP on the relevant NIC names for the current tracks:
      • xn0
      • em0
      • vtnet0
    • injects the root authorized key
    • includes the SSH/network runtime packages and required system users/groups
  • tests/system/run-phase8-system-image.sh
    • now accepts OS_FILE
    • now accepts/passes DISK_CAPACITY
    • serial-console validation was relaxed from an exact loader string to a comconsole presence check
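
For orientation, enabling DHCP on several candidate NIC names amounts to one rc.conf variable per interface. The Python sketch below renders such settings in the standard FreeBSD rc.conf form; it is an assumption about the shape of the output, not a copy of what the Fruix operating-system model emits, and the function name is hypothetical.

```python
def render_rc_conf(hostname: str, dhcp_nics) -> str:
    """Render rc.conf lines for a hostname, sshd, and DHCP per NIC.

    Uses standard rc.conf variables: hostname, sshd_enable, and
    ifconfig_<nic>="DHCP". Interfaces absent from the guest are
    expected to be skipped by rc at boot.
    """
    lines = [f'hostname="{hostname}"', 'sshd_enable="YES"']
    for nic in dhcp_nics:
        lines.append(f'ifconfig_{nic}="DHCP"')
    return "\n".join(lines) + "\n"
```

Calling render_rc_conf("fruix-freebsd", ["xn0", "em0", "vtnet0"]) covers the Xen NIC together with the em0 and vtnet0 names used on the other tracks, so a single image definition can serve each of them.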

Verified current state

The current validated Phase 9 state is:

  • XCP-ng VHD upload path works against the existing VDI
  • the guest boots far enough for normal rc networking and sshd
  • DHCP works on the Xen NIC
  • SSH key injection works
  • root login over SSH works

This means the project has crossed an important Phase 9 boundary:

  • the first boot validation no longer depends on local bhyve serial automation,
  • and the real XCP-ng target can now be exercised over the network.

Remaining blocker

Phase 9 is not complete yet because the Fruix-specific readiness path still fails.

Current remaining blocker:

  • Guile still crashes in the guest
  • therefore fruix-shepherd does not start
  • therefore /var/lib/fruix/ready is still absent

Representative guest evidence:

pid 262 (guile), jid 0, uid 0: exited on signal 11 (core dumped)
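
When scanning guest logs automatically, a line of this shape can be matched with a small parser. The Python sketch below is a hypothetical helper; the regular expression is derived only from the single message shown above and may need adjusting for other kernel message variants.

```python
import re

# Pattern modeled on the one crash line quoted in this report.
CRASH_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<command>[^)]+)\), "
    r"jid (?P<jid>\d+), uid (?P<uid>\d+): "
    r"exited on signal (?P<signal>\d+)"
    r"(?P<core> \(core dumped\))?"
)


def parse_crash(line: str):
    """Return crash details from a kernel log line, or None if no match."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "command": match.group("command"),
        "jid": int(match.group("jid")),
        "uid": int(match.group("uid")),
        "signal": int(match.group("signal")),
        "core_dumped": match.group("core") is not None,
    }
```

Applied to the line above, this yields command guile, signal 11, and core_dumped set to True, which is enough for a test harness to flag the Guile failure without a human reading the console.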

Over SSH on the real XCP-ng guest:

  • sshd is running
  • DHCP is active
  • fruix-shepherd is stopped
  • /var/lib/fruix/ready is missing

A retrieved core dump and local lldb analysis show the Guile crash occurs extremely early during initialization, in the locale/string conversion path while building Guile load/build info. This remains the next debugging target.

Assessment

This checkpoint satisfies a meaningful Phase 9 intermediate milestone on the active FreeBSD/XCP-ng track:

  • the generated Fruix image now boots as a network-reachable FreeBSD guest,
  • and minimal operator access via SSH is working.

However, the full Fruix boot milestone is still blocked by in-guest Guile/Shepherd failure, so the overall Phase 9 milestone remains open.