Phase 20.3: controlled guest self-hosted native base-build prototype

Date: 2026-04-05

Goal

Reassess guest self-hosting now that Fruix has already completed the earlier source, installation, generation-layout, rollback, development-overlay, and host-initiated in-guest native-build steps.

Phase 20.3 asked for real evidence about:

  • what self-hosting would improve
  • what it would cost in complexity
  • how it fits with the Fruix source/deployment model already in place

What changed

New in-guest helper

Development-enabled systems now also ship:

  • /usr/local/bin/fruix-self-hosted-native-build

This helper performs a controlled in-guest native FreeBSD base build from the system's own declared, materialized source store, as recorded in:

  • /run/current-system/metadata/store-layout.scm

The helper:

  1. verifies the development overlay is present
  2. verifies the canonical compatibility links exist:
    • /usr/include
    • /usr/share/mk
  3. recovers the materialized FreeBSD source store from current-system metadata
  4. runs:
    • buildworld
    • buildkernel
    • installworld
    • distribution
    • installkernel
  5. stages narrower artifact outputs under:
    • /var/lib/fruix/native-builds/<run-id>/artifacts/
  6. records metadata and status under:
    • /var/lib/fruix/native-builds/<run-id>/
    • /var/lib/fruix/native-builds/latest

The heavy object/stage work stays under:

  • /var/tmp/fruix-self-hosted-native-builds/<run-id>

so the installed-system result area stays small and easy to inspect (343M against a 7.5G build root in the validated run).
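As a minimal sketch of how a helper might record a run directory and repoint the latest link (step 6), the function below uses only standard tools. The function name record_run and the per-run file name "status" are assumptions made for illustration; the report does not state how the real helper lays out its metadata files.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual Fruix helper.
# Usage: record_run RESULT_BASE RUN_ID STATUS
record_run() {
    result_base=$1
    run_id=$2
    status=$3
    run_dir="$result_base/$run_id"

    # Per-run result directory with an artifacts/ subdirectory.
    mkdir -p "$run_dir/artifacts" || return 1

    # "status" as a file name is an assumption for this sketch.
    printf '%s\n' "$status" > "$run_dir/status" || return 1

    # -n keeps ln from descending into an existing "latest" symlink
    # that already points at a directory; -f replaces it.
    ln -sfn "$run_dir" "$result_base/latest"
}
```

On an installed system this would be called along the lines of record_run /var/lib/fruix/native-builds "$run_id" ok, while the object tree stays under /var/tmp.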

Important environment fix discovered during prototyping

The first prototype attempt failed even though Phase 20.2 had already succeeded.

Cause:

  • directly evaluating fruix-development-environment before buildworld exported development-oriented variables such as:
    • MAKEFLAGS
    • CPPFLAGS
    • CFLAGS
    • CXXFLAGS
    • LDFLAGS
  • those are appropriate for smaller development builds, but they polluted FreeBSD's world/kernel bootstrap environment and broke the LLVM bootstrap phase

Representative failure:

  • missing generated LLVM config headers during bootstrap (llvm/Config/abi-breaking.h)

The validated fix was to make the self-hosted helper explicitly sanitize that environment first:

  • reset PATH to the normal base paths
  • unset development-shell variables such as:
    • MAKEFLAGS
    • CC, CXX, AR, RANLIB, NM
    • CPPFLAGS, CFLAGS, CXXFLAGS, LDFLAGS
    • FRUIX_DEVELOPMENT_*
    • FRUIX_* tool variables

So the final 20.3 result is not “just reuse the development shell wholesale”.

It is more precise:

  • use the development overlay for canonical paths and available content
  • but run the real base-build steps in a cleaner, purpose-built helper environment
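The sanitizing step described above could look roughly like the following. This is a sketch under stated assumptions, not the helper's actual code: the function name sanitize_build_env and the exact PATH value are illustrative, since the report only says PATH is reset to "the normal base paths".

```shell
#!/bin/sh
# Hypothetical sketch of the environment cleanup before buildworld.
sanitize_build_env() {
    # Collect exported FRUIX_* names first, while env and sed are
    # still reachable through the current PATH. This covers both
    # FRUIX_DEVELOPMENT_* and the FRUIX_* tool variables.
    fruix_names=$(env | sed -n 's/^\(FRUIX_[A-Za-z0-9_]*\)=.*/\1/p')
    for name in $fruix_names; do
        unset "$name"
    done

    # Development-shell toolchain and flag variables named in the report.
    unset MAKEFLAGS CC CXX AR RANLIB NM
    unset CPPFLAGS CFLAGS CXXFLAGS LDFLAGS

    # Assumed value for "the normal base paths".
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    export PATH
}
```

Calling the function in the same shell that later runs make matters: unsetting the variables in a subshell would leave the build environment unchanged.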

Closure invalidation

To ensure the updated helper actually affects generated system closures, the operating-system closure spec now also records helper-version markers for development-enabled systems.

Guest images therefore pick up helper changes instead of silently reusing an older cached closure path.
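The effect of a version marker on a content-derived cache key can be illustrated in a few lines. This is purely a conceptual sketch: the report does not describe how Fruix hashes a closure spec, so cksum stands in for the real store hash, and closure_key and the spec strings are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: a key derived from the spec text changes
# whenever the embedded helper-version marker changes.
closure_key() {
    # cksum prints "<crc> <byte count>"; keep only the checksum.
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

spec_v1='development=yes helper-version=1'
spec_v2='development=yes helper-version=2'

key_v1=$(closure_key "$spec_v1")
key_v2=$(closure_key "$spec_v2")
```

Without the marker, both specs would be the same text and would map to the same key, which is the silent-reuse case the marker is meant to prevent.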

Validation harness

Added:

  • tests/system/run-phase20-self-hosted-native-build-xcpng.sh

This harness:

  1. boots the validated development-enabled Fruix guest on the approved XCP-ng path
  2. verifies the new helper exists in the guest
  3. invokes the helper from inside the guest
  4. verifies the recorded result/status/latest pointer
  5. validates the resulting staged artifact metadata and hashes
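Step 5 can be sketched as a comparison between a recorded digest and one computed from the staged file. The sketch assumes the recorded hashes are SHA-256 (they are 64 hex digits in the metadata, but the report does not name the algorithm); sha256_of and verify_artifact are hypothetical names, not functions from the real harness.

```shell
#!/bin/sh
# Hypothetical sketch of an artifact hash check.
sha256_of() {
    if command -v sha256 >/dev/null 2>&1; then
        # FreeBSD base tool; -q prints only the digest.
        sha256 -q "$1"
    else
        # GNU coreutils fallback; keep only the digest column.
        sha256sum "$1" | cut -d ' ' -f 1
    fi
}

# Usage: verify_artifact FILE EXPECTED_HEX_DIGEST
verify_artifact() {
    [ "$(sha256_of "$1")" = "$2" ]
}
```

A harness would call verify_artifact once per staged artifact and fail the run on the first mismatch.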

Validation

Passing run:

  • PASS phase20-self-hosted-native-build-xcpng
  • workdir: /tmp/fruix-phase20-self-hosted-native-build-xcpng

Validated on the approved real XCP-ng path:

  • VM 90490f2e-e8fc-4b7a-388e-5c26f0157289
  • VDI 0f1f90d3-48ca-4fa2-91d8-fc6339b95743

Representative metadata:

run_id=20260405T150359Z
helper_version=2
build_jobs=8
source_store=/frx/store/12d7704362e95afc2697db63f168b878e082b372-freebsd-source-default
build_root=/var/tmp/fruix-self-hosted-native-builds/20260405T150359Z
result_root=/var/lib/fruix/native-builds/20260405T150359Z
latest_link=/var/lib/fruix/native-builds/latest
latest_target=/var/lib/fruix/native-builds/20260405T150359Z
status_value=ok
build_root_size=7.5G
result_root_size=343M
kernel_artifact_size=158M
headers_artifact_size=32M
bootloader_artifact_size=1.3M
sha_kernel=16950f116a52134b98e2f8e0dacc556e18fe254e4a0ac2c1741422dde281a341
sha_loader=ea417846167ece270ada611624dca622ca38bd30125b9a125cd8ebb8b3600313
sha_param=9eb140ca7d9666f3d484a4174c9acd94b45427db6292b4e17de19af2c6aa5219
self_hosted_native_build=ok
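Metadata in this key=value form is easy to check mechanically. The sketch below assumes the lines above are stored one per line in a plain text file; metadata_get and check_run_metadata are hypothetical helper names, and the two consistency rules are examples rather than the harness's actual checks.

```shell
#!/bin/sh
# Hypothetical sketch: read and sanity-check key=value run metadata.

# Usage: metadata_get FILE KEY  (prints the first value recorded for KEY)
metadata_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Usage: check_run_metadata FILE
check_run_metadata() {
    file=$1
    # The run must have finished successfully.
    [ "$(metadata_get "$file" status_value)" = "ok" ] || return 1
    # The latest pointer must resolve to this run's result directory.
    [ "$(metadata_get "$file" latest_target)" = \
      "$(metadata_get "$file" result_root)" ] || return 1
}
```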

Validated facts:

  • the development-enabled Fruix guest can now run a controlled self-hosted native base-build helper from inside the installed system itself
  • the helper can recover the declared source store from current-system metadata without host-side parsing
  • buildworld and buildkernel succeed in the guest
  • staged installworld, distribution, and installkernel succeed in the guest
  • the helper records a stable result directory and latest pointer under:
    • /var/lib/fruix/native-builds
  • the resulting artifact hashes match those of the earlier validated Phase 20.2 host-initiated in-guest path

What self-hosting improved

The prototype demonstrates four concrete improvements:

  • the build recipe itself now lives inside the Fruix-managed system, not only in a host-side SSH harness
  • the guest can derive its own declared source input from current-system metadata
  • result/state recording now has a Fruix-native installed-system location:
    • /var/lib/fruix/native-builds
  • the host no longer needs to spell out every make phase just to validate the in-guest path

What it cost in complexity

The prototype also made the extra complexity visible:

  • the guest helper needs its own controlled environment contract
  • naively reusing the development-shell exports broke a real buildworld
  • helper-version invalidation had to be made explicit so closure caching would not hide helper changes
  • the in-guest result/staging model now needs its own operator-facing conventions

So the experiment did not eliminate complexity.

It mostly moved some of it from the host harness into an explicit in-guest helper contract.

Decision after the prototype

Phase 20.3 is complete because Fruix now has a first controlled guest self-hosted native base-build prototype.

However, the evidence does not suggest replacing the Phase 20.2 path as the default operator workflow yet.

The current recommendation is:

  • keep the host-initiated in-guest native-build path as the simpler default validation and orchestration flow
  • keep the new self-hosted helper as a controlled prototype and stepping stone toward deeper guest-driven workflows

That fits the existing Fruix model well:

  • source identity still comes from declared store-backed metadata
  • deployment identity still comes from immutable closures under /frx/store
  • the guest-side prototype adds a narrower in-system build/result workflow without replacing the existing deployment story

Result

Phase 20.3 is complete.

Fruix now has:

  • a validated host-orchestrated in-guest native base-build workflow
  • and a validated first controlled guest self-hosted native base-build prototype

That answers the Phase 20.3 question with real evidence rather than prior caution alone.