# Phase 1.3: FreeBSD system call interface mapping for Guix and Shepherd porting
Date: 2026-04-01
## Summary
This step completes the Phase 1.3 deliverable by documenting how the FreeBSD system interface maps to the Linux-oriented assumptions visible in current Guix source code, and by adding a runnable C harness that exercises the most important FreeBSD-side primitives.
Added files:
- `tests/system/freebsd-syscall-mapping.c`
- `tests/system/run-freebsd-syscall-mapping.sh`
Results from the harness show that the current FreeBSD host provides the following interfaces, each of which was exercised successfully:
- `fork` / `waitpid`
- `posix_spawn_file_actions_addclosefrom_np`
- `close_range`
- `lutimes`
- `statvfs`
- `chroot` (root test)
- `jail(2)` (root test)
- `lchown` (root test)
The same harness also confirms that several Linux-specific interfaces, most of them namespace-related, are absent on this host:
- `clone`
- `unshare`
- `setns`
- `pivot_root`
- `sys/prctl.h`
Additionally, `posix_fallocate` is present but returned `EOPNOTSUPP` on the tested filesystems. A configure-time link check cannot detect this: the symbol links, yet the call fails at run time.
## Sources inspected
### Guix / Nix daemon source paths
The mapping work was based on current Guix source inspection, especially:
- `~/repos/guix/nix/libstore/build.cc`
- `~/repos/guix/configure.ac`
Relevant Linux-oriented mechanisms visibly referenced there include:
- chroot-based build roots
- build users via `setuid` / `setgid`
- Linux namespaces through `clone`/`unshare`/`setns`
- `pivot_root`
- seccomp and `prctl`
- mount namespace behavior and bind mounts
- filesystem metadata calls such as `lchown`
- store space checks through `statvfs`
### FreeBSD interface/man-page references
The mapping also used current FreeBSD interfaces/documentation from the host, including:
- `jail(2)`
- `chroot(2)`
- `closefrom(2)` / `close_range(2)`
- `mount(2)` / `nmount(2)`
- `mount_nullfs(8)`
- `cap_enter(2)`
- `cap_rights_limit(2)`
- `lutimes(2)`
- `lchown(2)`
- `posix_fallocate(2)`
- `pdfork(2)`
## Runnable validation harness
The new harness can be run with:
```sh
METADATA_OUT=/tmp/freebsd-syscall-mapping-metadata.txt \
./tests/system/run-freebsd-syscall-mapping.sh
```
Observed output on the current host:
```text
feature.SYS_clone=no
feature.SYS_unshare=no
feature.SYS_setns=no
feature.SYS_pivot_root=no
feature.FreeBSD_jail=yes
feature.Capsicum_headers=yes
feature.sys_prctl_h=no
feature.linux_close_range_h=no
runtime.fork_waitpid=ok
runtime.posix_spawn_addclosefrom_np=ok
runtime.close_range=ok
runtime.posix_fallocate=unsupported-on-tested-filesystem
runtime.lutimes=ok
runtime.statvfs=ok
root.chroot=ok
root.jail=ok
root.lchown=ok
```
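The `feature.*` lines are compile-time probes. The following is a minimal sketch of how such probes might be written; it is not the harness source. The names `print_feature_report` and `report` are hypothetical, and the sketch assumes a compiler that supports `__has_include` (recent GCC and Clang do).

```c
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_* numbers, where the platform defines them */

/* Header probe via __has_include (assumed to be supported by the compiler). */
#if defined(__has_include)
#  if __has_include(<sys/prctl.h>)
#    define HAVE_SYS_PRCTL_H 1
#  endif
#endif
#ifndef HAVE_SYS_PRCTL_H
#  define HAVE_SYS_PRCTL_H 0
#endif

/* Syscall-number probes: a SYS_* macro exists only if the kernel ABI has the call. */
#ifdef SYS_clone
#  define HAVE_SYS_CLONE 1
#else
#  define HAVE_SYS_CLONE 0
#endif
#ifdef SYS_unshare
#  define HAVE_SYS_UNSHARE 1
#else
#  define HAVE_SYS_UNSHARE 0
#endif
#ifdef SYS_setns
#  define HAVE_SYS_SETNS 1
#else
#  define HAVE_SYS_SETNS 0
#endif
#ifdef SYS_pivot_root
#  define HAVE_SYS_PIVOT_ROOT 1
#else
#  define HAVE_SYS_PIVOT_ROOT 0
#endif
#ifdef SYS_jail
#  define HAVE_SYS_JAIL 1
#else
#  define HAVE_SYS_JAIL 0
#endif

/* Hypothetical helper: emit one "feature.NAME=yes|no" line. */
static void report(FILE *out, const char *name, int present)
{
    fprintf(out, "feature.%s=%s\n", name, present ? "yes" : "no");
}

/* Print every probe and return the number of lines written. */
int print_feature_report(FILE *out)
{
    report(out, "SYS_clone", HAVE_SYS_CLONE);
    report(out, "SYS_unshare", HAVE_SYS_UNSHARE);
    report(out, "SYS_setns", HAVE_SYS_SETNS);
    report(out, "SYS_pivot_root", HAVE_SYS_PIVOT_ROOT);
    report(out, "FreeBSD_jail", HAVE_SYS_JAIL);
    report(out, "sys_prctl_h", HAVE_SYS_PRCTL_H);
    return 6;
}
```

Calling `print_feature_report(stdout)` prints six lines in the same `key=value` shape as the output above. The values depend on the platform the file is compiled on.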
## Mapping by functional area
### 1. Process creation and supervision
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `fork` / `waitpid` | Available and validated | Direct mapping works. |
| `posix_spawn` helpers | Available and validated | Works once Guile uses the fixed local build; `posix_spawn_file_actions_addclosefrom_np` exists on FreeBSD. |
| `pdfork` process descriptors | FreeBSD-specific extra facility | Not currently used by Guix, but relevant as a possible future supervision/containment primitive. |
#### Assessment
FreeBSD is not blocked at the basic process-creation layer. The earlier Guile subprocess crash was an ABI mismatch in Guile/gnulib usage, not a lack of kernel support for subprocess creation.
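A minimal sketch of the `fork` / `waitpid` round trip that the `runtime.fork_waitpid` check presumably covers. The function name `run_child_and_wait` is hypothetical and the code is not taken from the harness.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the exit status the parent
 * observes, or -1 if fork/waitpid fails or the child did not exit normally. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: leave without running atexit handlers */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);    /* retry if a signal interrupted the wait */

    if (r < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The same code compiles unchanged on Linux and FreeBSD, which is the point of the "direct mapping" entry in the table.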
### 2. File-descriptor cleanup and process hygiene
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `close_range` | Available and validated | Directly present in FreeBSD libc. |
| `closefrom` behavior | Available | Also native on FreeBSD and useful for daemon/build-helper hygiene. |
| `posix_spawn_file_actions_addclosefrom_np` | Available and validated | Strongly relevant because Guile and Guix subprocess helpers rely on this class of operation. |
#### Assessment
FreeBSD provides good native support for descriptor-sweeping operations. This is a positive compatibility point rather than a gap.
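As an illustration, a descriptor sweep can be wrapped so that it uses `close_range` on FreeBSD and a plain loop elsewhere. This is a sketch under assumptions: `sweep_fds` is a hypothetical name, and the FreeBSD branch assumes a release whose libc ships `close_range(2)`, as the tested host does.

```c
#include <errno.h>
#include <unistd.h>

/* Close every descriptor in [low, high]. Slots that are not open are skipped.
 * Returns 0 on success, -1 with errno set on failure. */
int sweep_fds(unsigned int low, unsigned int high)
{
#if defined(__FreeBSD__)
    /* Assumption: libc provides close_range(2) on this release. */
    return close_range(low, high, 0);
#else
    /* Portable fallback: close one by one and ignore EBADF for unopened slots. */
    if (low > high) {
        errno = EINVAL;
        return -1;
    }
    for (unsigned int fd = low; ; fd++) {
        (void)close((int)fd);
        if (fd == high)                   /* stop before fd can wrap around */
            break;
    }
    return 0;
#endif
}
```

A build helper would typically call `sweep_fds(3, ~0U)` on FreeBSD just before `exec`; with the fallback loop a bounded upper limit is the safer choice.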
### 3. Filesystem isolation and chroot-style build roots
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `chroot` build roots | Available and validated | Directly usable; root-only as expected. |
| bind-mount-style exposure of declared inputs | No Linux bind mounts, but equivalent behavior exists | Use `nullfs` mounts plus ordinary mount orchestration rather than Linux bind mounts. |
| `pivot_root` | Absent | Must not be relied on; jail/chroot/nullfs-based setup is the practical replacement direction. |
#### Assessment
The core “restricted filesystem root containing only declared paths” idea is achievable on FreeBSD, but the implementation must be rethought around `chroot`, `jail`, `mount`/`nmount`, and `nullfs`, rather than Linux mount namespaces plus `pivot_root`.
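The `chroot` step on its own can be sketched as follows. The function name `enter_build_root` is hypothetical, the call requires root, and populating the directory with the declared inputs is a separate step that this sketch does not cover.

```c
#include <unistd.h>

/* Confine the calling process to `root`. Requires root privileges.
 * Returns 0 on success, -1 with errno set on failure. */
int enter_build_root(const char *root)
{
    if (chdir(root) != 0)     /* move inside first so the cwd does not stay outside */
        return -1;
    if (chroot(".") != 0)
        return -1;
    return chdir("/");        /* normalise the cwd to the new root */
}
```

Changing directory before calling `chroot` avoids leaving the process with a working directory outside the new root, which is a classic escape route.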
### 4. Namespace-based isolation
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `clone(CLONE_NEW*)` | Absent | No direct equivalent. |
| `unshare` | Absent | No direct equivalent. |
| `setns` | Absent | No direct equivalent. |
| user namespaces | Absent in the Linux sense | Must be replaced with jail design and traditional privilege separation. |
| mount namespaces | Absent in the Linux sense | Must be replaced with jail/chroot + mount arrangement. |
| network namespaces | Absent in the Linux sense | Use VNET jails when network isolation is required. |
| PID namespaces | Absent in the Linux sense | Jail process isolation is the closest available model. |
#### Assessment
This is the single largest architectural gap between current Guix daemon code and FreeBSD. It confirms the Phase 2 design direction: Guix daemon isolation on FreeBSD cannot be a syscall-for-syscall translation; it must be a jail-oriented redesign.
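To make the contrast concrete, the sketch below shows roughly what entering a jail through the classic `jail(2)` call could look like. It is an illustration, not a proposed daemon design: `enter_build_jail` is a hypothetical name, the jail gets no IP addresses assigned, and on non-FreeBSD systems the function simply fails with `ENOSYS`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/jail.h>
#endif

/* Create a jail rooted at `root` and lock the calling process into it.
 * Requires root. Returns the jail ID on success, -1 with errno set on failure. */
int enter_build_jail(const char *root, const char *name)
{
#if defined(__FreeBSD__)
    struct jail j = {
        .version  = JAIL_API_VERSION,
        .path     = (char *)(uintptr_t)root,   /* struct fields are non-const */
        .hostname = (char *)(uintptr_t)name,
        .jailname = (char *)(uintptr_t)name,
        .ip4s = 0, .ip4 = NULL,                /* no IPv4 addresses assigned */
        .ip6s = 0, .ip6 = NULL,                /* no IPv6 addresses assigned */
    };
    int jid = jail(&j);
    if (jid < 0)
        return -1;
    if (chdir("/") != 0)                       /* start from the jail's root */
        return -1;
    return jid;
#else
    (void)root;
    (void)name;
    errno = ENOSYS;                            /* jail(2) is FreeBSD-specific */
    return -1;
#endif
}
```

Unlike `unshare`, a single call here changes the filesystem root and process visibility at once, which is why the isolation code cannot be translated flag by flag.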
### 5. Mount and store exposure mechanics
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `mount` operations for build roots | Available | FreeBSD provides `mount(2)` and `nmount(2)`. |
| bind mounts | Different implementation model | `nullfs` is the practical analog for exposing existing paths elsewhere in the namespace. |
| recursive mount-namespace behavior | No direct equivalent | Must be handled explicitly in jail/chroot mount layout. |
#### Assessment
A FreeBSD Guix daemon will need explicit mount planning rather than namespace-based mount isolation. `nullfs` is the most natural replacement for the Linux bind-mount role in store/input exposure.
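A read-only `nullfs` mount through `nmount(2)` might be sketched as below. The option names (`fstype`, `fspath`, `target`) are an assumption based on what `mount_nullfs(8)` is understood to pass and should be checked against the host's sources. The function names are hypothetical, and on non-FreeBSD systems the mount call fails with `ENOSYS`.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>          /* struct iovec */
#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/mount.h>        /* nmount(2), MNT_RDONLY */
#endif

#define NULLFS_NIOV 6         /* three name/value pairs */

/* Store a NUL-terminated string, terminator included, in one iovec slot. */
static void iov_str(struct iovec *v, const char *s)
{
    v->iov_base = (void *)(uintptr_t)s;   /* nmount takes non-const buffers */
    v->iov_len  = strlen(s) + 1;
}

/* Fill `iov` with the options for exposing `source` at `mountpoint`.
 * Assumption: option names match those used by mount_nullfs(8). */
void build_nullfs_iov(struct iovec iov[NULLFS_NIOV],
                      const char *source, const char *mountpoint)
{
    iov_str(&iov[0], "fstype"); iov_str(&iov[1], "nullfs");
    iov_str(&iov[2], "fspath"); iov_str(&iov[3], mountpoint);
    iov_str(&iov[4], "target"); iov_str(&iov[5], source);
}

/* Mount `source` read-only at `mountpoint`. Requires root on FreeBSD.
 * Returns 0 on success, -1 with errno set on failure. */
int mount_nullfs_readonly(const char *source, const char *mountpoint)
{
    struct iovec iov[NULLFS_NIOV];
    build_nullfs_iov(iov, source, mountpoint);
#if defined(__FreeBSD__)
    return nmount(iov, NULLFS_NIOV, MNT_RDONLY);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Each declared input would need one such mount inside the build root, and each must be unmounted explicitly afterwards because no mount namespace tears them down automatically.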
### 6. Privilege dropping and capability models
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `setuid` / `setgid` build users | Available | Traditional Unix credential switching remains available. |
| Linux capabilities (`CAP_*`) | Absent | No direct equivalent. |
| `prctl`-style Linux process controls | Absent on this host | Must not be assumed. |
| seccomp filter model | Linux-specific | No direct equivalent. |
| Capsicum capability mode | Available on FreeBSD | Useful complementary mechanism, but not a 1:1 replacement for Linux capabilities or namespaces. |
#### Assessment
FreeBSD can still do classic build-user isolation, but the Linux capability/seccomp model must be replaced by a different combination of jails, traditional credentials, filesystem layout, and possibly Capsicum in carefully chosen places.
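The classic credential switch can be sketched as follows. The name `drop_to_build_user` is hypothetical and the code is an illustration of the usual ordering, not the Guix daemon's implementation.

```c
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch the calling process to `uid`/`gid`.
 * Returns 0 on success, -1 with errno set on failure. */
int drop_to_build_user(uid_t uid, gid_t gid)
{
    /* Only root may replace the supplementary group list. */
    if (geteuid() == 0 && setgroups(1, &gid) != 0)
        return -1;
    if (setgid(gid) != 0)     /* group first: after setuid() it can no longer change */
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Sanity check: a non-root target must not be able to regain root. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

The order matters: supplementary groups, then GID, then UID. Reversing it leaves the process unable to shed its original groups.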
### 7. Metadata, timestamps, and storage primitives
| Guix/Linux expectation | FreeBSD status | Mapping / notes |
|---|---|---|
| `lchown` | Available and validated | Works in root test. |
| `lutimes` | Available and validated | Works. |
| `statvfs` | Available and validated | Works and is already referenced in current daemon code. |
| `statx` | Absent | Must use older `stat`/`lstat`/`fstatat` style interfaces instead. |
| `posix_fallocate` | Present but runtime-limited | Returned `EOPNOTSUPP` on the tested filesystems; presence does not imply useful semantics everywhere. |
#### Assessment
Most metadata operations map directly. The two exceptions are `statx`, which is absent and must be replaced with `stat`/`lstat`/`fstatat`, and `posix_fallocate`, which needs a runtime check rather than a simple availability check.
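A runtime-tolerant wrapper around `posix_fallocate` could look like the sketch below. The name `reserve_space` is hypothetical. Only `EOPNOTSUPP` was observed on the tested host; treating `EINVAL` and `ENODEV` as "not supported" as well is an assumption about other filesystems and releases.

```c
#include <errno.h>
#include <fcntl.h>            /* posix_fallocate */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to reserve `len` bytes for `fd`.
 * Returns 0 if space was reserved, 1 if the filesystem refused and the file
 * was only extended (no reservation), -1 with errno set on any other error. */
int reserve_space(int fd, off_t len)
{
    if (len <= 0) {
        errno = EINVAL;
        return -1;
    }
    int rc = posix_fallocate(fd, 0, len);   /* returns an errno value, not -1 */
    if (rc == 0)
        return 0;
    /* EOPNOTSUPP was observed; EINVAL and ENODEV are assumed equivalents. */
    if (rc != EOPNOTSUPP && rc != EINVAL && rc != ENODEV) {
        errno = rc;
        return -1;
    }
    /* Fallback: grow the file without reserving blocks; never shrink it. */
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size < len && ftruncate(fd, len) != 0)
        return -1;
    return 1;
}
```

A return value of 1 tells the caller that a later write can still fail with `ENOSPC`, so any store-space accounting should continue to rely on `statvfs`.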
## Porting implications for Phase 2
The system-call mapping work strongly supports the following Phase 2 design assumptions:
1. **Use jails as the primary isolation model.**
Linux namespace code paths are not portable as-is.
2. **Use `nullfs` + `chroot`/jail layout instead of Linux bind mounts + mount namespaces.**
3. **Retain build users and classic UID/GID switching.**
These mechanisms remain directly usable on FreeBSD.
4. **Do not depend on Linux seccomp/capability/prctl machinery.**
Any comparable restrictions must come from a different design.
5. **Treat `posix_fallocate` conservatively.**
Configure-time presence is not enough; runtime filesystem behavior matters.
## Conclusion
Phase 1.3 is now satisfied by:
- a concrete source-based mapping between Guix's Linux-oriented daemon assumptions and FreeBSD facilities
- a runnable C harness validating the most important FreeBSD-side primitives
- explicit identification of the irreducible architectural gap: Linux namespaces versus FreeBSD jails
This provides enough detail for Phase 2 work to proceed with a jail-first design instead of attempting a misleading syscall-by-syscall translation.