Phase 1.1 follow-up: Guile subprocess crash on FreeBSD

Date: 2026-04-01

Summary

guile3 on this FreeBSD 15.0-STABLE amd64 host crashes when Guile tries to create subprocesses through:

  • system*
  • spawn
  • open-pipe*

The crash is reproducible and is not caused by FreeBSD's native posix_spawn(3) implementation by itself. The evidence points to an upstream Guile/gnulib integration bug on FreeBSD:

  • gnulib decides to replace posix_spawn/posix_spawnp on this platform
  • Guile still calls the native FreeBSD extension posix_spawn_file_actions_addclosefrom_np
  • that function receives a gnulib replacement posix_spawn_file_actions_t object with an incompatible ABI
  • the process crashes inside libc when addclosefrom_np interprets gnulib's struct header as a native pointer

Repro artifacts added

  • tests/guile/posix-spawn-freebsd-diagnostics.c
  • tests/guile/run-subprocess-diagnostics.sh

Run with:

./tests/guile/run-subprocess-diagnostics.sh

Expected output on the current host includes:

native-spawn-closefrom=ok
adddup2-invalid-fd-accepted=yes
addopen-invalid-fd-accepted=yes
posix_spawn-secure-exec-result=0
posix_spawnp-secure-exec-result=3
issue-profile-match=yes
system-star exit=139
spawn exit=139
open-pipe-star exit=139

Minimal Guile reproducers

guile3 -c '(system* "/usr/bin/true")'
guile3 -c '(spawn "/usr/bin/true" (list "/usr/bin/true"))'
guile3 -c '(use-modules (ice-9 popen)) (open-pipe* OPEN_READ "/usr/bin/true")'

All three terminate with SIGSEGV (exit 139) on this machine.

Native FreeBSD posix_spawn is not the direct problem

A standalone C test using FreeBSD's native APIs works correctly:

  • posix_spawn_file_actions_init
  • posix_spawn_file_actions_adddup2
  • posix_spawn_file_actions_addclosefrom_np
  • posix_spawn

The diagnostic program in tests/guile/posix-spawn-freebsd-diagnostics.c confirms this with:

native-spawn-closefrom=ok

So although the crash happens inside libc, its cause lies above libc, in how Guile and gnulib prepare the file-actions object.
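For illustration, here is a minimal C sketch of the native-only sequence listed above. It is not the contents of tests/guile/posix-spawn-freebsd-diagnostics.c. The helper name native_spawn_closefrom and the choice of /bin/sh -c 'exit 0' as the child are assumptions. Because libc both creates and consumes the file-actions object, addclosefrom_np sees the layout it expects. The closefrom step is compiled only where the function is known to exist (FreeBSD, and glibc 2.34 or later).

```c
#define _GNU_SOURCE /* glibc: expose posix_spawn_file_actions_addclosefrom_np */
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical re-creation of the native check (not the repo's diagnostic):
   only libc's own posix_spawn_file_actions_t is involved, never a gnulib
   replacement.  Returns the child's exit status, or -1 on failure. */
static int native_spawn_closefrom(void)
{
    posix_spawn_file_actions_t fa;
    char *argv[] = { "/bin/sh", "-c", "exit 0", NULL };
    pid_t pid;
    int status, rc, devnull;

    devnull = open("/dev/null", O_RDONLY);
    if (devnull < 0)
        return -1;
    if (posix_spawn_file_actions_init(&fa) != 0) {
        close(devnull);
        return -1;
    }

    /* Same order as Guile's do_spawn: dup2 actions first ... */
    rc = posix_spawn_file_actions_adddup2(&fa, devnull, STDIN_FILENO);

    /* ... then close every inherited descriptor >= 3 in the child. */
#if defined(__FreeBSD__) || (defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 34)))
    if (rc == 0)
        rc = posix_spawn_file_actions_addclosefrom_np(&fa, 3);
#endif

    if (rc == 0)
        rc = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&fa);
    close(devnull);

    if (rc != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a host where native posix_spawn works, the sketch should return 0, the counterpart of the native-spawn-closefrom=ok line.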

Why gnulib replaces posix_spawn on this host

Upstream Guile 3.0.10 vendors gnulib logic in m4/posix_spawn.m4.

Two FreeBSD-relevant observations from the local diagnostics match gnulib's replacement logic:

  1. posix_spawnp is considered insecure by gnulib's test because it accepts a script without a shebang and ends up running it successfully instead of rejecting it with ENOEXEC.
  2. FreeBSD's posix_spawn_file_actions_adddup2 and posix_spawn_file_actions_addopen accept obviously invalid file descriptors in the gnulib probe cases, so gnulib also wants to wrap or replace those functions.

Observed locally:

adddup2-invalid-fd-accepted=yes
addopen-invalid-fd-accepted=yes
posix_spawnp-secure-exec-result=3

That strongly indicates REPLACE_POSIX_SPAWN=1 in the Guile build on this system.
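The invalid-descriptor observations can be checked with a small probe. This is a hedged sketch, not gnulib's actual configure test. It only applies the POSIX rule that posix_spawn_file_actions_adddup2 and posix_spawn_file_actions_addopen must fail with EBADF for a negative descriptor. The helper names are hypothetical, and the descriptor values gnulib probes with may differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>

/* Map a file-actions return code to the diagnostic's yes/no style:
   1 = invalid fd accepted, 0 = rejected with EBADF, -1 = other error. */
static int classify(int rc)
{
    if (rc == 0)
        return 1;
    return rc == EBADF ? 0 : -1;
}

/* Hypothetical probe: does adddup2 accept `fd` as its source descriptor? */
static int adddup2_accepts_fd(int fd)
{
    posix_spawn_file_actions_t fa;
    int rc;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    rc = posix_spawn_file_actions_adddup2(&fa, fd, 3);
    posix_spawn_file_actions_destroy(&fa);
    return classify(rc);
}

/* Hypothetical probe: does addopen accept `fd` as its target descriptor? */
static int addopen_accepts_fd(int fd)
{
    posix_spawn_file_actions_t fa;
    int rc;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    rc = posix_spawn_file_actions_addopen(&fa, fd, "/dev/null", O_RDONLY, 0);
    posix_spawn_file_actions_destroy(&fa);
    return classify(rc);
}
```

A result of 1 from either helper with fd = -1 corresponds to the "...-invalid-fd-accepted=yes" lines observed on this host; a conforming libc returns 0.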

Root cause hypothesis

1. Guile uses addclosefrom_np when the symbol exists

In upstream Guile 3.0.10, libguile/posix.c contains:

  • #ifdef HAVE_POSIX_SPAWN_FILE_ACTIONS_ADDCLOSEFROM_NP
  • #define HAVE_ADDCLOSEFROM 1
  • later in do_spawn(...):
#ifdef HAVE_ADDCLOSEFROM
  posix_spawn_file_actions_addclosefrom_np (&actions, 3);
#else
  close_inherited_fds (&actions, max_fd);
#endif

2. But gnulib can replace the posix_spawn ABI

In upstream gnulib's lib/spawn.in.h, when REPLACE_POSIX_SPAWN=1, posix_spawn_file_actions_t becomes a gnulib-defined struct instead of the native FreeBSD opaque-pointer type.

FreeBSD's native /usr/include/spawn.h defines:

typedef struct __posix_spawn_file_actions *posix_spawn_file_actions_t;

So native FreeBSD expects posix_spawn_file_actions_t to be a pointer to a libc-owned object, while in gnulib's replacement mode the caller's variable holds the struct itself.
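To make the difference concrete, the following C sketch declares stand-ins for the two ABIs. All type and function names are hypothetical. Only the leading _allocated and _used fields of the gnulib-style struct are attested by this report; the remaining fields are an assumption loosely modeled on gnulib's spawn.in.h.

```c
#include <stddef.h>

/* Native FreeBSD style: the caller's variable holds only a pointer;
   posix_spawn_file_actions_init allocates the real object inside libc. */
struct fake_native_actions;                        /* opaque, libc-private */
typedef struct fake_native_actions *native_style_fa_t;

/* gnulib replacement style: the caller's variable IS the object.
   Fields after _used are assumed for illustration. */
typedef struct {
    int _allocated;
    int _used;
    void *_actions;
    int _pad[16];
} gnulib_style_fa_t;

/* Bytes occupied by `posix_spawn_file_actions_t actions;` in each ABI. */
static size_t native_style_bytes(void) { return sizeof(native_style_fa_t); }
static size_t gnulib_style_bytes(void) { return sizeof(gnulib_style_fa_t); }
```

On amd64 these come to 8 and 80 bytes. Both ABIs hand libc the same &actions address, but the first eight bytes behind it mean different things: a pointer in one case, two int counters in the other.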

3. The crash signature matches that ABI mismatch exactly

The lldb backtrace from the core file shows the crash in:

libc.so.7`posix_spawn_file_actions_addclosefrom_np

with:

*fa = 0x0000000600000008

That value is exactly what the first two 32-bit fields of gnulib's replacement file-actions struct produce when read as a single little-endian 64-bit pointer:

  • _allocated = 8
  • _used = 6

Those values are exactly what one would expect after Guile schedules six dup2 actions in do_spawn(...): six used action slots out of eight allocated.

In other words, libc is reading gnulib's struct header as though it were a native pointer to struct __posix_spawn_file_actions, which explains the segmentation fault.
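The pointer value can be reproduced by reading the head of such a struct as one 64-bit word. This is a minimal sketch; the struct mirrors only the two fields named in this report, and the names gnulib_fa_head and head_bits are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative mirror of the first two fields of gnulib's replacement
   posix_spawn_file_actions_t, as named in this report. */
struct gnulib_fa_head {
    int _allocated;
    int _used;
};

_Static_assert(sizeof(struct gnulib_fa_head) == sizeof(uint64_t),
               "expects two 32-bit ints");

/* Return the 64 bits native libc loads from *fa when it treats the
   caller's posix_spawn_file_actions_t as a pointer (amd64: 8 bytes). */
static uint64_t head_bits(const struct gnulib_fa_head *h)
{
    uint64_t bits;
    memcpy(&bits, h, sizeof bits);   /* memcpy avoids strict-aliasing issues */
    return bits;
}
```

On a little-endian host such as amd64, head_bits returns 0x0000000600000008 for { _allocated = 8, _used = 6 }, the same value lldb printed for *fa. libc then dereferences that "pointer", which is what faults.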

Assessment

This looks like an upstream Guile bug on FreeBSD-family systems where:

  • gnulib decides REPLACE_POSIX_SPAWN=1, and
  • the platform exposes native posix_spawn_file_actions_addclosefrom_np

It does not look like a Guix-specific bug, nor primarily a local packaging mistake.

The safest fix is in Guile's libguile/posix.c:

  • only use posix_spawn_file_actions_addclosefrom_np when Guile is using the native posix_spawn / posix_spawn_file_actions_t ABI
  • if gnulib replacement posix_spawn is active, fall back to close_inherited_fds(&actions, max_fd) instead

In practice that likely means guarding the HAVE_ADDCLOSEFROM path with an additional condition equivalent to:

#if defined(HAVE_POSIX_SPAWN_FILE_ACTIONS_ADDCLOSEFROM_NP) && !defined(REPLACE_POSIX_SPAWN)

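or another build-time condition that guarantees ABI compatibility. Note that REPLACE_POSIX_SPAWN is an Autoconf substitution variable used when generating gnulib's spawn.h, not a C preprocessor macro, so the line above only states the intent: as written, !defined(REPLACE_POSIX_SPAWN) would always be true. A real patch needs a condition the preprocessor can actually see, for example one that configure exports through config.h.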
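A minimal sketch of the proposed guard and fallback, assuming a hypothetical macro USING_NATIVE_SPAWN_ABI that a build would set to 1 only when libc's own posix_spawn_file_actions_t is in use. The fallback is not Guile's close_inherited_fds. It simply queues a close action for every descriptor in [3, max_fd) and assumes the libc ignores close failures for descriptors that are not open when the child runs.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* HYPOTHETICAL switch: 1 only when the build is known to use the native
   posix_spawn_file_actions_t ABI (no gnulib replacement). */
#ifndef USING_NATIVE_SPAWN_ABI
#define USING_NATIVE_SPAWN_ABI 0
#endif

/* Portable fallback (illustrative): close every fd in [3, max_fd) in the child. */
static int close_inherited_fds_sketch(posix_spawn_file_actions_t *fa, int max_fd)
{
    for (int fd = 3; fd < max_fd; fd++) {
        int rc = posix_spawn_file_actions_addclose(fa, fd);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Take the addclosefrom_np shortcut only when BOTH conditions hold. */
static int schedule_fd_cleanup(posix_spawn_file_actions_t *fa, int max_fd)
{
#if USING_NATIVE_SPAWN_ABI && defined(HAVE_POSIX_SPAWN_FILE_ACTIONS_ADDCLOSEFROM_NP)
    (void)max_fd;
    return posix_spawn_file_actions_addclosefrom_np(fa, 3);
#else
    return close_inherited_fds_sketch(fa, max_fd);
#endif
}

/* Spawn `/bin/sh -c 'exit 0'` with the cleanup applied.
   Returns the exit status, or -1 on failure. */
static int spawn_with_cleanup(int max_fd)
{
    posix_spawn_file_actions_t fa;
    char *argv[] = { "/bin/sh", "-c", "exit 0", NULL };
    pid_t pid;
    int status, rc;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    rc = schedule_fd_cleanup(&fa, max_fd);
    if (rc == 0)
        rc = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the macro left unset, the sketch always takes the portable path. That is the safe default when a build cannot prove which file-actions ABI it is compiled against.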

Impact on the Guix-on-FreeBSD port

This is an important blocker because Guix and Guile code frequently depend on subprocess creation helpers.

However, the investigation also confirms:

  • lower-level process primitives still work (primitive-fork, waitpid)
  • sockets, file I/O, and FFI still work
  • the problem is narrow enough to patch or work around

So the Guix port remains viable, but robust subprocess handling on FreeBSD will likely require one of:

  1. a local Guile patch, or
  2. an upstream fix to Guile/gnulib integration, or
  3. temporary Guix-side avoidance of the crashing subprocess helpers while bootstrapping the port
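For option 3, the primitives that still work are enough to run a child process. The C sketch below shows the fork/exec/waitpid shape; a Guile-side workaround would follow the same pattern with primitive-fork, execl and waitpid. The helper name run_without_spawn is hypothetical. A real replacement would also need the descriptor redirection and cleanup that do_spawn performs.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv` using fork/execv/waitpid only: no posix_spawn and
   no file-actions object, so the ABI mismatch cannot arise.  Returns the
   child's exit status (127 if execv failed), or -1 on failure. */
static int run_without_spawn(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                 /* only reached if execv failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh -c 'exit 3' through this helper returns 3.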