Agent Design

Overview

pact-agent is the init system, configuration manager, process supervisor, and shell server for diskless HPC/AI compute nodes. It is PID 1 (or near-PID-1) and the only management process that starts from the base boot image.

Subsystems

Process Supervisor (`src/supervisor/`)

Two backends behind the ServiceManager trait:

PactSupervisor (default):

Direct process management via tokio::process::Command
cgroup v2 isolation: creates /sys/fs/cgroup/pact.slice/<service>/ per service
Memory limits, CPU quotas via cgroup controllers
Health checks: process alive + optional HTTP/TCP endpoint check
Restart with exponential backoff (configurable per service)
Dependency ordering from service declarations in vCluster overlay
Zombie reaping: pact-agent sets PR_SET_CHILD_SUBREAPER so that orphaned descendants are reparented to it and can be reaped
stdout/stderr capture via pipes → pact log pipeline → Loki
Ordered shutdown: reverse dependency order, SIGTERM → grace period → SIGKILL
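The restart backoff and the dependency ordering listed above can be sketched as follows. This is a minimal illustration, not the PactSupervisor implementation: the function names, the doubling factor, the delay cap, and the map-based dependency representation are all assumptions, and the service names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::time::Duration;

/// Delay before restart attempt `attempt` (0-based): base * 2^attempt, capped at `max`.
/// The doubling factor and the cap are assumptions; the source only says "exponential".
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Kahn's algorithm over `service -> dependencies`.
/// Returns a start order in which every service follows its dependencies,
/// or None if a dependency is undeclared or the graph contains a cycle.
fn start_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&svc, ds) in deps {
        pending.insert(svc, ds.len());
        for &d in ds {
            if !deps.contains_key(d) {
                return None; // dependency on an undeclared service
            }
            dependents.entry(d).or_default().push(svc);
        }
    }
    // Services with no unmet dependencies can start immediately.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(&s, _)| s)
        .collect();
    let mut order = Vec::new();
    while let Some(svc) = ready.pop_front() {
        order.push(svc.to_string());
        if let Some(children) = dependents.get(svc) {
            for &child in children {
                let n = pending.get_mut(child).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push_back(child);
                }
            }
        }
    }
    // Any service left unscheduled is part of a cycle.
    (order.len() == deps.len()).then_some(order)
}

/// Ordered shutdown walks the start order in reverse.
fn shutdown_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    start_order(deps).map(|mut order| {
        order.reverse();
        order
    })
}
```

With hypothetical services where storage depends on network and lattice-node-agent depends on both, start_order yields network, storage, lattice-node-agent, and shutdown_order yields the reverse.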

SystemdBackend (fallback):

Generates systemd unit files from vCluster service declarations
Start/stop/restart via D-Bus connection to systemd
Monitor via sd_notify protocol
Same ServiceManager trait — transparent to rest of pact-agent
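As an illustration of the unit-file generation step, the sketch below renders a unit from a service declaration. The ServiceDecl struct, its fields, and the render_unit function are hypothetical; the choice of Type=notify, Requires=, and Restart=on-failure is an assumption about how a declaration might map onto standard systemd directives, not the actual SystemdBackend output.

```rust
/// Hypothetical, simplified service declaration (not the real vCluster schema).
struct ServiceDecl {
    name: String,
    exec_start: String,
    after: Vec<String>, // names of services this one depends on
}

/// Renders a systemd unit file body for one declared service.
fn render_unit(svc: &ServiceDecl) -> String {
    let mut unit = String::from("[Unit]\n");
    unit.push_str(&format!("Description=pact-managed service {}\n", svc.name));
    for dep in &svc.after {
        // After= orders startup; Requires= makes the dependency mandatory.
        unit.push_str(&format!("After={dep}.service\nRequires={dep}.service\n"));
    }
    unit.push_str("\n[Service]\n");
    // Type=notify makes systemd wait for the service's sd_notify READY=1 message.
    unit.push_str("Type=notify\n");
    unit.push_str(&format!("ExecStart={}\n", svc.exec_start));
    unit.push_str("Restart=on-failure\n");
    unit
}
```

A declaration with `after: ["network"]` produces `After=network.service` and `Requires=network.service` lines in the [Unit] section.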

Shell Server (`src/shell/`)

Replaces SSH. Listens on a gRPC endpoint (mTLS authenticated). Provides three RPC operations: exec (single command), shell (interactive session), and CollectDiag (structured diagnostic log retrieval).

pact exec (single command):

Client → ExecRequest{node_id, command, args} → pact-agent
  → authenticate (OIDC token verification)
  → authorize: call PolicyService.Evaluate() on policy node (full OPA/Rego)
      if policy service unreachable: fall back to cached VClusterPolicy
      (role_bindings + whitelist only; two-person approval denied)
  → whitelist check (command in allowed set?)
  → classify (read-only or state-changing?)
  → if state-changing: go through commit window model
  → execute via fork/exec in restricted environment
  → stream stdout/stderr back to client
  → log command + output to journal

For exec, pact-agent controls the full command — it receives a command + args, validates against the whitelist, and fork/execs directly. No shell interpretation.
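The whitelist check and shell-free execution can be sketched as below. This is an assumption-laden illustration: the Whitelist type and exec_whitelisted function are hypothetical, the sketch collects output instead of streaming it, and it omits authentication, policy evaluation, classification, and journal logging.

```rust
use std::collections::HashSet;
use std::io;
use std::process::{Command, Output};

/// Hypothetical whitelist: the set of binary paths a vCluster allows.
struct Whitelist(HashSet<String>);

/// Runs `command` with `args` only if the command is whitelisted.
/// The arguments go straight to the child process; no shell ever parses them,
/// so metacharacters such as `;` or `$(...)` have no special meaning.
fn exec_whitelisted(wl: &Whitelist, command: &str, args: &[&str]) -> io::Result<Output> {
    if !wl.0.contains(command) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{command} is not whitelisted"),
        ));
    }
    Command::new(command)
        .args(args)
        .env_clear() // start from an empty environment (a stand-in for "restricted")
        .output()
}
```

Because the argument vector is passed directly to the child, calling a whitelisted `/bin/echo` with the single argument `$HOME; id` prints that text literally rather than expanding or running anything.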

pact shell (interactive session — restricted bash):

pact shell does not reimplement a shell. It spawns a restricted bash session inside a controlled environment. Reimplementing line editing, pipes, redirects, globbing, quoting, job control, and signal handling would be both an enormous effort and a security liability (every command-parsing bug is a potential bypass).

Client → ShellSessionRequest{node_id} → pact-agent
  → authenticate + authorize (same policy call as exec; shell requires
    higher privilege — if policy service unreachable, cached RBAC check)
  → open bidirectional gRPC stream
  → allocate PTY with restricted bash environment:
      - PATH restricted to whitelisted command directories
      - readonly PATH, ENV, BASH_ENV, SHELL (prevent escape)
      - custom PROMPT_COMMAND logs each command to pact audit
      - rbash or bash --restricted as base
      - mount namespace: hide sensitive paths if configured
      - cgroup: session-level resource limits
  → session start/end logged to journal
  → session ends: cleanup PTY, cgroup, log session summary

Restriction layers (defense in depth, not command parsing):

PATH restriction: only whitelisted binaries are reachable. The agent builds a restricted PATH from the vCluster’s shell_whitelist, symlinking allowed commands into a session-specific directory (/run/pact/shell/<sid>/bin/). Bash in restricted mode (rbash) prevents changing PATH or running commands by absolute path.
PROMPT_COMMAND audit: bash’s PROMPT_COMMAND hook runs before each prompt, logging the previous command ($(history 1)) to pact’s audit pipeline. This captures what was actually executed, not what pact thinks was executed.
Mount namespace (optional): hide /root, /home, SSH keys, and other sensitive paths from the shell session.
Seccomp/cgroup: session-level resource limits and optional syscall filtering.
State change detection: the existing drift observer (eBPF + inotify + netlink) detects changes made during the session. These trigger commit windows as normal — the shell doesn’t need to pre-classify commands.
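The PATH restriction layer above, which symlinks whitelisted binaries into a session-specific directory, might look like the following sketch. The build_session_bin function and its parameters are hypothetical; in the agent the base directory would be /run/pact/shell, and the sketch assumes whitelist entries are paths to binaries.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Creates `<base>/<sid>/bin/` and symlinks each whitelisted binary into it.
/// The returned directory is meant to become the session's only PATH entry.
fn build_session_bin(base: &Path, sid: &str, whitelist: &[&Path]) -> io::Result<PathBuf> {
    let bin = base.join(sid).join("bin");
    fs::create_dir_all(&bin)?;
    for target in whitelist {
        // Link name is the binary's file name, e.g. /bin/ls -> <bin>/ls.
        let name = target.file_name().ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "whitelist entry has no file name")
        })?;
        symlink(target, bin.join(name))?;
    }
    Ok(bin)
}
```

The session would then start rbash with PATH set to the returned directory and marked readonly, so only the linked commands resolve.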

What pact exec does vs pact shell:

pact exec: pact controls the full command lifecycle (whitelist, classify, fork/exec). No shell involved. Suitable for automation and diagnostics.
pact shell: bash controls command execution. pact controls the environment (PATH, namespace, cgroup) and observes changes after the fact. Suitable for interactive debugging.

Learning mode: when a user tries to run a command that is not in PATH, bash reports “command not found” (exit status 127). The agent detects this (via the audit log or the exit code seen by PROMPT_COMMAND) and suggests adding the command to the vCluster whitelist.

State Observer (`src/observer/`)

Three detection mechanisms:

eBPF probes (feature-gated ebpf, Linux-only):
- System-level: mount, sethostname, sysctl writes, module load/unload
- Extended: file permission changes, network namespace operations, cgroup modifications
- No overlap with lattice eBPF: lattice traces workload-level events (job lifecycle, GPU allocation). pact traces system-level config changes. Probe attachment points are coordinated to avoid conflicts.
inotify: config file paths (derived from declared state + watch list)
netlink: interface state, address changes, mount events, routing

Observe-only mode for initial deployment (log everything, enforce nothing).

Cross-platform: On macOS (development), a MockObserver simulates drift events for local dev/test. Real observers only compile and run on Linux.

Drift Evaluator (`src/drift/`)

DriftVector across 7 dimensions (mounts, files, network, services, kernel, packages, gpu). Magnitude = weighted Euclidean norm with per-vCluster dimension weights.
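A minimal sketch of the magnitude computation, assuming the weights multiply the squared components (magnitude = sqrt(sum of w_i * d_i^2)). The source does not say whether weights apply before or after squaring, and the array layout and function name here are illustrative only.

```rust
/// Component order (assumed): mounts, files, network, services, kernel, packages, gpu.
type Dimensions = [f64; 7];

/// Weighted Euclidean norm: sqrt(sum_i w_i * d_i^2).
fn drift_magnitude(drift: &Dimensions, weights: &Dimensions) -> f64 {
    drift
        .iter()
        .zip(weights)
        .map(|(d, w)| w * d * d)
        .sum::<f64>()
        .sqrt()
}
```

With unit weights, a drift of 3 in mounts and 4 in files gives a magnitude of 5. Raising the mounts weight to 4 makes a drift of 1 in mounts alone count as magnitude 2.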

Commit Window Manager (`src/commit/`)

The commit window model uses optimistic concurrency. Before a rollback, the manager checks for active consumers (for example, it does not unmount a filesystem that still has open handles). In emergency mode the window is extended and rollback is suspended.

Config Subscription (`src/subscription/`)

After boot, the agent subscribes to BootConfigService.SubscribeConfigUpdates() on the journal for live updates. This stream delivers:

vCluster overlay changes (e.g. pact apply updates the overlay)
Node-specific delta changes (e.g. promoted changes from pact promote)
Policy updates (refreshes cached VClusterPolicy for authorization)
Blacklist changes (updates drift detection exclusions)

This means overlay and policy changes propagate to running nodes without reboot. The agent applies overlay changes through the same path as boot-time config application. If the subscription stream is interrupted, the agent reconnects with from_sequence to resume from the last received update.
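The resume behaviour can be sketched as a small cursor that remembers the last applied sequence number. All types here are hypothetical stand-ins for the real protobuf messages. The sketch assumes sequence numbers increase monotonically and that from_sequence is inclusive, so the update replayed after a reconnect is dropped as a duplicate.

```rust
/// The four update categories the stream delivers (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateKind {
    Overlay,
    NodeDelta,
    Policy,
    Blacklist,
}

/// Hypothetical stand-in for one streamed update.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ConfigUpdate {
    sequence: u64,
    kind: UpdateKind,
}

/// Tracks how far the agent has progressed through the update stream.
#[derive(Default)]
struct SubscriptionCursor {
    last_applied: Option<u64>,
}

impl SubscriptionCursor {
    /// Value to send as `from_sequence` when (re)connecting.
    fn from_sequence(&self) -> u64 {
        self.last_applied.unwrap_or(0)
    }

    /// Returns true if the update is new and should be applied,
    /// false if it was already applied before the reconnect.
    fn accept(&mut self, update: &ConfigUpdate) -> bool {
        if matches!(self.last_applied, Some(last) if update.sequence <= last) {
            return false;
        }
        self.last_applied = Some(update.sequence);
        true
    }
}
```

Deduplicating on the agent side keeps the sketch correct whether the server treats from_sequence as inclusive or exclusive.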

Capability Reporter (`src/capability/`)

Five hardware detection backends, each following the trait + Linux/Mock pattern:

GPU (GpuBackend): NVIDIA (nvidia-smi, feature nvidia) + AMD (rocm-smi, feature amd) + Mock
CPU (CpuBackend): /proc/cpuinfo + sysfs (arch, cores, freq, ISA features, NUMA, L3 cache)
Memory (MemoryBackend): /proc/meminfo + sysfs NUMA + dmidecode for type (DDR/HBM, 2s timeout)
Network (NetworkBackend): /sys/class/net/ enumeration, Slingshot (cxi driver), speed, link state
Storage (StorageBackend): /sys/block/ NVMe, /proc/mounts + statvfs (2s timeout), diskless detection

Reports to lattice scheduler via tmpfs manifest (/run/pact/capability.json) + unix socket (consumed by lattice-node-agent, which pact supervises as a child process).
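The trait + Linux/Mock pattern can be illustrated with the memory backend. The total_kib method, the LinuxMemory and MockMemory types, and the parse_mem_total helper are hypothetical; only the MemoryBackend name and the /proc/meminfo source come from the list above, and the real trait likely reports far more than one number.

```rust
/// Hypothetical, single-method version of the memory backend trait.
trait MemoryBackend {
    /// Total memory as reported by the MemTotal line (the kernel labels the unit "kB").
    fn total_kib(&self) -> Option<u64>;
}

/// Extracts the MemTotal value from /proc/meminfo-formatted text.
fn parse_mem_total(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemTotal:")?;
        rest.trim().strip_suffix("kB")?.trim().parse().ok()
    })
}

/// Linux backend: reads the real file.
struct LinuxMemory;

impl MemoryBackend for LinuxMemory {
    fn total_kib(&self) -> Option<u64> {
        let text = std::fs::read_to_string("/proc/meminfo").ok()?;
        parse_mem_total(&text)
    }
}

/// Mock backend: returns a fixed value for macOS development and unit tests.
struct MockMemory {
    total_kib: u64,
}

impl MemoryBackend for MockMemory {
    fn total_kib(&self) -> Option<u64> {
        Some(self.total_kib)
    }
}
```

Keeping the parser as a free function over a string lets it be unit-tested on any platform, while only LinuxMemory touches the filesystem.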

Emergency Mode (`src/emergency/`)

pact emergency --reason "..." → extended window, no rollback, full audit logging. Must end with explicit commit or rollback. Stale emergency → alert + scheduling hold.

Cross-Platform Development

Three-tier strategy for macOS development:

Feature-gate: #[cfg(target_os = "linux")] for cgroup v2, eBPF, netlink, inotify, PTY allocation. Stubs compile on macOS.
Mock implementations: MockSupervisor, MockObserver, MockGpuBackend, MockCpuBackend, MockMemoryBackend, MockNetworkBackend, MockStorageBackend for local dev/test on macOS. Unit + integration tests run with mocks.
Devcontainer: Linux container for integration + acceptance tests (BDD/cucumber). Real supervisor, real observers, real cgroups. CI runs in this environment.
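The first two tiers can be combined as in the sketch below, where a cfg attribute selects the real or mock implementation at compile time. The Observer trait, LinuxObserver type, and default_observer function are hypothetical names; MockObserver is the mock named above, reduced here to a placeholder.

```rust
/// Hypothetical minimal observer interface.
trait Observer {
    fn name(&self) -> &'static str;
}

/// Mock that compiles on every platform (simulates drift events in the real agent).
struct MockObserver;

impl Observer for MockObserver {
    fn name(&self) -> &'static str {
        "mock"
    }
}

/// Real observer: only compiled on Linux, where eBPF, inotify, and netlink exist.
#[cfg(target_os = "linux")]
struct LinuxObserver;

#[cfg(target_os = "linux")]
impl Observer for LinuxObserver {
    fn name(&self) -> &'static str {
        "linux"
    }
}

/// On Linux the agent gets the real observer.
#[cfg(target_os = "linux")]
fn default_observer() -> Box<dyn Observer> {
    Box::new(LinuxObserver)
}

/// Everywhere else (for example macOS) it falls back to the mock.
#[cfg(not(target_os = "linux"))]
fn default_observer() -> Box<dyn Observer> {
    Box::new(MockObserver)
}
```

Callers depend only on the trait, so the same agent code builds on macOS with the mock and on Linux with the real observer.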

Resource Budget

RSS: < 50 MB (including eBPF maps and supervisor overhead)
CPU steady state: < 0.5%
CPU during drift eval: < 2%
CPU during shell session: depends on commands executed
