Building an OpenCHAMI Image with pact

This guide covers building a diskless SquashFS compute node image with pact-agent as the init system (PID 1), SPIRE for workload identity, and OpenCHAMI for boot provisioning.

Overview

The boot chain on a diskless HPC node:

BMC/PXE → OpenCHAMI DHCP → iPXE → kernel + initramfs
  → mount SquashFS root (read-only)
  → pivot_root
  → pact-agent starts as PID 1
  → authenticates to journal (SPIRE or bootstrap cert)
  → streams vCluster config overlay
  → applies config (sysctl, modules, mounts, uenv)
  → starts services in dependency order
  → reports capabilities → node ready

What Goes in the Image vs What Gets Streamed

In the SquashFS image (static)	Streamed at boot (dynamic)
pact-agent binary	vCluster overlay (sysctl, modules, mounts)
SPIRE agent binary + config	Node-specific delta (per-node tunables)
Bootstrap CA cert (`/etc/pact/ca.crt`)	Service declarations (what to start)
Base OS packages (glibc, coreutils, etc.)	OPA policy bundles
GPU drivers (NVIDIA/AMD)	Identity (SVID via SPIRE or CSR)
Network drivers (cxi, i40e, etc.)
pact agent config (`/etc/pact/agent.toml`)

The image is read-only. All runtime state goes to tmpfs (/run/pact/, /tmp/).

Prerequisites

OpenCHAMI deployed (SMD, BSS, DHCP, image server)
A build host with mksquashfs, debootstrap (or equivalent), and the pact release binaries
SPIRE server running on management nodes (optional but recommended)
pact-journal quorum running on management nodes

Step 1: Create the Base Root Filesystem

Start with a minimal Linux root. The exact method depends on your distro:

# Ubuntu/Debian
mkdir -p /tmp/pact-image/rootfs
sudo debootstrap --variant=minbase noble /tmp/pact-image/rootfs http://archive.ubuntu.com/ubuntu

# Or SUSE (for Cray/HPE systems)
# zypper --root /tmp/pact-image/rootfs install ...

# Or from an existing node image
# rsync -a /path/to/base-image/ /tmp/pact-image/rootfs/

Install essential packages in the chroot:

sudo chroot /tmp/pact-image/rootfs /bin/bash -c '
    apt-get update
    apt-get install -y --no-install-recommends \
        ca-certificates \
        iproute2 \
        kmod \
        procps \
        util-linux \
        chrony
'

Step 2: Install pact-agent

Download the agent binary matching your target hardware:

# Example: x86_64 NVIDIA with PactSupervisor
curl -LO https://github.com/witlox/pact/releases/latest/download/pact-agent-x86_64-nvidia-pact.tar.gz
sudo tar xzf pact-agent-x86_64-nvidia-pact.tar.gz -C /tmp/pact-image/rootfs/usr/local/bin/

Step 3: Install SPIRE Agent

SPIRE provides workload identity (X.509 SVIDs) for mTLS between pact-agent and the journal. If SPIRE is not available, pact falls back to the bootstrap certificate + ephemeral CA workflow.

# Download SPIRE agent
SPIRE_VERSION=1.12.0
curl -LO https://github.com/spiffe/spire/releases/download/v${SPIRE_VERSION}/spire-${SPIRE_VERSION}-linux-amd64-musl.tar.gz
tar xzf spire-${SPIRE_VERSION}-linux-amd64-musl.tar.gz
sudo cp spire-${SPIRE_VERSION}/bin/spire-agent /tmp/pact-image/rootfs/usr/local/bin/

Create the SPIRE agent config:

sudo mkdir -p /tmp/pact-image/rootfs/etc/spire
sudo tee /tmp/pact-image/rootfs/etc/spire/agent.conf << 'EOF'
agent {
    data_dir = "/run/spire/agent"
    log_level = "INFO"
    server_address = "spire-server.mgmt"
    server_port = "8081"
    socket_path = "/run/spire/agent.sock"
    trust_domain = "example.org"

    # Node attestation via TPM or join token
    NodeAttestor "tpm_devid" {
        plugin_data {}
    }
}
EOF

For sites without TPM, use join token attestation instead:

# On the SPIRE server, create a join token for this node class:
#   spire-server token generate -spiffeID spiffe://example.org/pact-agent
# Then inject the token into the image or pass via kernel cmdline.

Step 4: Install GPU Drivers

For NVIDIA nodes:

# Install NVIDIA driver + persistenced (in chroot)
sudo chroot /tmp/pact-image/rootfs /bin/bash -c '
    # Install from your driver repo or CUDA toolkit
    apt-get install -y nvidia-driver-570 nvidia-utils-570
'

For AMD nodes:

sudo chroot /tmp/pact-image/rootfs /bin/bash -c '
    # Install ROCm driver
    apt-get install -y rocm-smi-lib
'

Step 5: Install Network Drivers

For Slingshot (Cray CXI) fabric:

# CXI drivers are typically provided by HPE/Cray as RPMs or DEBs
# Install cxi-driver, cxi-utils, libfabric-cxi
sudo chroot /tmp/pact-image/rootfs /bin/bash -c '
    dpkg -i /path/to/cxi-driver_*.deb
'

Step 6: Configure pact-agent

Create the agent config. The node_id and vcluster are set dynamically at boot via environment variables (OpenCHAMI sets the hostname):

sudo mkdir -p /tmp/pact-image/rootfs/etc/pact
sudo tee /tmp/pact-image/rootfs/etc/pact/agent.toml << 'EOF'
[agent]
# node_id auto-detected from hostname (set by OpenCHAMI DHCP)
enforcement_mode = "enforce"

[agent.supervisor]
backend = "pact"

[agent.journal]
endpoints = [
    "journal-1.mgmt:9443",
    "journal-2.mgmt:9443",
    "journal-3.mgmt:9443",
]
tls_enabled = true
tls_ca = "/etc/pact/ca.crt"

[agent.identity]
provider = "spire"
spire_socket = "/run/spire/agent.sock"

[agent.observer]
ebpf_enabled = true
inotify_enabled = true
netlink_enabled = true

[agent.shell]
enabled = true
listen = "0.0.0.0:9445"
whitelist_mode = "strict"

[agent.capability]
manifest_path = "/run/pact/capability.json"
socket_path = "/run/pact/capability.sock"
gpu_poll_interval_seconds = 30

[agent.commit_window]
base_window_seconds = 900
drift_sensitivity = 2.0
emergency_window_seconds = 14400

[agent.blacklist]
patterns = [
    "/tmp/**", "/var/log/**", "/proc/**", "/sys/**",
    "/dev/**", "/run/user/**", "/run/pact/**", "/run/lattice/**",
]
EOF

Step 7: Configure pact-agent as PID 1

Create an init wrapper that sets up minimal infrastructure before handing off to pact-agent. The SquashFS root is read-only, so we need tmpfs mounts:

sudo tee /tmp/pact-image/rootfs/init << 'INITEOF'
#!/bin/sh
# Minimal init for pact-agent as PID 1 on diskless nodes.
# Called directly by the kernel after pivot_root.

# Mount essential filesystems
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev
mount -t tmpfs tmpfs /run
mount -t tmpfs tmpfs /tmp
mkdir -p /run/pact /run/spire/agent /run/lock /var/log

# Load essential modules
modprobe -a overlay tmpfs

# Set hostname from kernel cmdline (OpenCHAMI sets pact.nodeid=)
NODEID=$(sed -n 's/.*pact.nodeid=\([^ ]*\).*/\1/p' /proc/cmdline)
[ -n "$NODEID" ] && hostname "$NODEID"

# Start SPIRE agent in background (if available)
if [ -x /usr/local/bin/spire-agent ]; then
    /usr/local/bin/spire-agent run \
        -config /etc/spire/agent.conf \
        -logLevel INFO &
    # Give SPIRE a moment to create the socket
    sleep 1
fi

# Hand off to pact-agent
exec /usr/local/bin/pact-agent --config /etc/pact/agent.toml
INITEOF
sudo chmod +x /tmp/pact-image/rootfs/init

Step 8: Install Bootstrap CA Certificate

For the initial boot before SPIRE is available, include the journal’s CA cert:

# Copy from a journal node or generate during journal setup
sudo cp /etc/pact/ca.crt /tmp/pact-image/rootfs/etc/pact/ca.crt

If using SPIRE exclusively, this cert is only needed for the first connection to obtain the SPIRE join token or for fallback when SPIRE is unavailable.

Step 9: Build the SquashFS Image

sudo mksquashfs /tmp/pact-image/rootfs /tmp/pact-image/pact-node.squashfs \
    -comp zstd \
    -Xcompression-level 19 \
    -noappend \
    -no-recovery \
    -processors $(nproc)

Typical image sizes:

Base + pact-agent + SPIRE: ~300 MB
With NVIDIA drivers: ~800 MB
With ROCm: ~600 MB

Step 10: Register with OpenCHAMI

Upload the image to OpenCHAMI’s image server and configure the boot parameters:

# Upload image to OpenCHAMI image server.
# Use your site's image management tooling to upload the SquashFS to the image server,
# e.g. scp, s3 upload, or your image registry workflow.

# Set boot parameters for a node group via BSS REST API
curl -X PUT https://bss.mgmt/boot/v1/bootparameters \
    -H "Content-Type: application/json" \
    -d '{
        "macs": [],
        "hosts": ["ml-training"],
        "params": "root=live:http://image-server/pact-ml-training-v1.squashfs init=/init pact.nodeid=${hostname} console=tty0",
        "kernel": "http://image-server/vmlinuz",
        "initrd": "http://image-server/initramfs.img"
    }'

The init=/init parameter tells the kernel to run our init wrapper. The pact.nodeid=${hostname} is expanded by OpenCHAMI’s DHCP/BSS.

Step 11: Pre-enroll Nodes

Before the first boot, register nodes in the journal:

# Enroll nodes with their hardware identity
pact node enroll compute-001 --mac aa:bb:cc:dd:ee:01
pact node enroll compute-002 --mac aa:bb:cc:dd:ee:02
# ... or batch import from SMD inventory:
pact node import --group ml-training

# Assign to vCluster
pact node assign compute-001 --vcluster ml-training
pact node assign compute-002 --vcluster ml-training

Step 12: Boot and Verify

Power on the nodes via OpenCHAMI/Redfish:

# Power on nodes via BMC/Redfish (use your BMC management tool: ipmitool, Redfish, etc.)
# Example with curl against OpenCHAMI SMD:
curl -X POST https://smd.mgmt/hsm/v2/State/Components/x1000c0s0b0n0/Actions/PowerCycle \
    -H "Content-Type: application/json" \
    -d '{"ResetType": "On"}'
# Or use pact's delegation command for enrolled nodes:
pact reboot compute-001

Monitor boot progress:

# Watch the journal for enrollment events
pact watch --vcluster ml-training

# Check node status (should appear within ~2 seconds of boot)
pact status --vcluster ml-training

# Verify capabilities
pact cap compute-001

# Check service status
pact service status compute-001

Updating the Image

To update the base image (new drivers, new pact-agent version):

Build a new SquashFS image (steps 1-9)
Upload to OpenCHAMI image server (using your site’s image management tooling)
Update boot config via BSS REST API: curl -X PUT https://bss.mgmt/boot/v1/bootparameters -d '{"hosts":["ml-training"],"params":"root=live:http://image-server/pact-ml-training-v2.squashfs ..."}'
Rolling reboot: pact drain compute-001 && pact reboot compute-001

Nodes pick up the new image on reboot. pact configuration (sysctl, mounts, services) is streamed from the journal — not baked into the image — so most config changes don’t require a new image.

Including Lattice (Supercharged Mode)

When deploying pact alongside lattice for workload scheduling, the compute node image includes both pact-agent and lattice-node-agent. pact supervises lattice-node-agent as a declared service — this is “supercharged mode” where both systems cooperate.

Additional binaries in the image

Add lattice-node-agent to the SquashFS image alongside pact-agent:

# Download lattice node agent
curl -LO https://github.com/witlox/lattice/releases/latest/download/lattice-node-agent-x86_64.tar.gz
sudo tar xzf lattice-node-agent-x86_64.tar.gz -C /tmp/pact-image/rootfs/usr/local/bin/

lattice-node-agent config

Create the lattice node agent config. The agent connects to the lattice scheduler quorum and reports node capabilities (read from pact’s capability manifest):

sudo mkdir -p /tmp/pact-image/rootfs/etc/lattice
sudo tee /tmp/pact-image/rootfs/etc/lattice/node-agent.toml << 'EOF'
[node_agent]
# lattice scheduler quorum endpoints (on HSN, not management network — ADR-017)
scheduler_endpoints = [
    "lattice-1.hsn:50051",
    "lattice-2.hsn:50051",
    "lattice-3.hsn:50051",
]

# pact capability manifest (lattice-node-agent reads this)
capability_manifest = "/run/pact/capability.json"
capability_socket = "/run/pact/capability.sock"

# Namespace handoff socket (pact creates namespaces, lattice uses them)
namespace_socket = "/run/pact/ns-handoff.sock"

# Mount refcounting (shared between pact and lattice)
mount_socket = "/run/pact/mount-refcount.sock"

[node_agent.identity]
# Uses the same SPIRE socket as pact for workload identity
spire_socket = "/run/spire/agent.sock"
EOF

Declare lattice-node-agent as a pact service

pact-agent supervises lattice-node-agent as a declared service. This is configured in the vCluster overlay (streamed at boot, not baked in the image).

Create the overlay spec:

# vcluster-overlay.toml — applied with: pact apply vcluster-overlay.toml
[vcluster.ml-training.services.lattice-node-agent]
binary = "/usr/local/bin/lattice-node-agent"
args = ["--config", "/etc/lattice/node-agent.toml"]
restart_policy = "always"
order = 50
depends_on = ["chronyd"]

[vcluster.ml-training.services.chronyd]
binary = "/usr/sbin/chronyd"
args = ["-d"]
restart_policy = "always"
order = 10

# For GPU nodes, add nvidia-persistenced
[vcluster.ml-training.services.nvidia-persistenced]
binary = "/usr/bin/nvidia-persistenced"
args = ["--no-persistence-mode"]
restart_policy = "on_failure"
order = 20

Boot sequence with lattice

Kernel → SquashFS root → pact-agent (PID 1)
  → auth to journal → stream vCluster config overlay
  → apply: kernel params, modules, mounts, uenv
  → start services in dependency order:
      1. chronyd (time sync)
      2. nvidia-persistenced (GPU, if declared)
      3. lattice-node-agent (workload scheduling)
  → pact writes CapabilityReport to /run/pact/capability.json
  → lattice-node-agent reads manifest, reports to scheduler
  → node ready for workloads

Supercharged CLI

With both systems running, operators get unified admin access:

# pact-native commands work as before
pact status --vcluster ml-training
pact exec compute-001 -- nvidia-smi
pact diag compute-001 --grep "ECC"

# Supercharged commands query both systems
pact jobs list --vcluster ml-training    # lattice allocations
pact health                              # pact + lattice health
pact drain compute-001                   # lattice drain + pact audit

Configure the lattice endpoint for supercharged commands. Note: the pact CLI connects to lattice’s HSN-facing gRPC port from the admin workstation (which must have HSN access or a management-to-HSN gateway):

export PACT_LATTICE_ENDPOINT=http://lattice-1.hsn:50051
export PACT_LATTICE_TOKEN=<lattice-auth-token>

Network separation

pact and lattice run on separate networks (ADR-017). pact uses the management network exclusively. Lattice runs entirely on the HSN — including agent↔scheduler communication, Raft consensus, and workload data.

Traffic	Network	Port
pact agent ↔ journal	Management	9443, 9444
pact shell/exec/diag	Management	9445
pact journal metrics	Management	9091
lattice agent ↔ scheduler	HSN	50051
lattice Raft consensus	HSN	9000
Workload data (MPI, NCCL)	HSN	Application-defined

pact never touches the HSN. Lattice never touches pact’s management ports. If the HSN goes down, pact continues operating (admin access, config management) while lattice pauses scheduling. If the management network goes down, pact agents use cached config while lattice is unaffected.

Troubleshooting

Node doesn’t appear after boot

# Check if the node enrolled
pact node list --vcluster ml-training

# Check journal logs for enrollment errors
pact audit --source pact -n 20

# If node is reachable via BMC console:
#   - Check /run/pact/ for agent logs
#   - Check if SPIRE socket exists: ls /run/spire/agent.sock
#   - Check if journal is reachable: curl -k https://journal-1.mgmt:9443/health

SPIRE agent fails to attest

# On the SPIRE server, check registration entries:
spire-server entry show

# Create a join token for manual attestation:
spire-server token generate -spiffeID spiffe://example.org/pact-agent/compute-001

# Pass the token to the node via kernel cmdline (update BSS):
#   curl -X PUT https://bss.mgmt/boot/v1/bootparameters \
#     -d '{"hosts":["compute-001"],"params":"... spire.join_token=<token>"}'

Agent falls back to bootstrap identity

This is normal on first boot or when SPIRE is unavailable. The agent will:

Use the bootstrap CA cert for initial journal connection
Submit a CSR to the journal
Journal validates hardware identity and signs the cert
Agent switches to the journal-signed cert

Once SPIRE becomes available, the agent rotates to SPIRE-managed mTLS automatically (identity cascade: SPIRE → journal-signed → bootstrap).

Keyboard shortcuts

PACT Documentation