CLI Design

Design Principle

The CLI is the primary user interface. It should feel natural to Slurm users while exposing Lattice's richer capabilities. Commands follow a consistent lattice <verb> [resource] [flags] pattern. Output is human-readable by default and machine-parseable with --output=json.

Command Structure

lattice <command> [subcommand] [arguments] [flags]

Global Flags

Flag        Short  Description
--output    -o     Output format: table (default), json, yaml, wide
--quiet     -q     Suppress non-essential output
--verbose   -v     Verbose output (debug info)
--tenant    -t     Override tenant (for multi-tenant users)
--vcluster         Override vCluster selection
--config           Config file path (default: ~/.config/lattice/config.yaml)
--no-color         Disable colored output

Authentication Commands

Login (lattice login)

Authenticate with the lattice server. Uses hpc-auth for OIDC token acquisition with cascading flow selection.

# Login (auto-discovers IdP from lattice-api auth discovery endpoint)
lattice login

# Force device code flow (for SSH sessions without browser)
lattice login --flow device

# Force manual paste flow
lattice login --flow manual

# Login to a specific server
lattice login --server cluster.example.com

The token is cached per server in ~/.config/lattice/tokens.json with 0600 permissions. The check is lenient: if the permissions are wrong, the CLI warns and fixes them.
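The lenient permission check described above could look roughly like the following. This is a minimal sketch, not the CLI's actual code: the function name `ensure_private` and the wording of the warning are assumptions made for illustration.

```python
import os
import stat
import sys


def ensure_private(path: str) -> bool:
    """Return True if `path` already has mode 0600; otherwise warn and fix it.

    Hypothetical helper: the name and the warning text are illustrative only.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode == 0o600:
        return True
    # Lenient mode: warn and repair instead of refusing to use the cache.
    print(f"warning: {path} has mode {mode:04o}, resetting to 0600", file=sys.stderr)
    os.chmod(path, 0o600)
    return False
```

A strict variant would raise an error instead of calling `os.chmod`; the lenient form keeps a misconfigured cache usable while still closing the exposure.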

Logout (lattice logout)

Clear the cached token and revoke it at the IdP (best effort).

lattice logout

Unauthenticated Commands

These commands do not require a token (INV-A1):

  • lattice login / lattice logout
  • lattice --version
  • lattice --help
  • lattice completions <shell>

All other commands require authentication. If no valid token is cached, the CLI prints:

Not logged in. Run `lattice login` first.

Expired tokens are silently refreshed if a valid refresh token exists.
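The rules in this section amount to a small decision made before each command runs. The sketch below is one way to express it. The `CachedToken` type, the `auth_action` function, and the returned labels are all hypothetical, and the sketch treats the mere presence of a refresh token as "valid", which a real client would have to verify with the IdP.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Commands listed above as not requiring a token.
UNAUTHENTICATED = {"login", "logout", "completions", "--version", "--help"}


@dataclass
class CachedToken:
    expires_at: float                    # Unix timestamp (assumed representation)
    refresh_token: Optional[str] = None


def auth_action(command: str, token: Optional[CachedToken],
                now: Optional[float] = None) -> str:
    """Return 'skip', 'use', 'refresh', or 'login-required' for a command."""
    now = time.time() if now is None else now
    if command in UNAUTHENTICATED:
        return "skip"
    if token is None:
        return "login-required"          # CLI prints the "Not logged in" message
    if token.expires_at > now:
        return "use"
    # Expired: refresh silently when possible, otherwise ask the user to log in.
    return "refresh" if token.refresh_token else "login-required"
```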

Core Commands

Submit (lattice submit)

Submit an allocation or batch script.

# Submit a script (Slurm-compatible directives parsed)
lattice submit script.sh

# Submit with inline arguments
lattice submit --nodes=64 --walltime=72h --uenv=prgenv-gnu/24.11:v1 -- torchrun train.py

# Submit a task group (job array)
lattice submit --task-group=0-99%20 script.sh

# Submit with dependencies
lattice submit --depends-on=12345:success script.sh

# Submit a DAG from YAML
lattice dag submit workflow.yaml

# Submit to a specific vCluster
lattice submit --vcluster=ml-training script.sh

Output: Allocation ID on success.

Submitted allocation 12345
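The --task-group value in the example above, 0-99%20, resembles Slurm's job-array syntax, where the number after % caps how many tasks run at once. Assuming Lattice follows that convention (the document does not spell it out), a parser might look like this sketch. `TaskGroup` and `parse_task_group` are hypothetical names, and only the simple first-last[%limit] form is handled.

```python
import re
from typing import NamedTuple, Optional


class TaskGroup(NamedTuple):
    first: int
    last: int
    max_concurrent: Optional[int]  # None means no concurrency cap was given

    @property
    def count(self) -> int:
        return self.last - self.first + 1


_SPEC = re.compile(r"(\d+)-(\d+)(?:%(\d+))?")


def parse_task_group(spec: str) -> TaskGroup:
    """Parse 'first-last' or 'first-last%limit' (assumed Slurm-like syntax)."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"invalid task group spec: {spec!r}")
    first, last = int(m[1]), int(m[2])
    if last < first:
        raise ValueError(f"empty range in task group spec: {spec!r}")
    limit = int(m[3]) if m[3] else None
    if limit is not None and limit < 1:
        raise ValueError(f"concurrency limit must be positive: {spec!r}")
    return TaskGroup(first, last, limit)
```

Under this reading, 0-99%20 describes 100 tasks with at most 20 running at the same time.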

Status (lattice status)

Query allocation status.

# List own allocations
lattice status

# Specific allocation
lattice status 12345

# Filter by state
lattice status --state=running

# All allocations (tenant admin)
lattice status --all

# Watch mode (refresh every 5s)
lattice status --watch

Default output (table):

ID      NAME           STATE    NODES  WALLTIME   ELAPSED   VCLUSTER
12345   training-run   Running  64     72:00:00   14:23:01  ml-training
12346   eval-job       Pending  4      02:00:00   —         hpc-batch
12347   sweep          Running  1×20   04:00:00   01:12:33  hpc-batch

Wide output (-o wide) adds columns for tenant, project, uenv, GPU type, and dragonfly groups.
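The table shows walltimes as HH:MM:SS, while the submit examples pass values such as 72h. A conversion between the two could be sketched as follows. The accepted suffixes (s, m, h) are an assumption, since the document only shows the h form, and both function names are hypothetical.

```python
import re

# Assumed unit suffixes; only "h" appears in the document's examples.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def walltime_seconds(value: str) -> int:
    """Convert a value like '72h' or '30m' to seconds."""
    m = re.fullmatch(r"(\d+)([smh])", value)
    if m is None:
        raise ValueError(f"unsupported walltime: {value!r}")
    return int(m[1]) * _UNITS[m[2]]


def format_hms(seconds: int) -> str:
    """Render seconds as HH:MM:SS; hours are not wrapped at 24."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, `format_hms(walltime_seconds("72h"))` yields 72:00:00, matching the WALLTIME column of the first row.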

Cancel (lattice cancel)

Cancel allocations.

# Cancel single
lattice cancel 12345

# Cancel multiple
lattice cancel 12345 12346 12347

# Cancel all own pending allocations
lattice cancel --state=pending --all-mine

# Cancel a DAG
lattice dag cancel dag-789

Session (lattice session)

Create an interactive session. See sessions.md for details.

# Basic session
lattice session --walltime=4h

# With resources
lattice session --nodes=2 --constraint=gpu_type:GH200 --walltime=8h

# With uenv
lattice session --uenv=prgenv-gnu/24.11:v1 --walltime=4h

Attach (lattice attach)

Attach a terminal to a running allocation. See observability.md.

lattice attach 12345
lattice attach 12345 --node=x1000c0s0b0n3
lattice attach 12345 --command="nvidia-smi -l 1"

Launch (lattice launch)

Run a task within an existing allocation (srun equivalent).

# Run on all nodes
lattice launch --alloc=12345 hostname

# Run on specific number of tasks
lattice launch --alloc=12345 -n 4 ./my_program

# Run interactively with PTY
lattice launch --alloc=12345 --pty bash

Logs (lattice logs)

View allocation logs. See observability.md.

lattice logs 12345
lattice logs 12345 --follow
lattice logs 12345 --stderr --node=x1000c0s0b0n3
lattice logs 12345 --tail=100

Top / Watch / Diag / Compare

Monitoring commands. See observability.md.

lattice top 12345                              # Metrics snapshot
lattice top 12345 --per-gpu                    # Per-GPU breakdown
lattice watch 12345                            # Live streaming metrics
lattice watch 12345 --alerts-only              # Alerts only
lattice diag 12345                             # Network + storage diagnostics
lattice compare 12345 12346 --metric=gpu_util  # Cross-allocation comparison

Telemetry (lattice telemetry)

Switch telemetry mode.

lattice telemetry --alloc=12345 --mode=debug --duration=30m

Nodes (lattice nodes)

View cluster nodes (read-only).

# List all nodes
lattice nodes

# Filter by state
lattice nodes --state=ready

# Filter by vCluster
lattice nodes --vcluster=hpc-batch

# Specific node details
lattice nodes x1000c0s0b0n0

Output:

NODE            STATE     GPUS     VCLUSTER     TENANT   GROUP  CONFORMANCE
x1000c0s0b0n0   Ready     4×GH200  hpc-batch    physics  3      a1b2c3
x1000c0s0b0n1   Ready     4×GH200  hpc-batch    physics  3      a1b2c3
x1000c0s1b0n0   Draining  4×GH200  ml-training  ml-team  7      a1b2c3

History (lattice history)

Query completed allocations (accounting data).

lattice history
lattice history --since=2026-03-01 --until=2026-03-02
lattice history --output=json

DAG Commands (lattice dag)

lattice dag submit workflow.yaml     # Submit a DAG
lattice dag status dag-789           # DAG status with per-allocation states
lattice dag list                     # List DAGs
lattice dag cancel dag-789           # Cancel a DAG

Cache Commands (lattice cache)

lattice cache warm --image=prgenv-gnu/24.11:v1 --group=3
lattice cache status --node=x1000c0s0b0n0
lattice cache evict --image=prgenv-gnu/24.11:v1 --node=x1000c0s0b0n0

Admin Commands (lattice admin)

Administrative commands require the system-admin role.

# Node management
lattice node drain x1000c0s0b0n0
lattice node drain x1000c0s0b0n0 --urgent
lattice node undrain x1000c0s0b0n0
lattice node disable x1000c0s0b0n0
lattice node enable x1000c0s0b0n0

# Tenant management
lattice admin tenant create --name=physics --max-nodes=200
lattice admin tenant set-quota --name=physics --max-nodes=250

# vCluster management
lattice admin vcluster create --name=hpc-batch --scheduler=hpc-backfill --tenant=physics
lattice admin vcluster set-weights --name=hpc-batch --priority=0.20 ...

# Configuration
lattice admin config get accounting.enabled
lattice admin config set accounting.enabled=true

# Raft status
lattice admin raft status

Output Formats

Format  Flag     Use Case
table   Default  Human-readable, aligned columns
wide    -o wide  Extended columns
json    -o json  Machine-parseable, scripting
yaml    -o yaml  Machine-parseable, config integration

All formats support piping and redirection. For streaming commands (logs --follow, watch), JSON output is newline-delimited JSON: one JSON object per line.
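Newline-delimited JSON can be consumed one record at a time without waiting for the stream to end. The sketch below shows a minimal reader. The name `iter_records` is hypothetical, and no record fields are assumed because the document does not define the schema.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of NDJSON input."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)
```

A script could pass `sys.stdin` to `iter_records` and be fed from a pipe, for example `lattice logs 12345 --follow -o json | python consume.py`, where consume.py is a hypothetical file containing the function above.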

Error Messages

Errors are human-readable with actionable guidance:

Error: allocation rejected — tenant "physics" exceeds max_nodes quota
  Current: 195 nodes in use
  Requested: 10 additional nodes
  Limit: 200 nodes

  Hint: Cancel running allocations or request a quota increase from your tenant admin.

Error: no nodes available matching constraints
  GPU type: GH200
  Nodes requested: 64
  Available: 42 (22 in use by your allocations, 136 by other tenants)

  Hint: Reduce node count, use --topology=any, or wait for resources.
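Both messages above share one layout: a summary line, indented detail lines, a blank line, and an indented hint. A formatter producing that layout might be sketched like this. `format_error` is a hypothetical helper, not a documented Lattice function.

```python
from typing import Mapping


def format_error(summary: str, details: Mapping[str, object], hint: str) -> str:
    """Build an error message in the summary / details / hint layout."""
    lines = [f"Error: {summary}"]
    # Detail lines are indented by two spaces, in insertion order.
    lines += [f"  {key}: {value}" for key, value in details.items()]
    # A blank line separates the facts from the suggested action.
    lines += ["", f"  Hint: {hint}"]
    return "\n".join(lines)
```

Keeping the hint as a separate argument makes it hard to emit an error without actionable guidance.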

Shell Completion

Shell completion is generated for bash, zsh, and fish:

# Generate completion
lattice completion bash > /etc/bash_completion.d/lattice
lattice completion zsh > ~/.zfunc/_lattice
lattice completion fish > ~/.config/fish/completions/lattice.fish

Completions cover: subcommands, flag names, allocation IDs (from recent lattice status), node IDs, vCluster names, uenv names.

Configuration File

# ~/.config/lattice/config.yaml
api_url: "https://lattice.example.com:50051"
default_tenant: "physics"
default_vcluster: "hpc-batch"
default_uenv: "prgenv-gnu/24.11:v1"
output_format: "table"
color: true

Environment variables override the config file: LATTICE_API_URL, LATTICE_TENANT, LATTICE_VCLUSTER.
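The precedence rule can be illustrated with a small merge function. This is a sketch under assumptions: the pairing of each variable with a config key (for example LATTICE_TENANT with default_tenant) is inferred from the names, `resolve_settings` is a hypothetical function, and command-line flags such as --tenant are not modeled.

```python
from typing import Dict, Mapping

# Assumed pairing of environment variables to config keys (inferred from names).
ENV_OVERRIDES = {
    "api_url": "LATTICE_API_URL",
    "default_tenant": "LATTICE_TENANT",
    "default_vcluster": "LATTICE_VCLUSTER",
}


def resolve_settings(file_config: Mapping[str, object],
                     environ: Mapping[str, str]) -> Dict[str, object]:
    """Start from the config file, then let environment variables win."""
    settings = dict(file_config)
    for key, var in ENV_OVERRIDES.items():
        value = environ.get(var)
        if value:  # unset or empty variables leave the file value in place
            settings[key] = value
    return settings
```

In a real program the second argument would typically be `os.environ`; passing it in explicitly keeps the function easy to test.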

Slurm Compatibility Aliases

For sites migrating from Slurm, optional shell aliases are available:

# Source from lattice-provided script
source "$(lattice compat-aliases)"

# Provides:
# sbatch → lattice submit
# squeue → lattice status
# scancel → lattice cancel
# salloc → lattice session
# srun → lattice launch
# sinfo → lattice nodes
# sacct → lattice history

These aliases translate Slurm flags to Lattice flags where possible. See slurm-migration.md for details.
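To make the idea of translation concrete, the sketch below maps a Slurm-style argument list to a Lattice one. The command mapping comes from the alias list above. The flag rename table is purely illustrative: pairing Slurm's --time with --walltime is an assumption, values are passed through unconverted, and the real rules are the ones in slurm-migration.md.

```python
from typing import List

# Command mapping taken from the alias list in this section.
COMMANDS = {
    "sbatch": "submit",
    "squeue": "status",
    "scancel": "cancel",
    "salloc": "session",
    "srun": "launch",
    "sinfo": "nodes",
    "sacct": "history",
}

# Hypothetical flag renames, for illustration only.
FLAG_RENAMES = {"--time": "--walltime"}


def translate(argv: List[str]) -> List[str]:
    """Rewrite ['sbatch', ...] as ['lattice', 'submit', ...]."""
    if not argv or argv[0] not in COMMANDS:
        raise ValueError(f"not a known Slurm command: {argv[:1]}")
    out = ["lattice", COMMANDS[argv[0]]]
    for arg in argv[1:]:
        # Split '--flag=value' so only the flag name is renamed.
        name, sep, value = arg.partition("=")
        out.append(FLAG_RENAMES.get(name, name) + sep + value)
    return out
```

Arguments without a known rename, including positional ones such as script names and allocation IDs, pass through unchanged.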

Cross-References