# Getting Started

## Overview
Lattice is a distributed workload scheduler for HPC and AI infrastructure. It schedules both batch jobs (training runs, simulations) and long-running services (inference endpoints, monitoring) on shared GPU-accelerated clusters.
If you’re coming from Slurm, most concepts map directly — see the Slurm migration guide for a quick comparison.
## Prerequisites

- A running Lattice cluster (ask your admin for the API endpoint)
- The `lattice` CLI installed on your workstation or login node
- Your tenant credentials (OIDC token or mTLS certificate)
## Installing the CLI

```bash
# Determine architecture
ARCH=$(uname -m | sed 's/aarch64/arm64/')

# Download from GitHub Releases
curl -sSfL "https://github.com/witlox/lattice/releases/latest/download/lattice-${ARCH}.tar.gz" | tar xz
sudo mv lattice /usr/local/bin/
```

Or build from source:

```bash
cargo build --release -p lattice-cli
sudo cp target/release/lattice /usr/local/bin/
```
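The `sed` one-liner above only rewrites `aarch64`; a slightly more defensive sketch rejects architectures with no published artifact. The supported-architecture list here is an assumption — check the project's Releases page for the artifacts actually available:

```bash
# Map `uname -m` output to a release artifact name.
# NOTE: the x86_64/arm64 artifact names are assumed, not confirmed.
map_arch() {
  case "$1" in
    x86_64)  echo "x86_64" ;;
    aarch64) echo "arm64"  ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

ARCH=$(map_arch "$(uname -m)")
```

Failing fast on an unknown architecture beats downloading a tarball that does not exist.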
## Configuration

Create `~/.config/lattice/config.yaml`:

```yaml
endpoint: "lattice-api.example.com:50051"
tenant: "my-team"
# Optional: default vCluster
vcluster: "gpu-batch"
```
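One way to create that file non-interactively (the path and keys are taken from this guide):

```bash
# Write the config file shown above in one step.
mkdir -p ~/.config/lattice
cat > ~/.config/lattice/config.yaml <<'EOF'
endpoint: "lattice-api.example.com:50051"
tenant: "my-team"
vcluster: "gpu-batch"
EOF
```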
Or use environment variables:

```bash
export LATTICE_ENDPOINT="lattice-api.example.com:50051"
export LATTICE_TENANT="my-team"
```
## Your First Job

### Submit a batch script

```bash
lattice submit train.sh
# Submitted allocation a1b2c3d4
```
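For scripting, you can capture the allocation ID from the submit message. The `Submitted allocation <id>` format is assumed from the sample output above:

```bash
# Pull the allocation ID out of the submit message.
alloc_id() { awk '/Submitted allocation/ { print $NF }'; }

# Against a real cluster: ALLOC=$(lattice submit train.sh | alloc_id)
echo "Submitted allocation a1b2c3d4" | alloc_id
# → a1b2c3d4
```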
### Check status

```bash
lattice status
# ID        NAME      STATE    NODES  WALLTIME  ELAPSED   VCLUSTER
# a1b2c3d4  train.sh  Running  4      24:00:00  00:12:34  gpu-batch
```
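The table is easy to post-process with standard tools. This sketch assumes the column layout shown above (ID first, STATE third):

```bash
# Print the IDs of allocations currently in the Running state.
running_ids() { awk 'NR > 1 && $3 == "Running" { print $1 }'; }

# Against a real cluster: lattice status | running_ids
printf '%s\n' \
  'ID        NAME      STATE    NODES' \
  'a1b2c3d4  train.sh  Running  4' \
  'e5f6a7b8  eval.sh   Pending  1' | running_ids
# → a1b2c3d4
```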
### View logs

```bash
lattice logs a1b2c3d4
# [2026-03-05T10:00:12Z] Epoch 1/100, loss=2.341
# [2026-03-05T10:01:45Z] Epoch 2/100, loss=1.892
```
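Training metrics can be scraped straight out of the log stream. The `loss=<value>` format is assumed from the sample lines above:

```bash
# Extract the numeric loss values, one per line.
losses() { sed -n 's/.*loss=\([0-9.]*\).*/\1/p'; }

# Against a real cluster: lattice logs a1b2c3d4 | losses
printf '%s\n' \
  '[2026-03-05T10:00:12Z] Epoch 1/100, loss=2.341' \
  '[2026-03-05T10:01:45Z] Epoch 2/100, loss=1.892' | losses
# → 2.341
# → 1.892
```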
### Cancel a job

```bash
lattice cancel a1b2c3d4
```
## Next Steps
- Submitting Workloads — detailed submission options
- Interactive Sessions — attach a terminal to running jobs
- DAG Workflows — multi-step pipelines with dependencies
- Python SDK — programmatic access from notebooks and agents