
# Slurm Migration

## Command Mapping

| Slurm | Lattice | Notes |
|---|---|---|
| `sbatch script.sh` | `lattice submit script.sh` | `#SBATCH` directives are parsed |
| `squeue` | `lattice status` | |
| `squeue -u $USER` | `lattice status` | Default shows own jobs |
| `scancel 12345` | `lattice cancel 12345` | |
| `salloc` | `lattice session` | Interactive allocation |
| `srun --pty bash` | `lattice attach <id>` | Attach terminal |
| `sinfo` | `lattice nodes` | Cluster node overview |
| `sacct` | `lattice status --all` | Historical view |

## Directive Mapping

| `#SBATCH` Directive | Lattice Equivalent | Notes |
|---|---|---|
| `--nodes=N` | `--nodes=N` | Exact match |
| `--ntasks=N` | — | Mapped to node count: `ceil(N / tasks_per_node)` |
| `--ntasks-per-node=N` | — | Passed as task config |
| `--time=HH:MM:SS` | `--walltime=HH:MM:SS` | Also accepts `24h`, `30m` shorthand |
| `--partition=X` | `--vcluster=X` | Configurable partition→vCluster mapping |
| `--account=X` | `--tenant=X` | Account→tenant mapping |
| `--job-name=X` | `--name=X` | |
| `--output=file` | — | Logs always go to persistent store; download path configurable |
| `--error=file` | — | Same as `--output` |
| `--constraint=X` | `--constraint=X` | Feature matching |
| `--gres=gpu:N` | `--constraint="gpu_count=N"` | |
| `--qos=X` | `--preemption-class=N` | Configurable QOS→class mapping |
| `--array=0-99%20` | `--task-group=0-99%20` | |
| `--dependency=afterok:ID` | `--depends-on=ID:success` | |
| `--exclusive` | Default | Lattice always allocates full nodes |
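
Two of the rows above are more than a flag rename: `--ntasks=N` is converted to a node count, and `--walltime` also accepts shorthand. A minimal sketch of those two conversions, assuming a site-configured `tasks_per_node` value (the function names here are illustrative, not Lattice internals):

```python
import math

def ntasks_to_nodes(ntasks: int, tasks_per_node: int) -> int:
    """--ntasks=N is mapped to a node count: ceil(N / tasks_per_node)."""
    return math.ceil(ntasks / tasks_per_node)

def to_walltime(value: str) -> str:
    """Normalize the 24h / 30m shorthand to HH:MM:SS; pass through otherwise."""
    if value.endswith("h"):
        return f"{int(value[:-1]):02d}:00:00"
    if value.endswith("m"):
        minutes = int(value[:-1])
        return f"{minutes // 60:02d}:{minutes % 60:02d}:00"
    return value  # already HH:MM:SS

print(ntasks_to_nodes(96, 64))  # → 2 (full-node scheduling rounds up)
print(to_walltime("30m"))       # → 00:30:00
```

The round-up in `ntasks_to_nodes` is a direct consequence of full-node scheduling: a 96-task job on 64-task nodes occupies two whole nodes, not one and a half.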

## Environment Variables

When Slurm compatibility is enabled (`compat.set_slurm_env: true`), Lattice sets familiar environment variables inside allocations:

| Variable | Value |
|---|---|
| `SLURM_JOB_ID` | Allocation ID |
| `SLURM_JOB_NAME` | Allocation name |
| `SLURM_NNODES` | Number of allocated nodes |
| `SLURM_NODELIST` | Comma-separated node list |
| `SLURM_NTASKS` | Task count |
| `SLURM_SUBMIT_DIR` | Working directory at submission |

Lattice also sets its own `LATTICE_*` equivalents.
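
Conceptually, the compatibility layer just mirrors each Lattice value under the familiar Slurm name. A sketch of that behavior, where the `LATTICE_*` variable names are hypothetical stand-ins (the table above only documents the `SLURM_*` side):

```python
# Hypothetical LATTICE_* → SLURM_* mirroring, per the table above.
COMPAT_MAP = {
    "SLURM_JOB_ID": "LATTICE_ALLOCATION_ID",
    "SLURM_JOB_NAME": "LATTICE_ALLOCATION_NAME",
    "SLURM_NNODES": "LATTICE_NUM_NODES",
    "SLURM_NODELIST": "LATTICE_NODELIST",
    "SLURM_NTASKS": "LATTICE_NTASKS",
    "SLURM_SUBMIT_DIR": "LATTICE_SUBMIT_DIR",
}

def set_slurm_env(env: dict) -> None:
    """Mirror each LATTICE_* value under its SLURM_* name, if present."""
    for slurm_name, lattice_name in COMPAT_MAP.items():
        if lattice_name in env:
            env[slurm_name] = env[lattice_name]

env = {"LATTICE_ALLOCATION_ID": "a-1234", "LATTICE_NUM_NODES": "4"}
set_slurm_env(env)
print(env["SLURM_JOB_ID"], env["SLURM_NNODES"])  # a-1234 4
```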

## What’s Different

### Full-Node Scheduling

Lattice always allocates full nodes (no sub-node sharing). This simplifies resource management and improves performance isolation. If you’re used to `--ntasks=1` on a shared node, you’ll get the whole node.

### No Partitions — vClusters

Slurm partitions map to Lattice vClusters, but vClusters are more flexible: each has its own scheduling policy (backfill, bin-pack, FIFO, reservation) and weight tuning.

### Topology-Aware Placement

Lattice automatically packs multi-node jobs within the same Slingshot dragonfly group for optimal network performance. No manual `--switches` needed.

### Data Staging

Lattice can pre-stage data during queue wait time. Add `--data-mount="s3://bucket/data:/data"` and the scheduler factors data locality into placement decisions.
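
Based on the example above, the flag value pairs a remote source with an in-allocation target path, separated by the last colon. A minimal parsing sketch (the split rule is an assumption inferred from that one example):

```python
def parse_data_mount(spec: str) -> tuple[str, str]:
    """Split 'scheme://source/path:/target' on the final colon."""
    # rpartition keeps the colon inside the URI scheme (s3://...) intact.
    source, _, target = spec.rpartition(":")
    return source, target

print(parse_data_mount("s3://bucket/data:/data"))
# → ('s3://bucket/data', '/data')
```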

### Checkpointing

Unlike Slurm’s `--requeue`, Lattice coordinates checkpointing before preemption. Declare `--checkpoint=signal` and your job receives `SIGUSR1` before being suspended.
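
On the job side, this means installing a `SIGUSR1` handler that flushes state before suspension. A minimal sketch, with the checkpoint contents as placeholders (the signal delivery is simulated here by the process signaling itself):

```python
import os
import signal

state = {"step": 0, "checkpointed": False}

def on_preempt(signum, frame):
    # In a real job: write model/solver state to persistent storage here.
    state["checkpointed"] = True

# Install the handler the --checkpoint=signal contract expects.
signal.signal(signal.SIGUSR1, on_preempt)

# Simulate Lattice delivering the pre-suspension signal:
os.kill(os.getpid(), signal.SIGUSR1)
print(state["checkpointed"])  # True
```

Because the handler runs before the job is suspended, long-running applications should keep checkpoint writes short or merely set a flag that the main loop acts on at the next safe point.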

## Migration Steps

1. Start with existing scripts — `#SBATCH` directives work out of the box
2. Replace `sbatch`/`squeue`/`scancel` with `lattice submit`/`status`/`cancel`
3. Gradually adopt native features — data staging, checkpointing, DAGs, uenv
4. Tune scheduling weights — use the RM-Replay simulator for A/B comparison