Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

DAG Workflows

DAGs (Directed Acyclic Graphs) let you define multi-step pipelines where allocations depend on each other.

YAML Definition

# workflow.yaml
name: training-pipeline
allocations:
  - name: preprocess
    entrypoint: "python preprocess.py"
    nodes: 2
    walltime: "2h"

  - name: train
    entrypoint: "torchrun train.py"
    nodes: 64
    walltime: "72h"
    uenv: "prgenv-gnu/24.11:v1"
    depends_on:
      - preprocess: success

  - name: evaluate
    entrypoint: "python eval.py"
    nodes: 1
    walltime: "1h"
    depends_on:
      - train: success

  - name: notify-failure
    entrypoint: "python notify.py --status=failed"
    nodes: 1
    walltime: "10m"
    depends_on:
      - train: failure

Submitting a DAG

lattice dag submit workflow.yaml
# Submitted DAG d1e2f3g4 with 4 allocations

Dependency Conditions

ConditionMeaning
successRun after dependency completes successfully
failureRun after dependency fails
anyRun after dependency completes (success or failure)
correspondingFor task groups: task N depends on task N of the parent

Monitoring DAGs

# DAG status overview
lattice dag status d1e2f3g4

# Detailed graph view
lattice dag status d1e2f3g4 --graph

# Output:
# preprocess [Completed] → train [Running] → evaluate [Pending]
#                                          ↘ notify-failure [Pending]

Cancelling a DAG

# Cancel all allocations in the DAG
lattice dag cancel d1e2f3g4

Cancellation cascades — downstream allocations that haven’t started are cancelled automatically.

Failure Propagation

  • If a success dependency fails, downstream allocations are cancelled
  • If a failure dependency succeeds, those downstream allocations are skipped
  • any dependencies always run regardless of upstream outcome

Limits

  • Maximum 1000 allocations per DAG (configurable by admin)
  • Cycles are rejected at submission time
  • Duplicate allocation names within a DAG are rejected