Configuration Reference

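Kiseki is configured entirely through environment variables; there are no configuration files to manage. Every variable has a default or may be left unset, except KISEKI_NODE_ID and KISEKI_RAFT_PEERS, which are required. Variables are grouped by function below. Runtime tuning parameters, which are set through a gRPC API rather than the environment, are listed near the end.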


Network addresses

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_DATA_ADDR | 0.0.0.0:9100 | Listen address for data-path gRPC (log, chunk, composition, view, discovery). |
| KISEKI_ADVISORY_ADDR | 0.0.0.0:9101 | Listen address for the Workflow Advisory gRPC service. Runs on a dedicated tokio runtime, isolated from the data path (ADR-021). |
| KISEKI_S3_ADDR | 0.0.0.0:9000 | Listen address for the S3 HTTP gateway. |
| KISEKI_NFS_ADDR | 0.0.0.0:2049 | Listen address for the NFS gateway (v3 + v4.2). |
| KISEKI_METRICS_ADDR | 0.0.0.0:9090 | Listen address for Prometheus metrics (/metrics), health endpoint (/health), and admin dashboard (/ui). |
| KISEKI_RAFT_ADDR | 0.0.0.0:9300 | Listen address for Raft consensus traffic between nodes. |

All addresses accept the host:port format. Use 0.0.0.0 to bind to all interfaces or a specific IP to restrict to one network.
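
As a rough illustration of the host:port format, the sketch below resolves each listen address from the environment and falls back to the defaults in the table. The helper names parse_addr and listen_addrs are hypothetical and not part of Kiseki. The server's own parsing may differ, for example in how it treats IPv6 literals.

```python
import os

# Defaults copied from the table above.
DEFAULT_ADDRS = {
    "KISEKI_DATA_ADDR": "0.0.0.0:9100",
    "KISEKI_ADVISORY_ADDR": "0.0.0.0:9101",
    "KISEKI_S3_ADDR": "0.0.0.0:9000",
    "KISEKI_NFS_ADDR": "0.0.0.0:2049",
    "KISEKI_METRICS_ADDR": "0.0.0.0:9090",
    "KISEKI_RAFT_ADDR": "0.0.0.0:9300",
}


def parse_addr(value):
    """Split 'host:port' into (host, port), splitting on the last colon."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    port_num = int(port)
    if not 1 <= port_num <= 65535:
        raise ValueError(f"port out of range in {value!r}")
    return host, port_num


def listen_addrs(env=None):
    """Resolve every listen address, using the default when a variable is unset."""
    env = os.environ if env is None else env
    return {
        name: parse_addr(env.get(name, default))
        for name, default in DEFAULT_ADDRS.items()
    }
```

Calling listen_addrs({"KISEKI_S3_ADDR": "10.0.0.5:8000"}) would bind the S3 gateway to a single interface while leaving the other services on their defaults.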


Cluster membership

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_NODE_ID | (required) | Unique integer identifier for this node within the cluster. Must be stable across restarts. |
| KISEKI_RAFT_PEERS | (required) | Comma-separated list of id=host:port pairs for all Raft voters. Example: 1=node1:9300,2=node2:9300,3=node3:9300. Must be identical on every node. |
| KISEKI_BOOTSTRAP | false | When true, the node creates an initial shard on first start. Set to true on exactly one node during initial cluster formation, then set back to false. |
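
The peer list format can be checked before deployment with a small script. The following is a minimal sketch, assuming node ids are plain decimal integers and that duplicate ids should be rejected. The function names are hypothetical and do not come from Kiseki.

```python
def parse_raft_peers(value):
    """Parse 'id=host:port,id=host:port,...' into {node_id: 'host:port'}."""
    peers = {}
    for item in value.split(","):
        node_id, sep, addr = item.strip().partition("=")
        if not sep or not addr:
            raise ValueError(f"expected id=host:port, got {item!r}")
        key = int(node_id)  # raises ValueError for a non-integer id
        if key in peers:
            raise ValueError(f"duplicate node id {key}")
        peers[key] = addr
    return peers


def is_voter(node_id, peers):
    """True when this node's KISEKI_NODE_ID appears in the voter list."""
    return node_id in peers


peers = parse_raft_peers("1=node1:9300,2=node2:9300,3=node3:9300")
print(peers)  # {1: 'node1:9300', 2: 'node2:9300', 3: 'node3:9300'}
```

Because the list must be identical on every node, comparing the parsed dictionaries across hosts is one way to catch a stale entry.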

Storage

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_DATA_DIR | /var/lib/kiseki | Root directory for all persistent state. Contains Raft log (raft/log.redb), key epochs (keys/epochs.redb), chunk metadata (chunks/meta.redb), and inline small-file content (small/objects.redb). Must reside on a low-latency device (NVMe or SSD strongly recommended; HDD triggers a boot warning). |

Data directory layout

KISEKI_DATA_DIR/
  raft/log.redb            Raft log entries (bounded by snapshot policy)
  keys/epochs.redb         Key epoch metadata (<10 MB)
  chunks/meta.redb         Chunk extent index (scales with file count)
  small/objects.redb       Small-file encrypted content (capacity-managed)

TLS / mTLS

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_CA_PATH | (none) | Path to the Cluster CA certificate (PEM). Required for production. When set, all gRPC connections require mTLS. |
| KISEKI_CERT_PATH | (none) | Path to this node’s TLS certificate (PEM), signed by the Cluster CA. |
| KISEKI_KEY_PATH | (none) | Path to this node’s TLS private key (PEM). Never logged, printed, or transmitted. |
| KISEKI_CRL_PATH | (none) | Path to a CRL file (PEM) for certificate revocation. Reloaded periodically. Optional; if not set, CRL checking is disabled. |

When KISEKI_CA_PATH is not set, the server runs without TLS. This is acceptable for development but must not be used in production.
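
A deployment preflight check could encode the rule above. This is only a sketch: it assumes that an empty value counts as unset and that a node needs both KISEKI_CERT_PATH and KISEKI_KEY_PATH once KISEKI_CA_PATH is set. The source implies the latter but does not state it. The function names are hypothetical.

```python
def tls_mode(env):
    """Return 'mtls' when KISEKI_CA_PATH is set, otherwise 'plaintext'."""
    if not env.get("KISEKI_CA_PATH"):
        return "plaintext"  # acceptable for development only
    # Assumption: mTLS also needs this node's certificate and private key.
    missing = [
        name
        for name in ("KISEKI_CERT_PATH", "KISEKI_KEY_PATH")
        if not env.get(name)
    ]
    if missing:
        raise ValueError("KISEKI_CA_PATH is set but missing: " + ", ".join(missing))
    return "mtls"


def crl_checking_enabled(env):
    """CRL checking is optional and only active when KISEKI_CRL_PATH is set."""
    return bool(env.get("KISEKI_CRL_PATH"))
```

A production rollout script might refuse to continue when tls_mode(os.environ) returns "plaintext".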


Client-side cache (ADR-031)

These variables configure the native client cache on compute nodes running kiseki-client-fuse.

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_CACHE_MODE | organic | Cache operating mode. One of: pinned (staging-driven, eviction-resistant), organic (LRU with usage-weighted retention), bypass (no caching). Mode is per session, not per file. |
| KISEKI_CACHE_DIR | $KISEKI_DATA_DIR/cache | Directory for L2 cache pools on local NVMe. Each client process creates an isolated pool with a unique pool_id. |
| KISEKI_CACHE_L1_MAX | 1073741824 (1 GiB) | Maximum bytes for the in-memory L1 cache (decrypted plaintext chunks). Bounded by process memory. |
| KISEKI_CACHE_L2_MAX | 107374182400 (100 GiB) | Maximum bytes for the on-disk L2 cache on local NVMe. Per-process, per-tenant isolation via pool directories. |
| KISEKI_CACHE_META_TTL_MS | 5000 (5 seconds) | Metadata TTL in milliseconds. File-to-chunk-list mappings are served from cache within this window. After expiry, mappings are re-fetched from canonical. This is the sole freshness window: chunk data itself has no TTL because chunks are immutable (I-C1). |
| KISEKI_CACHE_POOL_ID | (none) | Adopt an existing L2 cache pool instead of creating a new one. Used for staging handoff from a Slurm prolog daemon to a workload process. |

Cache behavior notes

  • Pinned mode: Pre-staged datasets remain in cache until explicitly released. Best for training workloads that re-read the same data across epochs.
  • Organic mode: LRU eviction with usage-weighted retention. Default for mixed workloads.
  • Bypass mode: No caching at all. Best for checkpoint/restart and streaming workloads.
  • On process restart, the client creates a new L2 pool (wiping orphaned pools). A kiseki-cache-scrub service cleans orphans on node boot.
  • Disconnects longer than 300 seconds (configurable) wipe the entire cache.
  • Crypto-shred events wipe all cached plaintext for the affected tenant within the key health check interval (default 30 seconds).
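
To make the defaults and the metadata TTL concrete, the sketch below loads the cache variables into a small record and checks whether a cached file-to-chunk-list mapping is still inside its freshness window. CacheConfig, load_cache_config, and mapping_is_fresh are hypothetical names. Treating the TTL boundary as exclusive is an assumption, since the source does not say which side the boundary falls on.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = ("pinned", "organic", "bypass")


@dataclass(frozen=True)
class CacheConfig:
    mode: str
    cache_dir: str
    l1_max: int
    l2_max: int
    meta_ttl_ms: int
    pool_id: Optional[str]


def load_cache_config(env):
    """Build a CacheConfig from a mapping of environment variables."""
    mode = env.get("KISEKI_CACHE_MODE", "organic")
    if mode not in VALID_MODES:
        raise ValueError(f"KISEKI_CACHE_MODE must be one of {VALID_MODES}, got {mode!r}")
    data_dir = env.get("KISEKI_DATA_DIR", "/var/lib/kiseki")
    return CacheConfig(
        mode=mode,
        cache_dir=env.get("KISEKI_CACHE_DIR", f"{data_dir}/cache"),
        l1_max=int(env.get("KISEKI_CACHE_L1_MAX", 1073741824)),
        l2_max=int(env.get("KISEKI_CACHE_L2_MAX", 107374182400)),
        meta_ttl_ms=int(env.get("KISEKI_CACHE_META_TTL_MS", 5000)),
        pool_id=env.get("KISEKI_CACHE_POOL_ID"),  # None means create a new pool
    )


def mapping_is_fresh(fetched_at_ms, now_ms, ttl_ms):
    """Assumption: a mapping is served from cache only while its age is below the TTL."""
    return now_ms - fetched_at_ms < ttl_ms
```

Only metadata mappings need this check. Chunk data is immutable, so a cached chunk never goes stale.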

Metadata capacity (ADR-030)

These variables control the dynamic inline threshold for small-file placement.

| Variable | Default | Description |
| --- | --- | --- |
| KISEKI_META_SOFT_LIMIT_PCT | 50 | Normal operating ceiling for system disk metadata usage, as a percentage of system partition capacity. Exceeding this triggers inline threshold reduction. |
| KISEKI_META_HARD_LIMIT_PCT | 75 | Absolute maximum for system disk metadata usage. Exceeding this forces the inline threshold to the floor (128 bytes) and emits an alert via out-of-band gRPC (not Raft). |

The inline threshold determines whether a file’s encrypted content is stored in small/objects.redb (metadata tier, NVMe) or as a chunk extent on a raw block device (data tier). The threshold is computed per shard as the minimum affordable threshold across all Raft voters, clamped between 128 bytes (floor) and 64 KiB (65536 bytes, ceiling).
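
The clamping rule can be written out as a short sketch. How each voter derives its own affordable threshold is not specified here, so the per-voter values are taken as inputs. Treating a payload exactly equal to the threshold as inline is an assumption, and the function names are hypothetical.

```python
INLINE_FLOOR = 128          # bytes
INLINE_CEILING = 64 * 1024  # bytes


def shard_inline_threshold(voter_thresholds, hard_limit_exceeded=False):
    """Minimum affordable threshold across voters, clamped to [floor, ceiling]."""
    if hard_limit_exceeded:
        # Exceeding KISEKI_META_HARD_LIMIT_PCT forces the floor.
        return INLINE_FLOOR
    lowest = min(voter_thresholds)
    return max(INLINE_FLOOR, min(INLINE_CEILING, lowest))


def placement(size_bytes, threshold):
    """Assumption: content at or below the threshold is stored inline."""
    return "small/objects.redb" if size_bytes <= threshold else "chunk extent"


threshold = shard_inline_threshold([4096, 2048, 8192])
print(threshold)                   # 2048
print(placement(1500, threshold))  # small/objects.redb
print(placement(3000, threshold))  # chunk extent
```

Taking the minimum means the most constrained voter sets the threshold for the whole shard.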


Observability

| Variable | Default | Description |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | OpenTelemetry OTLP gRPC endpoint for distributed traces. Example: http://jaeger:4317. When not set, tracing is disabled. |
| OTEL_SERVICE_NAME | kiseki-server | Service name reported in traces. Set to kiseki-keyserver or kiseki-client for other binaries. |
| RUST_LOG | info | Logging filter directive for the tracing crate. Supports per-module granularity. Examples: kiseki=debug; kiseki_raft=trace,kiseki=info; warn. |
| KISEKI_LOG_FORMAT | text | Log output format. text for human-readable, json for structured JSON (one line per event). Use json in production for log aggregation. |

Tuning parameters (runtime)

The following parameters are set at runtime via the StorageAdminService gRPC API (SetTuningParams / GetTuningParams), not via environment variables. They are listed here for reference.

Cluster-wide tuning

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| compaction_rate_mb_s | 100 | 10-1000 | Background compaction throughput cap (MB/s). |
| gc_interval_s | 300 | 60-3600 | Interval between GC scans for reclaimable chunks. |
| rebalance_rate_mb_s | 50 | 0-500 | Background rebalance/evacuation throughput (MB/s). |
| scrub_interval_h | 168 (7 days) | 24-720 | Interval between integrity scrub runs. |
| max_concurrent_repairs | 4 | 1-32 | Maximum parallel EC repair jobs. |
| stream_proc_poll_ms | 100 | 10-1000 | View materialization polling interval (ms). |
| inline_threshold_bytes | 4096 | 512-65536 | Default inline threshold for new shards. |
| raft_snapshot_interval | 10000 | 1000-100000 | Entries between Raft snapshots. |
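
Before sending a change through SetTuningParams, a client-side range check can catch obvious mistakes. The sketch below only validates values against the table above and does not call the gRPC API. Treating the ranges as inclusive is an assumption, and validate_tuning is a hypothetical helper.

```python
# (default, minimum, maximum) copied from the cluster-wide tuning table.
CLUSTER_TUNING = {
    "compaction_rate_mb_s": (100, 10, 1000),
    "gc_interval_s": (300, 60, 3600),
    "rebalance_rate_mb_s": (50, 0, 500),
    "scrub_interval_h": (168, 24, 720),
    "max_concurrent_repairs": (4, 1, 32),
    "stream_proc_poll_ms": (100, 10, 1000),
    "inline_threshold_bytes": (4096, 512, 65536),
    "raft_snapshot_interval": (10000, 1000, 100000),
}


def validate_tuning(params):
    """Return a list of error strings; an empty list means every value is in range."""
    errors = []
    for name, value in params.items():
        if name not in CLUSTER_TUNING:
            errors.append(f"{name}: unknown parameter")
            continue
        _, low, high = CLUSTER_TUNING[name]
        if not low <= value <= high:
            errors.append(f"{name}: {value} outside {low}-{high}")
    return errors


print(validate_tuning({"gc_interval_s": 30, "max_concurrent_repairs": 8}))
# ['gc_interval_s: 30 outside 60-3600']
```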

Per-pool tuning

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| ec_data_chunks | 4 (NVMe) / 8 (HDD) | 2-16 | EC data fragment count. Immutable per pool after creation (I-C6). |
| ec_parity_chunks | 2 (NVMe) / 3 (HDD) | 1-8 | EC parity fragment count. Immutable per pool after creation. |
| replication_count | 3 | 2-5 | For replication pools (non-EC). |
| warning_threshold_pct | Per device class | 50-95 | Pool capacity warning level. |
| critical_threshold_pct | Per device class | 60-98 | Pool capacity critical level. Writes rejected. |
| readonly_threshold_pct | Per device class | 70-99 | Read-only level. In-flight writes drain. |
| target_fill_pct | 70 (SSD) / 80 (HDD) | 50-90 | Rebalance target fill level. |
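
Because the EC fragment counts cannot be changed after pool creation, it may help to work out their raw-capacity cost up front. The sketch below applies the standard erasure-coding ratio (k + m) / k to the default profiles. The tolerance of m lost fragments assumes an MDS code such as Reed-Solomon, which the source does not name.

```python
from fractions import Fraction


def ec_overhead(data_chunks, parity_chunks):
    """Raw bytes stored per logical byte for a k+m profile: (k + m) / k."""
    return Fraction(data_chunks + parity_chunks, data_chunks)


for label, k, m in (("NVMe", 4, 2), ("HDD", 8, 3)):
    ratio = float(ec_overhead(k, m))
    print(f"{label}: {k}+{m} -> {ratio:.3f}x raw, tolerates {m} lost fragments")
# NVMe: 4+2 -> 1.500x raw, tolerates 2 lost fragments
# HDD: 8+3 -> 1.375x raw, tolerates 3 lost fragments
```

For comparison, a replication pool with replication_count set to 3 stores 3x the logical data.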

Default capacity thresholds by device class:

| State | NVMe/SSD | HDD |
| --- | --- | --- |
| Healthy | 0-75% | 0-85% |
| Warning | 75-85% | 85-92% |
| Critical | 85-92% | 92-97% |
| ReadOnly | 92-97% | 97-99% |
| Full | 97-100% | 99-100% |
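
The default bands map a fill percentage to a state per device class, which can be sketched with a sorted lookup. The table leaves boundary values ambiguous (75% appears in both Healthy and Warning for NVMe/SSD), so this sketch assumes a boundary value belongs to the more severe state. pool_state is a hypothetical helper.

```python
from bisect import bisect_right

STATES = ("Healthy", "Warning", "Critical", "ReadOnly", "Full")

# Upper bounds (percent) of Healthy, Warning, Critical, and ReadOnly.
BOUNDS = {
    "ssd": (75, 85, 92, 97),  # NVMe/SSD column
    "hdd": (85, 92, 97, 99),  # HDD column
}


def pool_state(fill_pct, device_class):
    """Classify a pool by fill level using the default thresholds."""
    if not 0 <= fill_pct <= 100:
        raise ValueError(f"fill_pct must be within 0-100, got {fill_pct}")
    # bisect_right places a value equal to a bound in the next (more severe) band.
    return STATES[bisect_right(BOUNDS[device_class], fill_pct)]


print(pool_state(80, "ssd"))  # Warning
print(pool_state(80, "hdd"))  # Healthy
```

The same fill level lands in different states on different device classes, which is why the thresholds default per device class rather than to one value.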

All tuning parameter changes via SetTuningParams are recorded in the cluster audit shard with parameter name, old value, new value, timestamp, and admin identity (I-A6).


Environment variable summary

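Quick reference of the environment variables, shown with example values. Some values differ from the defaults listed above (for example the TLS paths, KISEKI_CACHE_DIR, RUST_LOG, and KISEKI_LOG_FORMAT), and KISEKI_CACHE_POOL_ID is omitted: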

# Network
KISEKI_DATA_ADDR=0.0.0.0:9100
KISEKI_ADVISORY_ADDR=0.0.0.0:9101
KISEKI_S3_ADDR=0.0.0.0:9000
KISEKI_NFS_ADDR=0.0.0.0:2049
KISEKI_METRICS_ADDR=0.0.0.0:9090
KISEKI_RAFT_ADDR=0.0.0.0:9300

# Cluster
KISEKI_NODE_ID=1
KISEKI_RAFT_PEERS=1=node1:9300,2=node2:9300,3=node3:9300
KISEKI_BOOTSTRAP=false

# Storage
KISEKI_DATA_DIR=/var/lib/kiseki

# TLS
KISEKI_CA_PATH=/etc/kiseki/tls/ca.crt
KISEKI_CERT_PATH=/etc/kiseki/tls/server.crt
KISEKI_KEY_PATH=/etc/kiseki/tls/server.key
KISEKI_CRL_PATH=/etc/kiseki/tls/crl.pem

# Cache (client only)
KISEKI_CACHE_MODE=organic
KISEKI_CACHE_DIR=/var/cache/kiseki
KISEKI_CACHE_L1_MAX=1073741824
KISEKI_CACHE_L2_MAX=107374182400
KISEKI_CACHE_META_TTL_MS=5000

# Metadata capacity
KISEKI_META_SOFT_LIMIT_PCT=50
KISEKI_META_HARD_LIMIT_PCT=75

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
OTEL_SERVICE_NAME=kiseki-server
RUST_LOG=kiseki=info
KISEKI_LOG_FORMAT=json