System Overview

Kiseki is a distributed storage system designed for HPC and AI workloads. It provides a unified data fabric with POSIX (FUSE), NFS, and S3 access paths, two-layer encryption with tenant-controlled crypto-shred, and pluggable HPC transports (CXI/Slingshot, InfiniBand, RoCEv2).

Workspace structure

The codebase is a single Rust workspace with 18 crates:

| Crate | Purpose |
|---|---|
| kiseki-common | Shared types, HLC, identifiers, errors |
| kiseki-proto | Generated protobuf/gRPC code |
| kiseki-crypto | FIPS AEAD (AES-256-GCM), envelope encryption, tenant KMS providers |
| kiseki-raft | Shared Raft config, redb log store, TCP transport |
| kiseki-transport | Transport abstraction: TCP+TLS, RDMA verbs, CXI/libfabric |
| kiseki-log | Log context: delta ordering, shard lifecycle, Raft consensus |
| kiseki-block | Raw block device I/O, bitmap allocator, superblock (ADR-029) |
| kiseki-chunk | Chunk storage: placement, erasure coding, GC, device management |
| kiseki-composition | Composition context: namespace, refcount, multipart |
| kiseki-view | View materialization: stream processors, MVCC pins |
| kiseki-gateway | Protocol gateway: NFS and S3 translation |
| kiseki-client | Native client: FUSE, transport selection, client-side cache |
| kiseki-keymanager | System key manager with Raft HA |
| kiseki-audit | Append-only audit log with per-tenant shards |
| kiseki-advisory | Workflow advisory: hints, telemetry, budgets (ADR-020/021) |
| kiseki-control | Control plane: tenancy, IAM, policy, federation |
| kiseki-server | Storage node binary (composes all server-side crates) |
| kiseki-acceptance | BDD acceptance tests (cucumber-rs) |

Bounded contexts

The domain is organized into eight bounded contexts, each with a distinct responsibility, failure domain, and scaling concern:

  1. Log – Delta ordering, Raft consensus, shard lifecycle
  2. Chunk Storage – Encrypted chunk persistence, placement, EC, GC
  3. Composition – Tenant-scoped metadata assembly, namespace management
  4. View Materialization – Protocol-shaped materialized projections
  5. Protocol Gateway – NFS and S3 wire protocol translation
  6. Control Plane – Tenancy, IAM, quota, policy, federation
  7. Key Management – System DEK/KEK, tenant KMS providers, crypto-shred
  8. Workflow Advisory – Client hints, telemetry feedback (cross-cutting)

Two further components sit outside this list: the Native Client, which runs on compute nodes as a separate trust boundary, and Block I/O, which handles raw device management underneath Chunk Storage.
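As a quick orientation aid, the eight contexts can be written down as an enum and paired with the crate that appears to implement each one. This is only a sketch: the `BoundedContext` type and its methods are hypothetical, and the context-to-crate mapping is inferred from the crate table above rather than stated by the project.

```rust
/// The eight bounded contexts listed above (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoundedContext {
    Log,
    ChunkStorage,
    Composition,
    ViewMaterialization,
    ProtocolGateway,
    ControlPlane,
    KeyManagement,
    WorkflowAdvisory,
}

impl BoundedContext {
    /// Every context, in the order used in the list above.
    pub const ALL: [BoundedContext; 8] = [
        BoundedContext::Log,
        BoundedContext::ChunkStorage,
        BoundedContext::Composition,
        BoundedContext::ViewMaterialization,
        BoundedContext::ProtocolGateway,
        BoundedContext::ControlPlane,
        BoundedContext::KeyManagement,
        BoundedContext::WorkflowAdvisory,
    ];

    /// Crate that appears to own this context.
    /// Assumption: inferred from the crate table, not confirmed by the source.
    pub fn crate_name(self) -> &'static str {
        match self {
            BoundedContext::Log => "kiseki-log",
            BoundedContext::ChunkStorage => "kiseki-chunk",
            BoundedContext::Composition => "kiseki-composition",
            BoundedContext::ViewMaterialization => "kiseki-view",
            BoundedContext::ProtocolGateway => "kiseki-gateway",
            BoundedContext::ControlPlane => "kiseki-control",
            BoundedContext::KeyManagement => "kiseki-keymanager",
            BoundedContext::WorkflowAdvisory => "kiseki-advisory",
        }
    }
}
```

The Native Client (kiseki-client) and Block I/O (kiseki-block) are deliberately left out, since the text treats them separately from the eight contexts.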

Data path

Client (plaintext) ──encrypt──► Gateway / Native Client
                                       │
                                       ▼
                                  Composition
                                  (assemble chunks, record delta)
                                       │
                              ┌────────┴────────┐
                              ▼                 ▼
                          Log (Raft)       Chunk Storage
                     (commit delta,      (write encrypted
                      replicate)          chunk to device)

Write path: The native client or protocol gateway encrypts data with a system DEK that is wrapped by the tenant KEK. The composition layer assembles chunk references and records a delta. The delta is committed through Raft on the owning shard. Chunks are written to affinity pools with erasure coding.
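The two-layer key arrangement on the write path (a data key wrapped by a tenant key) is what makes crypto-shred possible: once the tenant key is gone, the data key cannot be recovered. The toy model below illustrates only that structure. XOR stands in for AES-256-GCM and is not secure. `SealedChunk`, `seal`, and `open` are hypothetical names, and storing the wrapped DEK alongside the chunk is an assumption made for the sketch.

```rust
/// Placeholder "cipher": XOR with a repeating key.
/// Assumption: stands in for AES-256-GCM purely to keep the sketch dependency-free.
fn xor_bytes(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// A chunk as it might be persisted: ciphertext plus the DEK wrapped under the tenant KEK.
pub struct SealedChunk {
    pub ciphertext: Vec<u8>,
    pub wrapped_dek: Vec<u8>,
}

/// Layer 1: encrypt the data with the DEK. Layer 2: wrap the DEK with the tenant KEK.
pub fn seal(plaintext: &[u8], dek: &[u8], tenant_kek: &[u8]) -> SealedChunk {
    SealedChunk {
        ciphertext: xor_bytes(plaintext, dek),
        wrapped_dek: xor_bytes(dek, tenant_kek),
    }
}

/// Reading needs the tenant KEK to unwrap the DEK first.
/// Passing `None` models a crypto-shredded tenant: the plaintext is unrecoverable.
pub fn open(chunk: &SealedChunk, tenant_kek: Option<&[u8]>) -> Option<Vec<u8>> {
    let kek = tenant_kek?;
    let dek = xor_bytes(&chunk.wrapped_dek, kek);
    Some(xor_bytes(&chunk.ciphertext, &dek))
}
```

Calling `open` with the same KEK that was passed to `seal` returns the original bytes, while `open(&chunk, None)` returns `None` without touching the ciphertext.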

Read path: The client issues a lookup against a view, which is materialized from log deltas. The view resolves the lookup to chunk references. The chunks are read from devices, decrypted, and returned to the client.
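To make "materialized from log deltas" concrete, the sketch below folds an ordered stream of deltas into an in-memory map and then resolves a path to its chunk references. It assumes a much simpler delta shape than the real system is likely to use; `Delta`, `ChunkRef`, and `View` are hypothetical types, and MVCC pins and protocol-specific shaping are ignored.

```rust
use std::collections::BTreeMap;

/// Opaque reference to a stored chunk (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkRef(pub u64);

/// A committed log entry, reduced to two cases for illustration.
#[derive(Debug, Clone)]
pub enum Delta {
    Put { path: String, chunks: Vec<ChunkRef> },
    Delete { path: String },
}

/// A view is the result of applying deltas in log order.
#[derive(Debug, Default)]
pub struct View {
    entries: BTreeMap<String, Vec<ChunkRef>>,
    applied: u64,
}

impl View {
    /// Apply the next delta from the log. Order matters: later deltas win.
    pub fn apply(&mut self, delta: &Delta) {
        match delta {
            Delta::Put { path, chunks } => {
                self.entries.insert(path.clone(), chunks.clone());
            }
            Delta::Delete { path } => {
                self.entries.remove(path);
            }
        }
        self.applied += 1;
    }

    /// Resolve a path to the chunk references a reader would fetch next.
    pub fn lookup(&self, path: &str) -> Option<&[ChunkRef]> {
        self.entries.get(path).map(|chunks| chunks.as_slice())
    }

    /// Number of deltas applied so far.
    pub fn applied(&self) -> u64 {
        self.applied
    }
}
```

A reader would call `lookup` and then fetch and decrypt each referenced chunk. Comparing `applied()` against the latest committed log position is one plausible way to express materialization lag, though the source does not say how lag is measured.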

Control path

Admin ──► Control Plane (gRPC)
              │
              ├── Tenant / Namespace / Quota / Policy
              ├── Flavor management
              ├── Federation (async cross-site)
              └── Advisory policy (hint budgets, profiles)

The control plane manages tenant lifecycle, IAM, quotas, compliance tags, placement policy, and federation. It communicates with storage nodes via gRPC on the management network. The control plane depends only on kiseki-common and kiseki-proto (crate-graph firewall, ADR-027).
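The crate-graph firewall is a rule about which workspace crates the control plane may depend on. How ADR-027 enforces it is not described here. The sketch below shows one way such a rule could be checked, given a list of dependency names. `firewall_violations` and `ALLOWED_INTERNAL_DEPS` are hypothetical, and treating every crate prefixed `kiseki-` as a workspace crate is an assumption.

```rust
/// Workspace crates the control plane is allowed to depend on, per the text above.
const ALLOWED_INTERNAL_DEPS: [&str; 2] = ["kiseki-common", "kiseki-proto"];

/// Return every workspace dependency that breaks the firewall rule.
/// Assumption: workspace crates are recognised by the `kiseki-` prefix;
/// third-party crates are outside the scope of the rule and are ignored.
pub fn firewall_violations<'a>(deps: &[&'a str]) -> Vec<&'a str> {
    deps.iter()
        .copied()
        .filter(|dep| dep.starts_with("kiseki-") && !ALLOWED_INTERNAL_DEPS.contains(dep))
        .collect()
}
```

For example, a dependency list of `kiseki-common`, `kiseki-proto`, and an arbitrary third-party crate yields no violations, while adding `kiseki-chunk` would be reported.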

Advisory path (ADR-020)

Client ──hints──► Advisory Runtime ──telemetry──► Client
                      │
                      ├── Route hints to Chunk / View / Composition
                      ├── Emit caller-scoped telemetry feedback
                      └── Audit advisory events

The workflow advisory system is a cross-cutting concern (not a bounded context). It carries two flows over a bidirectional gRPC channel per declared workflow:

  • Hints (client to storage): advisory steering signals for prefetch, affinity, priority, and phase-adaptive tuning. Never authoritative (I-WA1).
  • Telemetry feedback (storage to client): caller-scoped signals about backpressure, locality, materialization lag, and QoS headroom (I-WA5).

The advisory runtime runs on a dedicated tokio runtime, isolated from the data path. Advisory failures never block data-path operations (I-WA2).
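The property that advisory failures never block the data path can be illustrated with a bounded queue and a non-blocking send. The sketch below uses a standard-library channel in place of the dedicated tokio runtime, so it shows the isolation idea rather than the actual mechanism. `Hint`, `HintOutcome`, `HintSender`, and `advisory_channel` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Two example hint kinds (hypothetical; the real hint set is richer).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Hint {
    Prefetch { path: String },
    Priority(u8),
}

/// What happened to a hint offered from the data path.
#[derive(Debug, PartialEq, Eq)]
pub enum HintOutcome {
    Accepted,
    Dropped,
}

/// Data-path handle for submitting hints to the advisory side.
pub struct HintSender {
    tx: SyncSender<Hint>,
}

impl HintSender {
    /// Never blocks. If the advisory queue is full or its consumer is gone,
    /// the hint is dropped; hints are advisory, so losing one is acceptable.
    pub fn offer(&self, hint: Hint) -> HintOutcome {
        match self.tx.try_send(hint) {
            Ok(()) => HintOutcome::Accepted,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                HintOutcome::Dropped
            }
        }
    }
}

/// Create a bounded hint queue; the receiver would live on the advisory side.
pub fn advisory_channel(capacity: usize) -> (HintSender, Receiver<Hint>) {
    let (tx, rx) = sync_channel(capacity);
    (HintSender { tx }, rx)
}
```

The design choice here is that the caller only ever sees `Accepted` or `Dropped`, never an error it must handle or a wait it must sit through, which mirrors the non-authoritative nature of hints (I-WA1) and the isolation requirement (I-WA2).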

Network ports

| Port | Purpose |
|---|---|
| 9100 | Data-path gRPC (Log, Chunk, Composition, View, Discovery) |
| 9101 | Advisory gRPC (WorkflowAdvisoryService) |
| 9000 | S3 HTTP gateway |
| 2049 | NFS server |
| 9090 | Prometheus metrics + health + admin UI |
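For tooling such as firewall rule generators or health checks, the port assignments above can be captured in a small lookup type. This is a sketch only: the `Service` enum is hypothetical, and whether these ports are configurable in a real deployment is not stated here.

```rust
/// Network-facing services from the port table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Service {
    DataPathGrpc,
    AdvisoryGrpc,
    S3Gateway,
    Nfs,
    MetricsAdmin,
}

impl Service {
    /// Port listed for this service in the table above.
    pub const fn port(self) -> u16 {
        match self {
            Service::DataPathGrpc => 9100,
            Service::AdvisoryGrpc => 9101,
            Service::S3Gateway => 9000,
            Service::Nfs => 2049,
            Service::MetricsAdmin => 9090,
        }
    }

    /// Reverse lookup: which service, if any, is listed on `port`.
    pub fn from_port(port: u16) -> Option<Service> {
        [
            Service::DataPathGrpc,
            Service::AdvisoryGrpc,
            Service::S3Gateway,
            Service::Nfs,
            Service::MetricsAdmin,
        ]
        .into_iter()
        .find(|service| service.port() == port)
    }
}
```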

Binaries

| Binary | Contents | Deployment |
|---|---|---|
| kiseki-server | Log, Chunk, Composition, View, Gateway, Audit, Advisory | Every storage node |
| kiseki-client-fuse | Native client with FUSE | Compute nodes |
| kiseki-control | Control plane | Management network (3+ instances) |
| kiseki-keyserver | System key manager (Raft HA) | Dedicated cluster (3-5 nodes) |