ADR-019: Gateway Deployment Model

Status: Accepted
Date: 2026-04-17
Context: ADV-ARCH-03 (monolith blast radius), analyst backpass contention 4

Decision

Gateways run in-process with kiseki-server (monolith per node). Client resilience is provided by multi-endpoint resolution, not per-process gateway isolation.

Rationale

This is a distributed system with no master. Every storage node runs kiseki-server (log + chunk + composition + view + gateways). Clients resolve to multiple endpoints:

Client (NFS/S3/native)
  │
  ├── DNS round-robin: kiseki-nfs.cluster.local → [node1, node2, node3, ...]
  ├── Multiple A/AAAA records
  ├── Native client: seed list → discovery → multiple endpoints
  │
  └── On node failure: client reconnects to next endpoint
      (NFS: automatic reconnect; S3: retry to different host;
       native: transport failover)
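The reconnect behavior in the diagram can be sketched as a loop over the resolved endpoint list. This is a minimal illustration, not kiseki's client code: `connect_with_failover`, `AllEndpointsFailed`, and the injected `connect` callable are hypothetical names, and a real client would also add timeouts and backoff.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when no endpoint in the list accepted a connection."""


def connect_with_failover(endpoints: Iterable[str], connect: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful connection.

    `connect` is a hypothetical callable supplied by the caller; it should
    raise OSError (or a subclass such as ConnectionRefusedError) on failure.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except OSError as exc:
            # Node is down or unreachable: remember why, move to the next one.
            errors.append((endpoint, exc))
    raise AllEndpointsFailed(errors)


def fake_connect(endpoint: str) -> str:
    """Stand-in transport for illustration: node1 is treated as crashed."""
    if endpoint == "node1":
        raise ConnectionRefusedError("node1 is down")
    return "connected:" + endpoint
```

With `["node1", "node2", "node3"]` and `fake_connect`, the call skips the crashed node and returns the connection to `node2`. The order of the list is whatever DNS or discovery returned; the sketch does not shuffle or weight it.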

Why monolith is acceptable

| Concern | Mitigation |
|---|---|
| Gateway crash = node crash | Client reconnects to another node (seconds) |
| All tenants on crashed node affected | Tenants are served by multiple nodes; one node loss = partial, not total |
| Memory leak in gateway affects log/chunk | Resource limits via cgroups; OOM killer targets the process, not the node |
| Bug in NFS gateway affects S3 gateway | Accepted risk: both are in the same process. Isolation adds operational complexity disproportionate to the risk |

Why NOT separate gateway processes

  • Additional process management per node (spawn, monitor, restart, IPC)
  • Performance overhead of IPC between gateway and log/chunk/view
  • Operational complexity (more processes to configure, monitor, upgrade)
  • The resilience model is client-side multi-endpoint, not server-side process isolation

Client resolution

| Client type | Resolution mechanism |
|---|---|
| NFS | DNS (multiple A records), NFS mount with multiple server addresses |
| S3 | DNS round-robin, HTTP retry to next endpoint on 5xx |
| Native | Seed list → fabric discovery → multiple endpoints, automatic failover |
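The native client's path from seed list to endpoint set might look roughly like the following. The ADR does not specify the discovery protocol, so `discover_endpoints` and the injected `fetch_peers` callable are assumptions made purely for illustration: each reachable seed is asked for the peers it knows, and the answers are merged into one de-duplicated list.

```python
from typing import Callable, Iterable, List


def discover_endpoints(
    seeds: Iterable[str],
    fetch_peers: Callable[[str], List[str]],
) -> List[str]:
    """Build a de-duplicated endpoint list from whichever seeds respond.

    `fetch_peers` is a hypothetical callable that returns the peers a seed
    knows about, and raises OSError if the seed cannot be reached.
    """
    known: List[str] = []
    for seed in seeds:
        try:
            peers = fetch_peers(seed)
        except OSError:
            continue  # Unreachable seed: skip it and try the next one.
        # A seed that answered is itself a usable endpoint.
        for endpoint in [seed, *peers]:
            if endpoint not in known:
                known.append(endpoint)
    return known


def fake_fetch_peers(seed: str) -> List[str]:
    """Stand-in for fabric discovery: node1 is down, the rest know the cluster."""
    if seed == "node1":
        raise ConnectionRefusedError("node1 is down")
    return ["node2", "node3", "node4"]
```

A single reachable seed is enough to learn the remaining endpoints, which is why a stale or partly dead seed list still lets the client fail over.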

Consequences

  • kiseki-server remains a single-process monolith per node
  • Client-side resilience is the primary availability mechanism
  • failure-modes.md must be updated: F-D1 (gateway crash) becomes node-scoped, not protocol-scoped
  • Node loss tolerance depends on tenant data distribution across nodes
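To make the last consequence concrete, the sketch below classifies tenants after a node failure. It assumes a hypothetical `placement` map from tenant to the set of nodes serving it. The ADR does not define how placement is stored, and the sketch ignores quorum or replication requirements, so "degraded" only means that at least one serving node survives.

```python
from typing import Dict, Set


def tenants_after_node_loss(
    placement: Dict[str, Set[str]],
    failed: Set[str],
) -> Dict[str, str]:
    """Classify each tenant as 'unaffected', 'degraded', or 'unavailable'.

    `placement` is a hypothetical tenant -> serving-nodes map.
    """
    result: Dict[str, str] = {}
    for tenant, nodes in placement.items():
        remaining = nodes - failed
        if remaining == nodes:
            result[tenant] = "unaffected"   # None of its nodes failed.
        elif remaining:
            result[tenant] = "degraded"     # Partial loss: clients fail over.
        else:
            result[tenant] = "unavailable"  # Every serving node is gone.
    return result


example_placement = {
    "tenant-a": {"node1", "node2", "node3"},
    "tenant-b": {"node2", "node3"},
    "tenant-c": {"node1"},
}
impact = tenants_after_node_loss(example_placement, failed={"node1"})
```

In this example, losing `node1` leaves `tenant-a` degraded, `tenant-b` unaffected, and `tenant-c` unavailable. A tenant confined to a single node gets no benefit from client-side failover.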