Performance Tests
Benchmark results for kiseki on GCP infrastructure.
Test Environment
| Component | Spec |
|---|---|
| HDD nodes (3) | n2-standard-16, 3 x PD-Standard 200GB each |
| Fast nodes (2) | n2-standard-16, 2 x local NVMe + 2 x PD-SSD 375GB |
| Client nodes (3) | n2-standard-8, 100GB SSD cache |
| Ctrl node (1) | e2-standard-4, orchestrator |
| Network | GCP VPC, single subnet 10.0.0.0/24 |
| Region | europe-west6-c (Zurich) |
| Raft | Single group, 5 nodes, node 1 bootstrap |
| Release | v2026.1.352 (async GatewayOps, ADR-032) |
Results (2026-04-24)
Network Bandwidth
| Path | Throughput |
|---|---|
| Client → Leader (n2-standard-8 → n2-standard-16) | 15.2 - 15.3 Gbps |
| HDD → Fast cross-tier (n2-standard-16 → n2-standard-16) | 18.3 - 20.4 Gbps |
S3 Gateway
All S3 tests run from client nodes (n2-standard-8) with 8-way parallelism.
Write Throughput (single client → leader)
| Object Size | Count | Parallelism | Time | Throughput |
|---|---|---|---|---|
| 1 MB | 200 | 8 | 1,624 ms | 123.2 MB/s |
| 4 MB | 50 | 8 | 239 ms | 836.8 MB/s |
| 16 MB | 25 | 8 | 363 ms | 1,101.9 MB/s |
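For reference, a minimal sketch of how the 200 x 1 MB, 8-way-parallel write workload could be driven against an S3-compatible endpoint. This is not the perf-suite.sh harness; the endpoint URL, bucket name, and credentials are placeholders, and the aws-config, aws-sdk-s3, and tokio crates are assumed:

```rust
use aws_sdk_s3::{primitives::ByteStream, Client};
use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder endpoint; credentials come from the default provider chain.
    let config = aws_config::defaults(aws_config::BehaviorVersion::latest())
        .endpoint_url("http://10.0.0.10:9000")
        .load()
        .await;
    let client = Client::new(&config);

    let payload = vec![0u8; 1024 * 1024]; // 1 MB object
    let start = Instant::now();

    // 8 workers x 25 objects = 200 objects, 8-way parallelism.
    let mut tasks = Vec::new();
    for worker in 0..8u32 {
        let client = client.clone();
        let payload = payload.clone();
        tasks.push(tokio::spawn(async move {
            for i in 0..25u32 {
                client
                    .put_object()
                    .bucket("bench") // placeholder bucket
                    .key(format!("obj-{worker}-{i}"))
                    .body(ByteStream::from(payload.clone()))
                    .send()
                    .await
                    .expect("PUT failed");
            }
        }));
    }
    for t in tasks {
        t.await?;
    }

    // 200 MB total uploaded.
    println!("{:.1} MB/s", 200.0 / start.elapsed().as_secs_f64());
    Ok(())
}
```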
Read Throughput
| Object Size | Count | Parallelism | Time | Throughput |
|---|---|---|---|---|
| 1 MB | 200 | 8 | 176 ms | 1,136.4 MB/s |
PUT Latency (1 KB objects, sequential)
| Percentile | Latency |
|---|---|
| p50 | 7.6 ms |
| p99 | 8.6 ms |
| avg | 7.7 ms |
| max | 9.7 ms |
Aggregate Write (3 clients, parallel)
| Workload | Time | Aggregate Throughput |
|---|---|---|
| 3 x 100 x 1 MB (8 concurrent/client) | 2,205 ms | 136.1 MB/s |
NFS / pNFS / FUSE
Not yet tested on GCP. NFS mount from client nodes requires SSH key distribution from the ctrl node (OS Login configuration pending). FUSE requires the kiseki-client binary installed on client nodes.
Local testing (3-node cluster on localhost) confirms all protocols functional via unit and integration tests.
Prometheus Metrics
Gateway request counters showed 0 during the test. The `requests_total` atomic counter in `InMemoryGateway` is not wired to the Prometheus metrics exporter yet.
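A sketch of what the missing wiring could look like, assuming the `prometheus` crate; apart from the metric name `gateway_requests_total` and the counter mentioned above, the types and functions here are illustrative, not kiseki's actual code:

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

struct GatewayMetrics {
    requests_total: IntCounter,
}

impl GatewayMetrics {
    fn new(registry: &Registry) -> prometheus::Result<Self> {
        let requests_total =
            IntCounter::new("gateway_requests_total", "Total S3 gateway requests")?;
        registry.register(Box::new(requests_total.clone()))?;
        Ok(Self { requests_total })
    }
}

/// Render the text exposition format that a /metrics handler would return.
fn render_metrics(registry: &Registry) -> String {
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&registry.gather(), &mut buf)
        .expect("encode metrics");
    String::from_utf8(buf).expect("metrics are valid UTF-8")
}

fn main() {
    let registry = Registry::new();
    let metrics = GatewayMetrics::new(&registry).expect("register counter");

    // Increment on the same path that bumps the existing atomic counter today.
    metrics.requests_total.inc();

    print!("{}", render_metrics(&registry));
}
```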
Local Test Results (same binary, localhost)
For comparison, local 3-node cluster results (loopback network, no disk I/O latency, 32-way parallelism):
| Test | Result |
|---|---|
| S3 Write 1 MB x 200 (32 parallel) | 380.2 MB/s |
| S3 Write 4 MB x 50 (32 parallel) | 349.7 MB/s |
| S3 Write 16 MB x 25 (32 parallel) | 340.7 MB/s |
| S3 Read 1 MB x 200 (32 parallel) | 913.2 MB/s |
| 32 concurrent PUTs | 50 ms (no deadlock) |
Observations
- Small object writes improved 9.6x after ADR-032 (async GatewayOps + lock-free composition writes). The composition lock is no longer held during Raft consensus, allowing concurrent writes to proceed in parallel (see the sketch after this list).
- Read throughput exceeds write throughput. Reads bypass Raft consensus (served from the local composition + chunk store) and hit 1.1 GB/s even for 1 MB objects.
- GCP outperforms localhost for large objects. The GCP network (15+ Gbps) and n2-standard-16 nodes have more bandwidth than the localhost loopback under contention. 16 MB writes: 1,102 MB/s (GCP) vs 341 MB/s (local).
- Latency is network-bound. p50 latency on GCP (7.6 ms) includes network RTT + Raft consensus (5-node quorum). Local latency is dominated by CPU contention on the shared machine.
- Single Raft group is the write bottleneck. All writes go through one leader. A multi-shard deployment would distribute leaders across nodes, scaling write throughput linearly.
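A minimal sketch of the locking pattern the first observation describes; the types and the `raft_propose` stand-in are illustrative placeholders, not kiseki's actual code (requires tokio):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

struct Composition {
    objects: Vec<(String, Vec<u8>)>,
}

// Stand-in for the Raft round-trip (network + quorum).
async fn raft_propose(_entry: &[u8]) {
    tokio::time::sleep(Duration::from_millis(5)).await;
}

async fn write_object(comp: Arc<Mutex<Composition>>, key: String, data: Vec<u8>) {
    // 1. Build the log entry without touching the composition lock.
    let entry = data.clone();

    // 2. Await consensus with no lock held, so concurrent writers overlap
    //    their Raft round-trips instead of queueing on the lock.
    raft_propose(&entry).await;

    // 3. Take the lock only for the brief in-memory apply step.
    comp.lock().await.objects.push((key, data));
}

#[tokio::main]
async fn main() {
    let comp = Arc::new(Mutex::new(Composition { objects: Vec::new() }));

    let mut tasks = Vec::new();
    for i in 0..8 {
        tasks.push(tokio::spawn(write_object(
            comp.clone(),
            format!("obj-{i}"),
            vec![0u8; 1024],
        )));
    }
    for t in tasks {
        t.await.expect("writer task panicked");
    }

    println!("applied {} objects", comp.lock().await.objects.len());
}
```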
Known Issues
- Concurrent write deadlock (fixed in ADR-032). The sync→async bridge (`run_on_raft`) caused thread starvation under concurrent load. Fixed by making GatewayOps and LogOps fully async, and moving log emission out of the composition lock scope. Result: 1 MB writes improved from 39.5 to 380.2 MB/s (9.6x). See the sketch after this list.
- NFS mount on GCP. Requires SSH key distribution from the ctrl node to the client nodes. The ctrl service account needs the `osAdminLogin` role and OS Login key registration.
- Prometheus counters. `gateway_requests_total` is not exported to the `/metrics` endpoint.
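On the first issue, a sketch of the kind of bridge that produces this starvation and the fully async replacement; `run_on_raft`'s real signature is not shown in this document, so the code below is illustrative only:

```rust
use std::time::Duration;

async fn raft_propose(entry: Vec<u8>) -> u64 {
    tokio::time::sleep(Duration::from_millis(5)).await;
    entry.len() as u64 // stand-in for the committed log index
}

// BEFORE (problematic): a synchronous wrapper parks the calling thread until
// the async call completes. If callers run on the async runtime's own worker
// threads, enough concurrent calls can park every worker, leaving no thread
// to drive the futures they are all waiting on.
fn run_on_raft_blocking(handle: &tokio::runtime::Handle, entry: Vec<u8>) -> u64 {
    let (tx, rx) = std::sync::mpsc::channel();
    handle.spawn(async move {
        let _ = tx.send(raft_propose(entry).await);
    });
    rx.recv().expect("raft task dropped") // blocks the calling thread
}

// AFTER: make the caller async and await directly; no thread is parked.
async fn run_on_raft(entry: Vec<u8>) -> u64 {
    raft_propose(entry).await
}

fn main() {
    let rt = tokio::runtime::Runtime::new().expect("runtime");

    // Safe here only because main() is not a runtime worker thread.
    let idx = run_on_raft_blocking(rt.handle(), vec![0u8; 16]);
    println!("blocking bridge committed at index {idx}");

    let idx = rt.block_on(run_on_raft(vec![0u8; 16]));
    println!("async path committed at index {idx}");
}
```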
Running the Benchmark
```bash
# Local 3-node test
cargo build --release --bin kiseki-server
# Start 3 nodes (see examples/cluster-3node.env.node{1,2,3})
# Run: bash infra/gcp/benchmarks/perf-suite.sh

# GCP deployment
cd infra/gcp
terraform apply -var="project_id=PROJECT" -var="zone=ZONE" \
  -var="release_tag=v2026.1.332"
# Deploy perf-suite.sh to ctrl node and run
```

See `infra/gcp/benchmarks/perf-suite.sh` for the full benchmark script and `infra/gcp/benchmarks/run-perf.sh` for the local deployment wrapper.
Comparison with Ceph and Lustre
Single-Leader Kiseki vs Typical Deployments (similar hardware scale)
| Metric | Kiseki (1 leader) | Ceph RGW (S3) | Lustre |
|---|---|---|---|
| Large object write | 1.1 GB/s (16 MB) | 0.5-2 GB/s | 1-2 GB/s per OST |
| Small object write | 122 MB/s (1 MB) | 50-200 MB/s | 200-500 MB/s |
| Read throughput | 1.1 GB/s | 1-3 GB/s | 2-10 GB/s |
| PUT latency | p50: 7.6 ms | p50: 2-5 ms | p50: <1 ms (POSIX) |
| Aggregate 3-client | 133 MB/s | 300-800 MB/s | 1-5 GB/s |
| Encryption | Always (AES-256-GCM) | Optional (rarely on) | No |
Why aggregate throughput is lower
All writes go through a single Raft leader (single Raft group). Ceph distributes across PGs/OSDs, Lustre stripes across OSTs. They parallelize writes across all nodes; kiseki serializes through one leader. This is a deployment constraint, not an architectural limit.
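As an illustration of the multi-shard direction (not kiseki's actual sharding code), routing keys to independent Raft groups could look like:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Each key maps to one of N independent Raft groups, each with its own
// leader, so writes stop serializing through a single leader.
fn shard_for_key(key: &str, shard_count: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % shard_count
}

fn main() {
    let shards = 5;
    for key in ["alpha.bin", "beta.bin", "gamma.bin"] {
        println!("{key} -> raft group {}", shard_for_key(key, shards));
    }
}
```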
Where kiseki is strong
- Per-leader throughput is excellent. 1.1 GB/s per leader with full AES-256-GCM encryption is comparable to Ceph RGW without encryption. The crypto overhead is nearly invisible (aws-lc-rs with AES-NI).
- Read throughput matches. Reads bypass Raft consensus entirely and are served from the local composition + chunk store. Multi-node reads scale linearly since any node can serve them.
- Latency is reasonable. 7.6 ms includes Raft consensus over the network + encryption. Ceph’s 2-5 ms S3 latency is lower but typically measured without encryption. Lustre’s sub-ms latency is POSIX (kernel bypass), not comparable to HTTP/S3.
Bottleneck analysis
- Not bottlenecked by crypto – AES-256-GCM at 1.1 GB/s means the CPU encrypts faster than the network/Raft can deliver (see the sketch after this list).
- Not bottlenecked by network – 15 Gbps available, using <10 Gbps.
- Bottlenecked by Raft consensus – 7.6 ms per round-trip for small objects, amortized for large ones.
- Multi-shard is the path to parity – linear scaling with shard count, same model as Ceph PGs and Lustre OSTs.
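A rough way to sanity-check the crypto point above is a single-core AES-256-GCM microbenchmark with aws-lc-rs (the crate the observations cite); the key, nonce handling, and buffer sizes here are simplified placeholders, not kiseki's actual data path:

```rust
use aws_lc_rs::aead::{Aad, LessSafeKey, Nonce, UnboundKey, AES_256_GCM};
use std::time::Instant;

fn main() {
    let key_bytes = [0u8; 32];
    let key = LessSafeKey::new(UnboundKey::new(&AES_256_GCM, &key_bytes).expect("key"));

    let chunk = vec![0u8; 4 * 1024 * 1024]; // 4 MB per seal
    let iterations: usize = 256; // 1 GiB total
    let start = Instant::now();

    for i in 0..iterations {
        let mut buf = chunk.clone();
        // Nonce derived from the iteration counter (unique per key here).
        let mut nonce = [0u8; 12];
        nonce[..8].copy_from_slice(&(i as u64).to_le_bytes());
        key.seal_in_place_append_tag(Nonce::assume_unique_for_key(nonce), Aad::empty(), &mut buf)
            .expect("seal");
    }

    let gib = (iterations * chunk.len()) as f64 / (1024.0 * 1024.0 * 1024.0);
    println!("{:.2} GiB/s encrypted (rough, includes per-iteration buffer copy)",
        gib / start.elapsed().as_secs_f64());
}
```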
Projected multi-shard performance
| Shards | 1 MB Write | 16 MB Write | Read |
|---|---|---|---|
| 1 | 122 MB/s | 1.1 GB/s | 1.1 GB/s |
| 3 | ~366 MB/s | ~3.4 GB/s | ~3.4 GB/s |
| 5 | ~610 MB/s | ~5.7 GB/s | ~5.7 GB/s |
At 5 shards on the same hardware, kiseki reaches parity with Ceph and approaches Lustre – while encrypting all data at rest and in transit, on commodity GCP VMs with network-attached storage (not local NVMe or InfiniBand).