Admin Dashboard

Kiseki includes a built-in web dashboard for cluster monitoring and basic operations. The dashboard is served by every storage node on the metrics HTTP port.


Access

http://<node>:9090/ui

Any node in the cluster serves the full cluster-wide view. The dashboard scrapes metrics from peer nodes in the background and aggregates them locally. There is no dedicated dashboard server; connect to whichever node is most convenient.

The metrics HTTP server also serves:

Path        Purpose
/health     Health probe (returns 200 OK). Used by load balancers.
/metrics    Prometheus text exposition format.
/ui         Admin dashboard (HTML + HTMX + Chart.js).
/ui/logo    Kiseki logo image.

Technology

The dashboard is a single-page HTML application using:

  • HTMX for live updates via HTML fragment polling.
  • Chart.js for time-series and per-node comparison charts.
  • No build step, no JavaScript framework, no node_modules.

The dashboard HTML is embedded in the kiseki-server binary at compile time (include_str!). No external files to deploy or manage.


Overview tab

The main view shows six metric cards at the top, a time-series chart in the middle, and a node table at the bottom. All data refreshes automatically via HTMX polling.

Metric cards

Card              Source metric                        Description
Cluster Health    Node liveness                        N/M nodes healthy with color coding: green (all healthy), yellow (degraded), red (all down).
Raft Entries      kiseki_raft_entries_total            Total Raft entries applied across the cluster.
Gateway Requests  kiseki_gateway_requests_total        Total S3 and NFS requests served.
Data Written      kiseki_chunk_write_bytes_total       Aggregate chunk bytes written.
Data Read         kiseki_chunk_read_bytes_total        Aggregate chunk bytes read.
Connections       kiseki_transport_connections_active  Active transport connections.

Numbers are formatted with SI suffixes (K, M, B) and byte units (KB, MB, GB, TB) for readability.
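As a rough illustration of that formatting rule, the two shell helpers below mimic the SI-suffix and byte-unit rendering. They are hypothetical (the dashboard does this formatting itself; `format_si` and `format_bytes` are not part of Kiseki):

```shell
# Illustrative only: mimics the dashboard's number formatting.
format_si() {
  awk -v n="$1" 'BEGIN {
    if (n >= 1e9)      printf "%.1fB\n", n / 1e9
    else if (n >= 1e6) printf "%.1fM\n", n / 1e6
    else if (n >= 1e3) printf "%.1fK\n", n / 1e3
    else               printf "%d\n", n
  }'
}

format_bytes() {
  awk -v n="$1" 'BEGIN {
    split("B KB MB GB TB", u, " ")
    i = 1
    while (n >= 1024 && i < 5) { n /= 1024; i++ }
    printf "%.1f%s\n", n, u[i]
  }'
}

format_si 1234567        # 1.2M
format_bytes 5368709120  # 5.0GB
```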

Time-series charts

The dashboard stores up to 3 hours of metric history (configurable) in memory. Time-series charts show:

  • Raft entries over time
  • Gateway request rate
  • Chunk write/read throughput
  • Connection count

Historical data is available via the API:

# Get 3 hours of history (default)
curl http://node1:9090/ui/api/history

# Get 1 hour of history
curl 'http://node1:9090/ui/api/history?hours=1'

Node table

A table listing every node in the cluster with per-node metrics:

Column    Description
Node      Node address (hostname:port)
Status    Health badge: green “Healthy” or red “Unreachable”
Raft      Raft entries applied by this node
Requests  Gateway requests served by this node
Written   Chunk bytes written by this node
Read      Chunk bytes read by this node
Conns     Active transport connections on this node

Click a node row to drill down to the node detail view.


Performance tab

The performance tab shows per-node comparison charts for identifying hotspots and imbalances:

  • Write throughput by node: Bar chart comparing chunk bytes written per node.
  • Read throughput by node: Bar chart comparing chunk bytes read per node.
  • Request count by node: Bar chart comparing gateway requests per node.

Chart data is sourced from the chart-data API:

curl http://node1:9090/ui/fragment/chart-data
# Returns: {"labels": [...], "writes": [...], "reads": [...], "requests": [...]}
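The four parallel arrays can be zipped into one row per node with jq. The payload below is a made-up sample matching the documented shape, standing in for a live curl response:

```shell
# Sample chart-data payload (invented numbers, documented shape).
cat > chart-data.json <<'EOF'
{"labels": ["node1:9100", "node2:9100"], "writes": [1048576, 2097152], "reads": [524288, 262144], "requests": [120, 340]}
EOF

# Emit one "node writes reads requests" row per label.
jq -r '. as $d | range($d.labels | length) as $i
  | "\($d.labels[$i]) \($d.writes[$i]) \($d.reads[$i]) \($d.requests[$i])"' \
  chart-data.json
# node1:9100 1048576 524288 120
# node2:9100 2097152 262144 340
```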

Alerts tab

The alerts tab shows health status and capacity warnings. Each alert is a row with a colored dot (green, yellow, red, blue), a message, and a timestamp.

Alert types

Dot     Meaning        Example
Green   All clear      “All 3 nodes healthy”
Red     Critical       “Node node2:9100 unreachable”
Blue    Informational  “Capacity monitoring active (3 nodes reporting)”
Green   Activity       “node1:9100: 1.2K gateway requests served”

Alerts are generated by comparing the current cluster state against expected conditions. The alert endpoint returns HTML fragments for HTMX polling:

curl http://node1:9090/ui/fragment/alerts

Operations tab

The operations tab provides buttons for common administrative actions. Each action calls a REST endpoint and records an event in the diagnostic event store.

Available operations

Operation         Endpoint                 Method  Description
Maintenance Mode  /ui/api/ops/maintenance  POST    Enable or disable cluster-wide maintenance mode. Body: {"enabled": true} or {"enabled": false}.
Backup            /ui/api/ops/backup       POST    Initiate a background backup.
Scrub             /ui/api/ops/scrub        POST    Initiate a background integrity scrub.

Example:

# Enable maintenance mode
curl -X POST http://node1:9090/ui/api/ops/maintenance \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true}'

# Trigger a scrub
curl -X POST http://node1:9090/ui/api/ops/scrub

All operations return {"status": "ok", "message": "..."} on success.
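Scripts can branch on that envelope with jq. The file below is a stand-in for an actual curl response (in practice you would pipe curl's output instead):

```shell
# Stand-in for the JSON an operation endpoint returns.
cat > op-result.json <<'EOF'
{"status": "ok", "message": "maintenance mode enabled"}
EOF

# Branch on the status field; surface the message either way.
if [ "$(jq -r '.status' op-result.json)" = "ok" ]; then
  echo "operation accepted: $(jq -r '.message' op-result.json)"
else
  echo "operation failed: $(jq -r '.message' op-result.json)" >&2
fi
# operation accepted: maintenance mode enabled
```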


Node drill-down

Click a node in the node table to see its detailed view. The drill-down shows:

  • Node-specific metric history (time-series)
  • Device health for devices attached to that node
  • Shard assignments on that node
  • Raft role (leader/follower/learner) per shard

API endpoints

All dashboard data is available via JSON APIs for scripting and integration:

Endpoint         Method  Description
/ui/api/cluster  GET     Cluster summary: healthy nodes, total nodes, aggregate metrics.
/ui/api/nodes    GET     List of all nodes with per-node metrics and health status.
/ui/api/history  GET     Time-series metric history. Query: ?hours=3 (default).
/ui/api/events   GET     Diagnostic event log. Query parameters below.

Event log query parameters

Parameter  Type     Default  Description
severity   string   (all)    Filter by severity: info, warning, error, critical.
category   string   (all)    Filter by category: node, shard, device, tenant, security, admin, gateway, raft.
hours      float    3        Hours to look back.
limit      integer  100      Maximum events to return.

Example:

# Get last 50 error events in the past hour
curl 'http://node1:9090/ui/api/events?severity=error&hours=1&limit=50'

Response format:

{
  "count": 2,
  "events": [
    {
      "timestamp": "2026-04-23T14:30:00Z",
      "severity": "error",
      "category": "device",
      "source": "nvme-0001",
      "message": "Device SMART wear exceeds 90%"
    }
  ]
}
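A response in this shape is easy to post-process locally. The sample below reuses the documented event and adds a second hypothetical one so the filter has something to skip:

```shell
# Sample events payload (second event is invented for illustration).
cat > events.json <<'EOF'
{"count": 2, "events": [
  {"timestamp": "2026-04-23T14:30:00Z", "severity": "error", "category": "device",
   "source": "nvme-0001", "message": "Device SMART wear exceeds 90%"},
  {"timestamp": "2026-04-23T14:31:00Z", "severity": "info", "category": "node",
   "source": "node1:9100", "message": "Node rejoined cluster"}
]}
EOF

# Keep only error events, one line each.
jq -r '.events[] | select(.severity == "error")
  | "\(.timestamp) [\(.source)] \(.message)"' events.json
# 2026-04-23T14:30:00Z [nvme-0001] Device SMART wear exceeds 90%
```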

Cluster-wide view architecture

Every node in the cluster runs the same dashboard. The cluster-wide view is assembled by scraping /metrics from peer nodes:

  1. Each node knows its peers from KISEKI_RAFT_PEERS.
  2. A background task scrapes each peer’s /metrics endpoint at a configurable interval (default 10 seconds).
  3. Scraped metrics are cached locally in a MetricsAggregator.
  4. Dashboard requests aggregate local + cached peer metrics.
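In shell terms, one iteration of that scrape task might look like the sketch below. This is illustrative only: `scrape_peer` and the cache directory are invented, and the real MetricsAggregator lives inside kiseki-server. On failure the previous cache file is left in place, which is how the "last known state" behavior falls out:

```shell
# Illustrative one-shot version of the background scrape step.
CACHE_DIR="${CACHE_DIR:-/tmp/kiseki-metrics-cache}"

scrape_peer() {
  peer="$1"
  mkdir -p "$CACHE_DIR"
  # Write to a temp file so a failed scrape never clobbers the cache.
  if curl -fsS --max-time 2 "http://$peer/metrics" \
       > "$CACHE_DIR/$peer.prom.tmp" 2>/dev/null; then
    mv "$CACHE_DIR/$peer.prom.tmp" "$CACHE_DIR/$peer.prom"
    echo "$peer ok"
  else
    rm -f "$CACHE_DIR/$peer.prom.tmp"
    echo "$peer unreachable; keeping last known state"
  fi
}

# Peers would come from KISEKI_RAFT_PEERS; these hosts are examples.
for peer in node1:9090 node2:9090; do
  scrape_peer "$peer"
done
```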

This means:

  • No single point of failure. Any node serves the dashboard.
  • Stale data tolerance. If a peer is unreachable, the dashboard shows the last known state and marks the node as “Unreachable.”
  • No additional infrastructure. No dedicated monitoring server is needed for basic cluster visibility.

For production monitoring with alerting and long-term retention, use Prometheus and Grafana (see Monitoring).