Admin Dashboard
Kiseki includes a built-in web dashboard for cluster monitoring and basic operations. The dashboard is served by every storage node on the metrics HTTP port.
Access
```
http://<node>:9090/ui
```
Any node in the cluster serves the full cluster-wide view. The dashboard scrapes metrics from peer nodes in the background and aggregates them locally. There is no dedicated dashboard server; connect to whichever node is most convenient.
The metrics HTTP server also serves:
| Path | Purpose |
|---|---|
| /health | Health probe (returns 200 OK). Used by load balancers. |
| /metrics | Prometheus text exposition format. |
| /ui | Admin dashboard (HTML + HTMX + Chart.js). |
| /ui/logo | Kiseki logo image. |
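All of these share the metrics port, so verifying a node's HTTP server from the shell is a one-liner. A quick smoke test (node1 and port 9090 here match the examples used throughout this page):

```bash
# Probe the health endpoint; -f makes curl fail loudly on non-2xx responses
curl -fsS http://node1:9090/health

# Confirm metrics are being exposed (Prometheus text exposition format)
curl -fsS http://node1:9090/metrics | head -n 5
```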
Technology
The dashboard is a single-page HTML application using:
- HTMX for live updates via HTML fragment polling.
- Chart.js for time-series and per-node comparison charts.
- No build step, no JavaScript framework, no node_modules.
The dashboard HTML is embedded in the kiseki-server binary at compile time (include_str!). There are no external files to deploy or manage.
Overview tab
The main view shows six metric cards at the top, a time-series chart in the middle, and a node table at the bottom. All data refreshes automatically via HTMX polling.
Metric cards
| Card | Source metric | Description |
|---|---|---|
| Cluster Health | Node liveness | N/M nodes healthy with color coding: green (all healthy), yellow (degraded), red (all down). |
| Raft Entries | kiseki_raft_entries_total | Total Raft entries applied across the cluster. |
| Gateway Requests | kiseki_gateway_requests_total | Total S3 and NFS requests served. |
| Data Written | kiseki_chunk_write_bytes_total | Aggregate chunk bytes written. |
| Data Read | kiseki_chunk_read_bytes_total | Aggregate chunk bytes read. |
| Connections | kiseki_transport_connections_active | Active transport connections. |
Numbers are formatted with SI suffixes (K, M, B) and byte units (KB, MB, GB, TB) for readability.
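When reading raw counter values from /metrics, GNU coreutils' numfmt produces similar human-readable output. The dashboard formats values client-side; this is only a rough shell equivalent, and exact rounding may differ:

```bash
# SI suffix for large counters
numfmt --to=si 2500000                # -> 2.5M

# Byte units for sizes
numfmt --to=si --suffix=B 1500000000  # -> 1.5GB
```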
Time-series charts
The dashboard stores up to 3 hours of metric history (configurable) in memory. Time-series charts show:
- Raft entries over time
- Gateway request rate
- Chunk write/read throughput
- Connection count
Historical data is available via the API:
```bash
# Get 3 hours of history (default)
curl http://node1:9090/ui/api/history

# Get 1 hour of history
curl 'http://node1:9090/ui/api/history?hours=1'
```
Node table
A table listing every node in the cluster with per-node metrics:
| Column | Description |
|---|---|
| Node | Node address (hostname:port) |
| Status | Health badge: green “Healthy” or red “Unreachable” |
| Raft | Raft entries applied by this node |
| Requests | Gateway requests served by this node |
| Written | Chunk bytes written by this node |
| Read | Chunk bytes read by this node |
| Conns | Active transport connections on this node |
Click a node row to drill down to the node detail view.
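The data behind the table is also available as JSON from the /ui/api/nodes endpoint documented under API endpoints below, which makes the same per-node view scriptable (jq assumed installed):

```bash
# Same per-node data the table renders, as JSON
curl -s http://node1:9090/ui/api/nodes | jq .
```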
Performance tab
The performance tab shows per-node comparison charts for identifying hotspots and imbalances:
- Write throughput by node: Bar chart comparing chunk bytes written per node.
- Read throughput by node: Bar chart comparing chunk bytes read per node.
- Request count by node: Bar chart comparing gateway requests per node.
Chart data is sourced from the chart-data API:
```bash
curl http://node1:9090/ui/fragment/chart-data
# Returns: {"labels": [...], "writes": [...], "reads": [...], "requests": [...]}
```
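Since the labels array lines up index-for-index with the metric arrays (which is what the bar charts imply), a short jq pipeline can answer questions like which node has written the most:

```bash
# Pair each node label with its write counter and report the largest
# (assumes labels and writes are index-aligned, as the bar charts imply)
curl -s http://node1:9090/ui/fragment/chart-data \
  | jq -r '[.labels, .writes] | transpose | max_by(.[1]) | "\(.[0]): \(.[1]) bytes written"'
```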
Alerts tab
The alerts tab shows health status and capacity warnings. Each alert is a row with a colored dot (green, yellow, red, blue), a message, and a timestamp.
Alert types
| Dot | Meaning | Example |
|---|---|---|
| Green | All clear | “All 3 nodes healthy” |
| Red | Critical | “Node node2:9100 unreachable” |
| Blue | Informational | “Capacity monitoring active (3 nodes reporting)” |
| Green | Activity | “node1:9100: 1.2K gateway requests served” |
Alerts are generated by comparing the current cluster state against expected conditions. The alert endpoint returns HTML fragments for HTMX polling:
```bash
curl http://node1:9090/ui/fragment/alerts
```
Operations tab
The operations tab provides buttons for common administrative actions. Each action calls a REST endpoint and records an event in the diagnostic event store.
Available operations
| Operation | Endpoint | Method | Description |
|---|---|---|---|
| Maintenance Mode | /ui/api/ops/maintenance | POST | Enable or disable cluster-wide maintenance mode. Body: {"enabled": true} or {"enabled": false}. |
| Backup | /ui/api/ops/backup | POST | Initiate a background backup. |
| Scrub | /ui/api/ops/scrub | POST | Initiate a background integrity scrub. |
Example:
```bash
# Enable maintenance mode
curl -X POST http://node1:9090/ui/api/ops/maintenance \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true}'

# Trigger a scrub
curl -X POST http://node1:9090/ui/api/ops/scrub
```
All operations return {"status": "ok", "message": "..."} on success.
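For repeated toggling, the maintenance call is easy to wrap in a small script. A minimal sketch (the script name and argument convention are illustrative; the endpoint and body are as documented above):

```bash
#!/usr/bin/env bash
# Usage: ./maintenance.sh <host:port> <true|false>
set -euo pipefail
node=$1
enabled=$2
curl -fsS -X POST "http://${node}/ui/api/ops/maintenance" \
  -H 'Content-Type: application/json' \
  -d "{\"enabled\": ${enabled}}"
```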
Node drill-down
Click a node in the node table to see its detailed view. The drill-down shows:
- Node-specific metric history (time-series)
- Device health for devices attached to that node
- Shard assignments on that node
- Raft role (leader/follower/learner) per shard
API endpoints
All dashboard data is available via JSON APIs for scripting and integration:
| Endpoint | Method | Description |
|---|---|---|
| /ui/api/cluster | GET | Cluster summary: healthy nodes, total nodes, aggregate metrics. |
| /ui/api/nodes | GET | List of all nodes with per-node metrics and health status. |
| /ui/api/history | GET | Time-series metric history. Query: ?hours=3 (default). |
| /ui/api/events | GET | Diagnostic event log. Query parameters below. |
Event log query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| severity | string | (all) | Filter by severity: info, warning, error, critical. |
| category | string | (all) | Filter by category: node, shard, device, tenant, security, admin, gateway, raft. |
| hours | float | 3 | Hours to look back. |
| limit | integer | 100 | Maximum events to return. |
Example:
```bash
# Get last 50 error events in the past hour
curl 'http://node1:9090/ui/api/events?severity=error&hours=1&limit=50'
```
Response format:
```json
{
  "count": 2,
  "events": [
    {
      "timestamp": "2026-04-23T14:30:00Z",
      "severity": "error",
      "category": "device",
      "source": "nvme-0001",
      "message": "Device SMART wear exceeds 90%"
    }
  ]
}
```
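The flat event shape composes well with jq for ad-hoc summaries, for example counting events per category over a longer window (jq assumed installed; the 24-hour window and 1000-event limit are arbitrary):

```bash
# Count events by category over the past 24 hours
curl -s 'http://node1:9090/ui/api/events?hours=24&limit=1000' \
  | jq '.events | group_by(.category) | map({(.[0].category): length}) | add'
```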
Cluster-wide view architecture
Every node in the cluster runs the same dashboard. The cluster-wide view is assembled by scraping /metrics from peer nodes:
- Each node knows its peers from KISEKI_RAFT_PEERS.
- A background task scrapes each peer's /metrics endpoint at a configurable interval (default 10 seconds).
- Scraped metrics are cached locally in a MetricsAggregator.
- Dashboard requests aggregate local + cached peer metrics.
This means:
- No single point of failure. Any node serves the dashboard.
- Stale data tolerance. If a peer is unreachable, the dashboard shows the last known state and marks the node as “Unreachable.”
- No additional infrastructure. No dedicated monitoring server is needed for basic cluster visibility.
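A practical corollary: summaries fetched from different nodes should agree to within one scrape interval. A quick consistency spot-check (node names are placeholders):

```bash
# Compare the aggregated cluster summary as seen from each node;
# values should agree to within one scrape interval (default 10 s)
for n in node1 node2 node3; do
  echo "== ${n} =="
  curl -s "http://${n}:9090/ui/api/cluster" | jq -c .
done
```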
For production monitoring with alerting and long-term retention, use Prometheus and Grafana (see Monitoring).