Admin Dashboard

Kiseki includes a built-in web dashboard for cluster monitoring and basic operations. The dashboard is served by every storage node on the metrics HTTP port.


Access

http://<node>:9090/ui

Any node in the cluster serves the full cluster-wide view. The dashboard scrapes metrics from peer nodes in the background and aggregates them locally. There is no dedicated dashboard server; connect to whichever node is most convenient.

The metrics HTTP server also serves:

Path        Purpose
/health     Health probe (returns 200 OK). Used by load balancers.
/metrics    Prometheus text exposition format.
/ui         Admin dashboard (HTML + HTMX + Chart.js).
/ui/logo    Kiseki logo image.

Technology

The dashboard is a single-page HTML application using:

  • HTMX for live updates via HTML fragment polling.
  • Chart.js for time-series and per-node comparison charts.
  • No build step, no JavaScript framework, no node_modules.

The dashboard HTML is embedded in the kiseki-server binary at compile time (include_str!). No external files to deploy or manage.


Overview tab

The main view shows six metric cards at the top, a time-series chart in the middle, and a node table at the bottom. All data refreshes automatically via HTMX polling.

Metric cards

Card              Source metric                        Description
Cluster Health    Node liveness                        N/M nodes healthy with color coding: green (all healthy), yellow (degraded), red (all down).
Raft Entries      kiseki_raft_entries_total            Total Raft entries applied across the cluster.
Gateway Requests  kiseki_gateway_requests_total        Total S3 and NFS requests served.
Data Written      kiseki_chunk_write_bytes_total       Aggregate chunk bytes written.
Data Read         kiseki_chunk_read_bytes_total        Aggregate chunk bytes read.
Connections       kiseki_transport_connections_active  Active transport connections.

Numbers are formatted with SI suffixes (K, M, B) and byte units (KB, MB, GB, TB) for readability.
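As a rough illustration of that formatting rule, the two shell helpers below mimic the SI-suffix and byte-unit rendering. They are hypothetical (the dashboard does this formatting itself; `format_si` and `format_bytes` are not part of Kiseki):

```shell
# Illustrative only: mimics the dashboard's number formatting.
format_si() {
  awk -v n="$1" 'BEGIN {
    if (n >= 1e9)      printf "%.1fB\n", n / 1e9
    else if (n >= 1e6) printf "%.1fM\n", n / 1e6
    else if (n >= 1e3) printf "%.1fK\n", n / 1e3
    else               printf "%d\n", n
  }'
}

format_bytes() {
  awk -v n="$1" 'BEGIN {
    split("B KB MB GB TB", u, " ")
    i = 1
    while (n >= 1024 && i < 5) { n /= 1024; i++ }
    printf "%.1f%s\n", n, u[i]
  }'
}

format_si 1234567        # 1.2M
format_bytes 5368709120  # 5.0GB
```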

Time-series charts

The dashboard stores up to 3 hours of metric history (configurable) in memory. Time-series charts show:

  • Raft entries over time
  • Gateway request rate
  • Chunk write/read throughput
  • Connection count

Historical data is available via the API:

# Get 3 hours of history (default)
curl http://node1:9090/ui/api/history

# Get 1 hour of history
curl 'http://node1:9090/ui/api/history?hours=1'

Node table

A table listing every node in the cluster with per-node metrics:

Column    Description
Node      Node address (hostname:port)
Status    Health badge: green “Healthy” or red “Unreachable”
Raft      Raft entries applied by this node
Requests  Gateway requests served by this node
Written   Chunk bytes written by this node
Read      Chunk bytes read by this node
Conns     Active transport connections on this node

Click a node row to drill down to the node detail view.


Performance tab

The performance tab shows per-node comparison charts for identifying hotspots and imbalances:

  • Write throughput by node: Bar chart comparing chunk bytes written per node.
  • Read throughput by node: Bar chart comparing chunk bytes read per node.
  • Request count by node: Bar chart comparing gateway requests per node.

Chart data is sourced from the chart-data API:

curl http://node1:9090/ui/fragment/chart-data
# Returns: {"labels": [...], "writes": [...], "reads": [...], "requests": [...]}
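The four parallel arrays can be zipped into one row per node with jq. The payload below is a made-up sample matching the documented shape, standing in for a live curl response:

```shell
# Sample chart-data payload (invented numbers, documented shape).
cat > chart-data.json <<'EOF'
{"labels": ["node1:9100", "node2:9100"], "writes": [1048576, 2097152], "reads": [524288, 262144], "requests": [120, 340]}
EOF

# Emit one "node writes reads requests" row per label.
jq -r '. as $d | range($d.labels | length) as $i
  | "\($d.labels[$i]) \($d.writes[$i]) \($d.reads[$i]) \($d.requests[$i])"' \
  chart-data.json
# node1:9100 1048576 524288 120
# node2:9100 2097152 262144 340
```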

Alerts tab

The alerts tab shows health status and capacity warnings. Each alert is a row with a colored dot (green, yellow, red, blue), a message, and a timestamp.

Alert types

Dot     Meaning        Example
Green   All clear      “All 3 nodes healthy”
Red     Critical       “Node node2:9100 unreachable”
Blue    Informational  “Capacity monitoring active (3 nodes reporting)”
Green   Activity       “node1:9100: 1.2K gateway requests served”

Alerts are generated by comparing the current cluster state against expected conditions. The alert endpoint returns HTML fragments for HTMX polling:

curl http://node1:9090/ui/fragment/alerts

Operations tab

The operations tab provides buttons for common administrative actions. Each action calls a REST endpoint and records an event in the diagnostic event store.

Available operations

Operation         Endpoint                 Method  Description
Maintenance Mode  /ui/api/ops/maintenance  POST    Enable or disable cluster-wide maintenance mode. Body: {"enabled": true} or {"enabled": false}.
Backup            /ui/api/ops/backup       POST    Initiate a background backup.
Scrub             /ui/api/ops/scrub        POST    Initiate a background integrity scrub.

Example:

# Enable maintenance mode
curl -X POST http://node1:9090/ui/api/ops/maintenance \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true}'

# Trigger a scrub
curl -X POST http://node1:9090/ui/api/ops/scrub

All operations return {"status": "ok", "message": "..."} on success.
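Scripts can branch on that envelope with jq. The file below is a stand-in for an actual curl response (in practice you would pipe curl's output instead):

```shell
# Stand-in for the JSON an operation endpoint returns.
cat > op-result.json <<'EOF'
{"status": "ok", "message": "maintenance mode enabled"}
EOF

# Branch on the status field; surface the message either way.
if [ "$(jq -r '.status' op-result.json)" = "ok" ]; then
  echo "operation accepted: $(jq -r '.message' op-result.json)"
else
  echo "operation failed: $(jq -r '.message' op-result.json)" >&2
fi
# operation accepted: maintenance mode enabled
```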


Node drill-down

Click a node in the node table to see its detailed view. The drill-down shows:

  • Node-specific metric history (time-series)
  • Device health for devices attached to that node
  • Shard assignments on that node
  • Raft role (leader/follower/learner) per shard

API endpoints

All dashboard data is available via JSON APIs for scripting and integration:

Endpoint         Method  Description
/ui/api/cluster  GET     Cluster summary: healthy nodes, total nodes, aggregate metrics.
/ui/api/nodes    GET     List of all nodes with per-node metrics and health status.
/ui/api/history  GET     Time-series metric history. Query: ?hours=3 (default).
/ui/api/events   GET     Diagnostic event log. Query parameters below.

Event log query parameters

Parameter  Type     Default  Description
severity   string   (all)    Filter by severity: info, warning, error, critical.
category   string   (all)    Filter by category: node, shard, device, tenant, security, admin, gateway, raft.
hours      float    3        Hours to look back.
limit      integer  100      Maximum events to return.

Example:

# Get last 50 error events in the past hour
curl 'http://node1:9090/ui/api/events?severity=error&hours=1&limit=50'

Response format:

{
  "count": 2,
  "events": [
    {
      "timestamp": "2026-04-23T14:30:00Z",
      "severity": "error",
      "category": "device",
      "source": "nvme-0001",
      "message": "Device SMART wear exceeds 90%"
    }
  ]
}
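A response in this shape is easy to post-process locally. The sample below reuses the documented event and adds a second hypothetical one so the filter has something to skip:

```shell
# Sample events payload (second event is invented for illustration).
cat > events.json <<'EOF'
{"count": 2, "events": [
  {"timestamp": "2026-04-23T14:30:00Z", "severity": "error", "category": "device",
   "source": "nvme-0001", "message": "Device SMART wear exceeds 90%"},
  {"timestamp": "2026-04-23T14:31:00Z", "severity": "info", "category": "node",
   "source": "node1:9100", "message": "Node rejoined cluster"}
]}
EOF

# Keep only error events, one line each.
jq -r '.events[] | select(.severity == "error")
  | "\(.timestamp) [\(.source)] \(.message)"' events.json
# 2026-04-23T14:30:00Z [nvme-0001] Device SMART wear exceeds 90%
```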

Cluster-wide view architecture

Every node in the cluster runs the same dashboard. The cluster-wide view is assembled by scraping /metrics from peer nodes:

  1. Each node knows its peers from KISEKI_RAFT_PEERS.
  2. A background task scrapes each peer’s /metrics endpoint at a configurable interval (default 10 seconds).
  3. Scraped metrics are cached locally in a MetricsAggregator.
  4. Dashboard requests aggregate local + cached peer metrics.
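In shell terms, one iteration of that scrape task might look like the sketch below. This is illustrative only: `scrape_peer` and the cache directory are invented, and the real MetricsAggregator lives inside kiseki-server. On failure the previous cache file is left in place, which is how the "last known state" behavior falls out:

```shell
# Illustrative one-shot version of the background scrape step.
CACHE_DIR="${CACHE_DIR:-/tmp/kiseki-metrics-cache}"

scrape_peer() {
  peer="$1"
  mkdir -p "$CACHE_DIR"
  # Write to a temp file so a failed scrape never clobbers the cache.
  if curl -fsS --max-time 2 "http://$peer/metrics" \
       > "$CACHE_DIR/$peer.prom.tmp" 2>/dev/null; then
    mv "$CACHE_DIR/$peer.prom.tmp" "$CACHE_DIR/$peer.prom"
    echo "$peer ok"
  else
    rm -f "$CACHE_DIR/$peer.prom.tmp"
    echo "$peer unreachable; keeping last known state"
  fi
}

# Peers would come from KISEKI_RAFT_PEERS; these hosts are examples.
for peer in node1:9090 node2:9090; do
  scrape_peer "$peer"
done
```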

This means:

  • No single point of failure. Any node serves the dashboard.
  • Stale data tolerance. If a peer is unreachable, the dashboard shows the last known state and marks the node as “Unreachable.”
  • No additional infrastructure. No dedicated monitoring server is needed for basic cluster visibility.

For production monitoring with alerting and long-term retention, use Prometheus and Grafana (see Monitoring).