Sovra Sovra

Troubleshooting Guide

Overview

Common issues and solutions for Sovra deployment and operation.

Control Plane Issues

API Gateway Not Responding

Symptoms:

Diagnosis:

# Check pods
kubectl get pods -n sovra -l app=api-gateway

# Check logs
kubectl logs -n sovra -l app=api-gateway --tail=50

# Check service
kubectl get svc -n sovra api-gateway

Solutions:

  1. Pod not running: ```bash

    Check pod status

    kubectl describe pod -n sovra

Common fixes:

- Image pull error: Check image name and credentials

- CrashLoopBackOff: Check logs and configuration

- Pending: Check resource availability


2. **Service not accessible:**
```bash
# Check load balancer
kubectl get svc -n sovra api-gateway

# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -k https://api-gateway.sovra.svc:443/health
  1. Certificate issues: ```bash

    Verify certificates

    kubectl get secret sovra-tls -n sovra -o yaml

Test TLS

openssl s_client -connect sovra.example.org:443


### Database Connection Failures

**Symptoms:**
- Control plane services failing
- "connection refused" errors in logs

**Diagnosis:**
```bash
# Check PostgreSQL status
kubectl get pods -n sovra -l app=postgres

# Test connection
kubectl run -it --rm debug --image=postgres:15 --restart=Never -- \
  psql -h postgres.sovra.svc -U sovra -d sovra

Solutions:

  1. Wrong credentials: ```bash

    Check secret

    kubectl get secret sovra-postgres -n sovra -o yaml

Update if needed

kubectl create secret generic sovra-postgres
–from-literal=password=NEW_PASSWORD
–dry-run=client -o yaml | kubectl apply -f -


2. **Network policy blocking:**
```bash
# Check network policies
kubectl get networkpolicy -n sovra

# Test without network policy temporarily
kubectl delete networkpolicy -n sovra allow-postgres

High Latency

Symptoms:

Diagnosis:

# Check resource usage
kubectl top pods -n sovra

# Check database performance
psql -U sovra -c "SELECT * FROM pg_stat_activity;"

# Check API gateway logs for policy evaluation
kubectl logs -n sovra -l app=api-gateway --tail=50

Solutions:

  1. Resource constraints:
    # Increase resources
    kubectl set resources deployment api-gateway \
      --limits=cpu=2,memory=4Gi \
      --requests=cpu=1,memory=2Gi \
      -n sovra
    
  2. Database slow queries: ```bash

    Enable query logging

    psql -U sovra -c “ALTER SYSTEM SET log_min_duration_statement = 1000;”

Add indexes

psql -U sovra sovra < scripts/optimize-db.sql


3. **Policy evaluation slow:**
```bash
# Increase policy cache
kubectl set env deployment/api-gateway \
  POLICY_CACHE_SIZE=1000 \
  -n sovra

Edge Node Issues

Vault Sealed

Symptoms:

Diagnosis:

# Check seal status
vault status

# Check why sealed
kubectl logs -n sovra-edge vault-0

Solutions:

# Unseal Vault
vault operator unseal

# Unseal all replicas
for pod in vault-0 vault-1 vault-2; do
  kubectl exec -n sovra-edge $pod -- vault operator unseal <KEY>
done

# Auto-unseal setup (recommended)
# Configure Vault with auto-unseal using cloud KMS

Edge Agent Not Connecting

Symptoms:

Diagnosis:

# Check edge agent logs
kubectl logs -n sovra-edge -l app=edge-agent

# Check connectivity
kubectl exec -n sovra-edge edge-agent-xxx -- \
  curl -k https://sovra.example.org/health

# Check certificates
kubectl get secret edge-node-tls -n sovra-edge -o yaml

Solutions:

  1. Network connectivity: ```bash

    Check firewall rules

    Ensure port 8443 is open

Check DNS resolution

kubectl exec -n sovra-edge edge-agent-xxx –
nslookup sovra.example.org


2. **Certificate expired:**
```bash
# Check expiry
openssl x509 -in edge-cert.crt -noout -dates

# Renew certificate
sovra edge-node cert-renew edge-1
  1. Wrong control plane URL:
    # Update configuration
    kubectl set env deployment/edge-agent \
      CONTROL_PLANE_URL=https://new-sovra.example.org \
      -n sovra-edge
    

Raft Consensus Failure

Symptoms:

Diagnosis:

# Check Raft status
vault operator raft list-peers

# Check logs
kubectl logs -n sovra-edge vault-0

Solutions:

# Remove failed peer
vault operator raft remove-peer vault-2

# Re-join cluster
kubectl exec -n sovra-edge vault-2 -- \
  vault operator raft join https://vault-0:8200

Federation Issues

Cannot Establish Federation

Symptoms:

Diagnosis:

# Check partner connectivity
curl -k https://partner-sovra.example.org/health

# Check certificates
sovra federation cert-verify org-b

# Check logs
kubectl logs -n sovra -l app=api-gateway --tail=50

Solutions:

  1. Network connectivity: ```bash

    Test from control plane pod

    kubectl exec -n sovra api-gateway-xxx –
    curl -k https://partner-sovra.example.org/health

Check firewall rules

Ensure port 8443 is open for federation


2. **Certificate mismatch:**
```bash
# Regenerate federation certificate
sovra federation cert-regenerate --partner org-b

# Exchange with partner
# Manually transfer new certificate

Workspace Access Denied

Symptoms:

Diagnosis:

# Check workspace policies
sovra policy get --workspace cancer-research

# Check user membership
sovra workspace participants cancer-research

# Check audit logs
sovra audit query \
  --workspace cancer-research \
  --result error

Solutions:

# Update policy
sovra policy update \
  --workspace cancer-research \
  --policy updated-policy.rego

# Add user to workspace
sovra workspace add-user \
  --workspace cancer-research \
  --user researcher@org-b.edu

Performance Issues

High CPU Usage

Diagnosis:

# Check CPU usage
kubectl top pods -n sovra

# Profile application
kubectl exec -n sovra api-gateway-xxx -- \
  curl localhost:6060/debug/pprof/profile?seconds=30 > cpu.prof

Solutions:

# Scale horizontally
kubectl scale deployment api-gateway --replicas=5 -n sovra

# Increase resources
kubectl set resources deployment api-gateway \
  --limits=cpu=4,memory=8Gi \
  -n sovra

High Memory Usage

Diagnosis:

# Check memory usage
kubectl top pods -n sovra

# Check for memory leaks
kubectl exec -n sovra api-gateway-xxx -- \
  curl localhost:6060/debug/pprof/heap > heap.prof

Solutions:

# Restart pods (temporary)
kubectl rollout restart deployment -n sovra

# Increase memory limits
kubectl set resources deployment api-gateway \
  --limits=memory=16Gi \
  -n sovra

# Enable Go GC tuning
kubectl set env deployment/api-gateway \
  GOGC=50 \
  -n sovra