Lacuna¶
Protected space for data governance, lineage, and privacy-aware operations
Overview¶
Organizations deploying local LLMs and data platforms face a critical challenge: How do you enable self-service data access while maintaining governance, lineage tracking, and compliance?
Current solutions force a choice between:
- Strict centralized control → Bottlenecks, slow innovation
- Complete self-service → Compliance violations, data leaks, audit failures
Lacuna solves this by creating a protected space where sensitive data stays secure while enabling self-service access within compliance boundaries.
What is Lacuna?¶
Lacuna is a policy-aware data governance engine that provides:
-
Automatic Classification
Three-layer pipeline (heuristics → embeddings → LLM) classifies data operations in real-time
-
Policy Enforcement
Real-time policy evaluation with clear, actionable feedback to users
-
Complete Lineage
Automatic tracking across transformations, joins, and exports
-
Audit Compliance
ISO 27001-compliant logs with tamper-evident hash chains
-
Easy Integration
Works with dbt, Databricks, Snowflake, and Open Policy Agent
-
High Performance
<10ms classification for 98% of operations
Key Features¶
Three-Layer Classification Pipeline¶
Lacuna achieves 98%+ throughput at <10ms latency through an intelligent pipeline:
- Layer 1: Heuristics - Regex patterns and keyword matching handle 90% of operations in <1ms
- Layer 2: Embeddings - Semantic similarity for 8% of operations in ~10ms
- Layer 3: LLM Reasoning - Complex decisions for 2% of operations in ~200ms
Sensitivity Tiers¶
All data is classified into three tiers:
| Tier | Definition | Examples |
|---|---|---|
| PROPRIETARY | Competitive advantage or confidentiality concerns | Customer PII, proprietary algorithms, strategic plans |
| INTERNAL | Within organization but not sensitive | Internal tooling, team processes, general analytics |
| PUBLIC | Publicly available or publishable | Documentation, open-source code, published research |
Classification Inheritance
When data is joined or transformed, the most restrictive tier propagates. For example: PUBLIC + PROPRIETARY = PROPRIETARY
Quick Start¶
Development Mode¶
Try Lacuna locally with zero external dependencies:
# Clone and install
git clone https://github.com/witlox/lacuna.git
cd lacuna
pip install -e .
# Start in dev mode (uses SQLite, no external dependencies)
lacuna dev
# Open in browser
# API Docs: http://127.0.0.1:8000/docs
# User Dashboard: http://127.0.0.1:8000/user/dashboard
# Admin Dashboard: http://127.0.0.1:8000/admin/
Production Deployment¶
For production with full features:
Use Cases¶
Real-Time Policy Enforcement¶
# User's notebook
import pandas as pd
customers = pd.read_csv("customers.csv")
# ✓ Lacuna detects: PII data loaded, context updated
analysis = customers.merge(sales, on="customer_id")
# ✓ Lacuna classifies: PII propagates through join
analysis.to_csv("~/Downloads/export.csv")
# ✗ Lacuna blocks with clear message:
"""
❌ Governance Policy Violation
Action: Export to ~/Downloads/export.csv
Reason: Cannot export PII data to unmanaged location
Classification: PROPRIETARY (inherited from customers.csv)
Tags: PII, GDPR, FINANCIAL
Alternatives:
1. Use anonymized version: analysis_anon = anonymize(analysis, ['customer_id', 'email'])
2. Save to governed location: analysis.to_csv("/governed/workspace/analysis.csv")
3. Request exception: https://governance.example.com/exception
Policy: P-2024-001 (PII Export Restrictions)
Steward: data-governance@example.com
"""
Automated Lineage Tracking¶
from lacuna import LineageTracker
# Query lineage
lineage = LineageTracker.get_lineage("analysis.csv")
print(lineage.to_graph())
"""
analysis.csv (PROPRIETARY, tags: PII, GDPR, FINANCIAL)
├─ customers.csv (PROPRIETARY, tags: PII, GDPR)
│ └─ raw.customer_master (PROPRIETARY, tags: PII)
│ └─ salesforce.contacts (PROPRIETARY, tags: PII)
└─ sales.csv (INTERNAL, tags: FINANCIAL)
└─ raw.transactions (INTERNAL, tags: FINANCIAL)
"""
# Check downstream impact
downstream = LineageTracker.get_downstream("customers.csv")
print(f"Changing customers.csv will impact {len(downstream)} artifacts")
ISO 27001 Compliance¶
from lacuna.audit import ComplianceReporter
# Generate ISO 27001 A.9.4 report (Access Control)
report = ComplianceReporter.generate_a_9_4_report(
start_date="2025-01-01",
end_date="2025-12-31"
)
# Report includes:
# - All data access attempts (successful and failed)
# - Classification decisions with reasoning
# - Policy violations with user responses
# - Administrative actions
# - Complete audit trail with hash chain verification
Architecture¶
graph TD
A[User Data Operation] --> B[Operation Interceptor]
B --> C[Classification Pipeline]
C --> D[Layer 1: Heuristics<br/><1ms, 90%]
C --> E[Layer 2: Embeddings<br/>~10ms, 8%]
C --> F[Layer 3: LLM<br/>~200ms, 2%]
D --> G[Lineage Engine]
E --> G
F --> G
G --> H[Policy Engine OPA]
H --> I[Audit Logger]
I --> J[User Feedback]
Why Lacuna?¶
The Name¶
Lacuna (Latin): A gap, cavity, or protected space
In anatomy, a lacuna is a small cavity that protects cells. In manuscripts, it's a missing section that reveals what's intentionally kept private.
In data governance, Lacuna creates the protected space where:
- Sensitive data stays secure
- Appropriate data flows freely
- Boundaries are enforced automatically
The Market Gap¶
Existing solutions address either data catalogs, access control, DLP, or policy engines - but none combine all four.
Lacuna uniquely provides:
- Real-time operation interception
- Automatic classification with lineage
- Policy enforcement with user feedback
- ISO 27001-compliant audit logging
- Self-service model with central governance
Who This Is For¶
Organizations:
- Enterprises with data governance requirements
- Regulated industries (finance, healthcare, government)
- Companies with proprietary data assets
- Teams deploying local data platforms
Users:
- Data analysts (self-service access)
- Data engineers (building pipelines)
- Data governance teams (defining policies)
- Compliance officers (audit reports)
- Security teams (monitoring access)
Next Steps¶
-
Get up and running in minutes
-
Learn how to use Lacuna's features
-
Understand how Lacuna works
-
Integrate Lacuna into your stack
Community¶
- GitHub: witlox/lacuna
- Issues: Report bugs or request features
- Discussions: Ask questions
License¶
Lacuna is licensed under the Apache 2.0 License.