README.md
LICENSE
AGENTS.md
api/ # Language-agnostic contract (source of truth)
  schema/
    entities/ # Entity definitions (conceptual models)
      user.yaml
      session.yaml
      ...
    operations/ # CRUD + domain operations (semantic, not SQL)
      user.ops.yaml
      ...
    errors.yaml # Standard error codes (conflict, not_found, etc.)
    capabilities.yaml # Feature flags per backend (tx, joins, ttl, etc.)
  idl/
    dbal.proto # Optional: RPC/IPC contract if needed
    dbal.fbs # Optional: FlatBuffers schema if you prefer
  versioning/
    compat.md # Compatibility rules across TS/C++
common/ # Shared test vectors + fixtures + golden results
  fixtures/
    seed/
    datasets/
  golden/
    query_results/
  contracts/
    conformance_cases.yaml
ts/ # Development implementation in TypeScript
  package.json
  tsconfig.json
  src/
    index.ts # Public entrypoint (creates client)
    core/
      client.ts # DBAL client facade
      types.ts # TS types mirroring api/schema
      errors.ts # Error mapping to api/errors.yaml
      validation/ # Runtime validation (zod/io-ts/etc.)
        input.ts
        output.ts
      capabilities.ts # Capability negotiation
      telemetry/
        logger.ts
        metrics.ts
        tracing.ts
    adapters/ # Backend implementations (TS)
      prisma/
        index.ts
        prisma_client.ts # Wraps Prisma client (server-side only)
        mapping.ts # DB <-> entity mapping, select shaping
        migrations/ # Optional: Prisma migration helpers
      sqlite/
        index.ts
        sqlite_driver.ts
        schema.ts
        migrations/
      mongodb/
        index.ts
        mongo_driver.ts
        schema.ts
    query/ # Query builder / AST (no backend leakage)
      ast.ts
      builder.ts
      normalize.ts
      optimize.ts
    runtime/
      config.ts # DBAL config (env, URLs, pool sizes)
      secrets.ts # Secret loading boundary (server-only)
    util/
      assert.ts
      retry.ts
      backoff.ts
      time.ts
  tests/
    unit/
    integration/
    conformance/ # Runs common/contract vectors on TS adapters
    harness/
      setup.ts
cpp/ # Production implementation in C++
  CMakeLists.txt
  include/
    dbal/
      dbal.hpp # Public API
      client.hpp # Facade
      types.hpp # Entity/DTO types
      errors.hpp
      capabilities.hpp
      telemetry.hpp
      query/
        ast.hpp
        builder.hpp
        normalize.hpp
      adapters/
        adapter.hpp # Adapter interface
        sqlite/
          sqlite_adapter.hpp
        mongodb/
          mongodb_adapter.hpp
        prisma/
          prisma_adapter.hpp # Usually NOT direct; see note below
      util/
        expected.hpp
        result.hpp
        uuid.hpp
  src/
    client.cpp
    errors.cpp
    capabilities.cpp
    telemetry.cpp
    query/
      ast.cpp
      builder.cpp
      normalize.cpp
    adapters/
      sqlite/
        sqlite_adapter.cpp
        sqlite_pool.cpp
        sqlite_migrations.cpp
      mongodb/
        mongodb_adapter.cpp
        mongo_pool.cpp
      prisma/
        prisma_adapter.cpp # See note below (often an RPC bridge)
    util/
      uuid.cpp
      backoff.cpp
  tests/
    unit/
    integration/
    conformance/ # Runs common/contract vectors on C++ adapters
    harness/
      main.cpp
backends/ # Backend-specific assets not tied to one lang
  sqlite/
    schema.sql
    migrations/
  mongodb/
    indexes.json
  prisma/
    schema.prisma
    migrations/
tools/ # Codegen + build helpers (prefer Python)
  codegen/
    gen_types.py # api/schema -> ts/core/types.ts and cpp/types.hpp
    gen_errors.py
    gen_capabilities.py
  conformance/
    run_all.py # runs TS + C++ conformance suites
  dev/
    lint.py
    format.py
scripts/ # Cross-platform entrypoints (Python per your pref)
  build.py
  test.py
  conformance.py
  package.py
dist/ # Build outputs (gitignored)
.github/
  workflows/
    ci.yml
.gitignore
.editorconfig
Agent Development Guide for DBAL
This document provides guidance for AI agents and automated tools working with the DBAL codebase.
Architecture Philosophy
The DBAL is designed as a language-agnostic contract system that separates:
- API Definition (in YAML) - The source of truth
- Development Implementation (TypeScript) - Fast iteration, testing, debugging
- Production Implementation (C++) - Security, performance, isolation
- Shared Test Vectors - Guarantees behavioral consistency
Key Principles for Agents
1. API Contract is Source of Truth
Always start with the API definition when adding features:
1. Define entity in api/schema/entities/
2. Define operations in api/schema/operations/
3. Generate TypeScript types: python tools/codegen/gen_types.py
4. Generate C++ types: python tools/codegen/gen_types.py --lang=cpp
5. Implement in adapters
6. Add conformance tests
Never add fields, operations, or entities directly in TypeScript or C++ without updating the YAML schemas first.
2. TypeScript is for Development Speed
The TypeScript implementation prioritizes:
- Fast iteration - Quick to modify and test
- Rich ecosystem - npm packages, debugging tools
- Easy prototyping - Try ideas quickly
Use TypeScript for:
- New feature development
- Schema iteration
- Integration testing
- Developer debugging
3. C++ is for Production Security
The C++ implementation prioritizes:
- Security - Process isolation, sandboxing, no user code execution
- Performance - Optimized queries, connection pooling
- Stability - Static typing, RAII-based resource management
- Auditability - All operations logged
The C++ daemon provides:
- Credential protection (user code never sees DB URLs/passwords)
- Query validation and sanitization
- Row-level security enforcement
- Resource limits and quotas
4. Conformance Tests Guarantee Parity
Every operation must have conformance tests that run against both implementations:
# common/contracts/conformance_cases.yaml
- name: "Post CRUD operations"
  setup:
    - create_user:
        username: "testuser"
        email: "test@example.com"
  tests:
    - create:
        entity: Post
        input: { title: "Test", author_id: "$setup.user.id" }
        expect: { status: "success" }
    - read:
        entity: Post
        input: { id: "$prev.id" }
        expect: { title: "Test" }
CI/CD runs these tests on both TypeScript and C++ implementations. If they diverge, the build fails.
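To make the parity idea concrete, here is a minimal Python sketch of how a runner could replay one list of cases against two implementations and compare the outcomes. The `InMemoryAdapter` class, the `run_cases` function, and the way `$prev.id` is resolved are assumptions made for illustration; the actual harness behind tools/conformance/run_all.py may work differently.

```python
import itertools
from typing import Any


class InMemoryAdapter:
    """Hypothetical stand-in for one implementation under test."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def create(self, entity: str, data: dict[str, Any]) -> dict[str, Any]:
        row = {"id": f"{entity.lower()}_{next(self._ids)}", **data}
        self._rows[row["id"]] = row
        return {"status": "success", **row}

    def read(self, entity: str, data: dict[str, Any]) -> dict[str, Any]:
        row = self._rows.get(data["id"])
        if row is None:
            return {"status": "not_found"}
        return {"status": "success", **row}


def resolve(value: Any, prev: dict[str, Any]) -> Any:
    """Replace "$prev.<field>" placeholders with fields of the previous result."""
    if isinstance(value, str) and value.startswith("$prev."):
        return prev[value[len("$prev."):]]
    return value


def run_cases(adapter: InMemoryAdapter, cases: list[dict[str, Any]]) -> list[bool]:
    """Run each case and record whether every expected field matched."""
    outcomes: list[bool] = []
    prev: dict[str, Any] = {}
    for case in cases:
        (action, spec), = case.items()  # each case has exactly one action key
        payload = {k: resolve(v, prev) for k, v in spec["input"].items()}
        result = getattr(adapter, action)(spec["entity"], payload)
        outcomes.append(all(result.get(k) == v for k, v in spec["expect"].items()))
        prev = result
    return outcomes


CASES = [
    {"create": {"entity": "Post", "input": {"title": "Test"}, "expect": {"status": "success"}}},
    {"read": {"entity": "Post", "input": {"id": "$prev.id"}, "expect": {"title": "Test"}}},
]

# Two independent instances play the role of the TS and C++ implementations.
ts_outcomes = run_cases(InMemoryAdapter(), CASES)
cpp_outcomes = run_cases(InMemoryAdapter(), CASES)
parity = ts_outcomes == cpp_outcomes and all(ts_outcomes)
```

Comparing per-case outcomes rather than a single pass/fail flag makes it easier to see which case diverged when the two implementations disagree.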
Development Workflow for Agents
Adding a New Entity
# 1. Create entity schema
cat > api/schema/entities/comment.yaml << EOF
entity: Comment
version: "1.0"
fields:
  id: { type: uuid, primary: true, generated: true }
  content: { type: text, required: true }
  post_id: { type: uuid, required: true, foreign_key: { entity: Post, field: id } }
  author_id: { type: uuid, required: true }
  created_at: { type: datetime, generated: true }
EOF
# 2. Create operations
cat > api/schema/operations/comment.ops.yaml << EOF
operations:
  create:
    input: [content, post_id, author_id]
    output: Comment
    acl_required: ["comment:create"]
  list:
    input: [post_id]
    output: Comment[]
    acl_required: ["comment:read"]
EOF
# 3. Generate types
python tools/codegen/gen_types.py
# 4. Implement adapters (both TS and C++)
# - ts/src/adapters/prisma/mapping.ts
# - cpp/src/adapters/prisma/prisma_adapter.cpp
# 5. Add conformance tests
cat > common/contracts/comment_tests.yaml << EOF
- name: "Comment CRUD"
  operations:
    - action: create
      entity: Comment
      input: { content: "Great post!", post_id: "post_1", author_id: "user_1" }
      expected: { status: success }
EOF
# 6. Run conformance
python tools/conformance/run_all.py
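Once an entity and its operations are defined, runtime input validation can be driven from the same definitions. The sketch below is one possible shape for that check, using the Comment fields and the `create` input list from the steps above. `ValidationError` and `validate_input` are hypothetical names, and the schema is written as a Python dict so the example needs no YAML parser.

```python
class ValidationError(ValueError):
    """Hypothetical error for input that does not match the schema."""


# Comment fields and the create operation's input list, copied from the
# YAML in this workflow (foreign_key omitted for brevity).
COMMENT_FIELDS = {
    "id": {"type": "uuid", "primary": True, "generated": True},
    "content": {"type": "text", "required": True},
    "post_id": {"type": "uuid", "required": True},
    "author_id": {"type": "uuid", "required": True},
    "created_at": {"type": "datetime", "generated": True},
}
CREATE_INPUT = ["content", "post_id", "author_id"]


def validate_input(fields: dict, allowed: list[str], data: dict) -> dict:
    """Reject unknown fields and missing required fields before any DB call."""
    unknown = sorted(set(data) - set(allowed))
    if unknown:
        raise ValidationError(f"unknown fields: {', '.join(unknown)}")
    missing = [name for name in allowed
               if fields[name].get("required") and name not in data]
    if missing:
        raise ValidationError(f"missing required fields: {', '.join(missing)}")
    return dict(data)


valid = validate_input(
    COMMENT_FIELDS,
    CREATE_INPUT,
    {"content": "Great post!", "post_id": "post_1", "author_id": "user_1"},
)
```

Rejecting unknown fields matters here because generated fields such as `id` and `created_at` are not in the operation's input list, so a caller cannot supply them.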
Modifying an Existing Entity
# 1. Update YAML schema
vim api/schema/entities/user.yaml
# Add: avatar_url: { type: string, optional: true }
# 2. Regenerate types
python tools/codegen/gen_types.py
# 3. Create migration (if using Prisma)
cd backends/prisma
npx prisma migrate dev --name add_avatar_url
# 4. Update adapters to handle new field
# Both ts/src/adapters/prisma/mapping.ts and C++ version
# 5. Add tests
# Update common/contracts/user_tests.yaml
# 6. Verify conformance
python tools/conformance/run_all.py
Adding a Backend Adapter
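# 1. Define capabilities
# api/schema/capabilities.yaml holds the capabilities of every adapter,
# so edit it in place instead of overwriting it.
vim api/schema/capabilities.yaml
# Add under adapters:
#   mongodb:
#     transactions: true
#     joins: false
#     full_text_search: true
#     ttl: true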
# 2. Create TypeScript adapter
mkdir -p ts/src/adapters/mongodb
cat > ts/src/adapters/mongodb/index.ts << EOF
export class MongoDBAdapter implements DBALAdapter {
  async create(entity: string, data: any): Promise<any> {
    // Implementation
  }
}
EOF
# 3. Create C++ adapter
mkdir -p cpp/src/adapters/mongodb
# Implement MongoDBAdapter class
# 4. Register adapter
# Update ts/src/core/client.ts and cpp/src/client.cpp
# 5. Test conformance
python tools/conformance/run_all.py --adapter=mongodb
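The capability flags are only useful if callers consult them before choosing a strategy. The following Python sketch shows one way that check might look, using the mongodb flags from step 1. `UnsupportedCapability`, `require`, and `list_with_join` are hypothetical names, and the flags are inlined as a dict rather than loaded from api/schema/capabilities.yaml to keep the example stdlib-only.

```python
class UnsupportedCapability(Exception):
    """Hypothetical error raised when an adapter lacks a required feature."""


# Mirrors the mongodb capability entry from this workflow.
CAPABILITIES = {
    "mongodb": {
        "transactions": True,
        "joins": False,
        "full_text_search": True,
        "ttl": True,
    },
}


def require(adapter: str, *features: str) -> None:
    """Raise if the adapter does not declare every requested feature."""
    declared = CAPABILITIES.get(adapter, {})
    missing = [f for f in features if not declared.get(f, False)]
    if missing:
        raise UnsupportedCapability(f"{adapter} lacks: {', '.join(missing)}")


def list_with_join(adapter: str) -> str:
    """Pick a query strategy based on the declared capabilities."""
    try:
        require(adapter, "joins")
        return "single joined query"
    except UnsupportedCapability:
        # Fall back to fetching related rows with a second query.
        return "two queries merged in the client"


strategy = list_with_join("mongodb")
```

An adapter that is missing from the table is treated as supporting nothing, so an unregistered backend fails closed instead of silently running an unsupported query.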
File Organization Rules
api/ (Language-Agnostic Contracts)
api/
├── schema/
│ ├── entities/ # One file per entity
│ │ ├── user.yaml
│ │ ├── post.yaml
│ │ └── comment.yaml
│ ├── operations/ # One file per entity
│ │ ├── user.ops.yaml
│ │ ├── post.ops.yaml
│ │ └── comment.ops.yaml
│ ├── errors.yaml # Single file for all errors
│ └── capabilities.yaml # Single file for all adapter capabilities
Rules:
- One entity per file
- Use lowercase with underscores for filenames
- Version every entity (semantic versioning)
- Document breaking changes in comments
ts/ (TypeScript Implementation)
ts/src/
├── core/ # Core abstractions
│ ├── client.ts # Main DBAL client
│ ├── types.ts # Generated from YAML
│ └── errors.ts # Error classes
├── adapters/ # One directory per backend
│ ├── prisma/
│ ├── sqlite/
│ └── mongodb/
├── query/ # Query builder (backend-agnostic)
└── runtime/ # Config, secrets, telemetry
Rules:
- Keep files under 300 lines
- One class per file
- Use barrel exports (index.ts)
- No circular dependencies
cpp/ (C++ Implementation)
cpp/
├── include/dbal/ # Public headers
├── src/ # Implementation
├── tests/ # Tests
└── CMakeLists.txt
Rules:
- Header guards: #ifndef DBAL_CLIENT_HPP
- Namespace: dbal::
- Use modern C++17 features
- RAII for resource management
common/ (Shared Test Vectors)
common/
├── fixtures/ # Sample data
│ ├── seed/
│ └── datasets/
├── golden/ # Expected results
└── contracts/ # Conformance test definitions
├── user_tests.yaml
├── post_tests.yaml
└── conformance_cases.yaml
Rules:
- YAML for test definitions
- JSON for fixtures
- One test suite per entity
- Include edge cases
Code Generation
Automated Type Generation
The DBAL uses Python scripts to generate TypeScript and C++ types from YAML schemas:
# tools/codegen/gen_types.py
from pathlib import Path

def generate_typescript_types(schema_dir: Path, output_file: Path):
    """Generate TypeScript interfaces from YAML schemas"""

def generate_cpp_types(schema_dir: Path, output_dir: Path):
    """Generate C++ structs from YAML schemas"""
When to regenerate:
- After modifying any YAML in api/schema/
- Before running tests
- As part of CI/CD pipeline
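As a rough illustration of what the TypeScript half of the generator could do for a single entity, the sketch below renders an interface from an already-parsed entity definition. The `TS_TYPES` mapping (in particular `datetime` to `Date`) and the `render_interface` helper are assumptions, not the actual contents of tools/codegen/gen_types.py.

```python
# Assumed mapping from schema field types to TypeScript types.
TS_TYPES = {"uuid": "string", "text": "string", "string": "string", "datetime": "Date"}


def render_interface(schema: dict) -> str:
    """Render one entity definition as a TypeScript interface."""
    lines = [f"export interface {schema['entity']} {{"]
    for name, spec in schema["fields"].items():
        # Fields marked optional get a "?" suffix.
        optional = "?" if spec.get("optional") else ""
        lines.append(f"  {name}{optional}: {TS_TYPES[spec['type']]};")
    lines.append("}")
    return "\n".join(lines)


# The Comment entity from this guide as it might look after YAML parsing
# (foreign_key omitted for brevity).
comment_schema = {
    "entity": "Comment",
    "version": "1.0",
    "fields": {
        "id": {"type": "uuid", "primary": True, "generated": True},
        "content": {"type": "text", "required": True},
        "post_id": {"type": "uuid", "required": True},
        "author_id": {"type": "uuid", "required": True},
        "created_at": {"type": "datetime", "generated": True},
    },
}

generated = render_interface(comment_schema)
print(generated)
```

An unknown field type raises a `KeyError` in this sketch, which fits the fail-fast principle: a schema the generator does not understand should stop the build, not produce a loosely typed field.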
Manual Code vs Generated Code
Generated (Never edit manually):
- ts/src/core/types.ts - Entity interfaces
- ts/src/core/errors.ts - Error classes
- cpp/include/dbal/types.hpp - Entity structs
- cpp/include/dbal/errors.hpp - Error types
Manual (Safe to edit):
- Adapter implementations
- Query builder
- Client facade
- Utility functions
Testing Strategy
1. Unit Tests (Per Implementation)
# TypeScript
cd ts && npm run test:unit
# C++
cd cpp && ./build/tests/unit_tests
Test individual functions and classes in isolation.
2. Integration Tests (Per Implementation)
# TypeScript
cd ts && npm run test:integration
# C++
cd cpp && ./build/tests/integration_tests
Test adapters against real databases (with Docker).
3. Conformance Tests (Cross-Implementation)
# Both implementations
python tools/conformance/run_all.py
Critical: These must pass for both TS and C++. If they diverge, it's a bug.
4. Security Tests (C++ Only)
cd cpp && ./build/tests/security_tests
Test sandboxing, ACL enforcement, SQL injection prevention.
Security Considerations for Agents
What NOT to Do
- ❌ Never expose database credentials to user code
- ❌ Never allow user code to construct raw SQL queries
- ❌ Never skip ACL checks
- ❌ Never trust user input without validation
- ❌ Never log sensitive data (passwords, tokens, PII)
What TO Do
- ✅ Always validate input against schema
- ✅ Always enforce row-level security
- ✅ Always use parameterized queries
- ✅ Always log security-relevant operations
- ✅ Always test with malicious input
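The parameterized-query rule and the malicious-input rule can be shown together in a few lines. This sketch uses Python's standard `sqlite3` module with an in-memory database; the `users` table and the `find_user` helper are made up for the example and are not part of the DBAL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("testuser",))


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a bound parameter; input is never spliced into SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


# A classic injection payload is treated as a plain string and matches nothing.
malicious = "x' OR '1'='1"
rows_for_attack = find_user(conn, malicious)
rows_for_real = find_user(conn, "testuser")
```

Had the query been built with string formatting, the same payload would have turned the `WHERE` clause into a condition that is always true and returned every row.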
Sandboxing Requirements (C++ Daemon)
The C++ daemon must:
- Run with minimal privileges (drop root, use dedicated user)
- Restrict file system access (no write outside /var/lib/dbal/)
- Limit network access (only to DB, no outbound internet)
- Enforce resource limits (CPU, memory, connections)
- Validate all RPC calls (schema conformance, ACL checks)
ACL Enforcement
Every operation must check:
// C++ daemon
bool DBALDaemon::authorize(const Request& req) {
    User user = req.user();
    std::string entity = req.entity();
    std::string operation = req.operation();
    // 1. Check entity-level permission
    if (!acl_.hasPermission(user, entity, operation)) {
        return false;
    }
    // 2. Apply row-level filter to operations that target a single row
    if (operation == "update" || operation == "delete") {
        return acl_.canAccessRow(user, entity, req.id());
    }
    return true;
}
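To unit-test the decision logic outside the daemon, the same two-step check can be modeled in a few lines of Python. The `Acl` class below is a hypothetical in-memory stand-in, and treating row access as ownership is an assumption; the daemon's real row-level policy may be richer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Acl:
    """Hypothetical in-memory ACL: entity-level grants plus row ownership."""

    grants: set = field(default_factory=set)    # {(user, entity, operation)}
    owners: dict = field(default_factory=dict)  # {(entity, row_id): user}

    def has_permission(self, user: str, entity: str, operation: str) -> bool:
        return (user, entity, operation) in self.grants

    def can_access_row(self, user: str, entity: str, row_id: Optional[str]) -> bool:
        return row_id is not None and self.owners.get((entity, row_id)) == user


def authorize(acl: Acl, user: str, entity: str, operation: str,
              row_id: Optional[str] = None) -> bool:
    """Entity-level permission first, then the row filter for update/delete."""
    if not acl.has_permission(user, entity, operation):
        return False
    if operation in ("update", "delete"):
        return acl.can_access_row(user, entity, row_id)
    return True


acl = Acl(
    grants={
        ("alice", "Post", "create"),
        ("alice", "Post", "update"),
        ("bob", "Post", "update"),
    },
    owners={("Post", "post_1"): "alice"},
)

owner_can_update = authorize(acl, "alice", "Post", "update", "post_1")
other_can_update = authorize(acl, "bob", "Post", "update", "post_1")
```

Here bob holds the entity-level `update` grant but is still refused, because the row filter runs after the entity check and he does not own `post_1`.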
CI/CD Integration
GitHub Actions Workflow
name: DBAL CI/CD
on: [push, pull_request]
jobs:
  typescript:
    runs-on: ubuntu-latest
    # Each run step starts in the repository root, so set the directory per job
    defaults:
      run:
        working-directory: dbal/ts
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm run test:unit
      - run: npm run test:integration
  cpp:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: dbal/cpp
    steps:
      - uses: actions/checkout@v3
      - run: cmake -B build && cmake --build build
      - run: ./build/tests/unit_tests
      - run: ./build/tests/integration_tests
  conformance:
    needs: [typescript, cpp]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: python dbal/tools/conformance/run_all.py
Pre-commit Hooks
#!/bin/bash
# .git/hooks/pre-commit (Git runs this hook from the repository root)
if git diff --cached --name-only | grep -q '^dbal/api/schema/.*\.yaml$'; then
    echo "YAML schema changed, regenerating types..."
    (cd dbal && python tools/codegen/gen_types.py)
    git add dbal/ts/src/core/types.ts
    git add dbal/cpp/include/dbal/types.hpp
fi
Deployment Architecture
Development Environment
┌─────────────────┐
│ Spark App (TS) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ DBAL Client (TS)│
└────────┬────────┘
│ (direct)
▼
┌─────────────────┐
│ Prisma Client │
└────────┬────────┘
│
▼
┌─────────────────┐
│ SQLite / DB │
└─────────────────┘
Production Environment
┌─────────────────┐
│ Spark App (TS) │
└────────┬────────┘
│ gRPC
▼
┌─────────────────┐
│ DBAL Client (TS)│
└────────┬────────┘
│ gRPC/WS
▼
┌─────────────────┐ ┌─────────────────┐
│ DBAL Daemon(C++)│────▶│ Network Policy │
│ [Sandboxed] │ │ (Firewall) │
└────────┬────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Prisma Client │
└────────┬────────┘
│
▼
┌─────────────────┐
│ PostgreSQL │
└─────────────────┘
Docker Compose Example
version: '3.8'
services:
  dbal-daemon:
    build: ./dbal/cpp
    container_name: dbal-daemon
    ports:
      - "50051:50051"
    environment:
      - DBAL_MODE=production
      - DBAL_SANDBOX=strict
      # Example value only; load real credentials from a secret store
      - DATABASE_URL=postgresql://postgres:secure_password@postgres:5432/postgres
    volumes:
      - ./config:/config:ro
    security_opt:
      - no-new-privileges:true
    read_only: true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    networks:
      - default   # reachable by clients on the published port
      - internal  # private link to postgres
  postgres:
    image: postgres:15
    container_name: dbal-postgres
    environment:
      - POSTGRES_PASSWORD=secure_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - internal
networks:
  internal:
    internal: true
volumes:
  postgres-data:
Troubleshooting for Agents
Problem: Types out of sync with schema
Solution:
python tools/codegen/gen_types.py
Problem: Conformance tests failing
Diagnosis:
# Run verbose
python tools/conformance/run_all.py --verbose
# Compare outputs
diff common/golden/ts_results.json common/golden/cpp_results.json
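A textual `diff` can be noisy when the two files order their keys differently. As an alternative, the Python sketch below compares results structurally and reports only the cases that differ. It assumes each results file maps a case name to its result object; that layout, the case names, and the `diff_results` helper are hypothetical.

```python
import json


def diff_results(ts: dict, cpp: dict) -> dict:
    """Return {case_name: (ts_result, cpp_result)} for every case that differs."""
    names = sorted(set(ts) | set(cpp))
    return {n: (ts.get(n), cpp.get(n)) for n in names if ts.get(n) != cpp.get(n)}


# Inline JSON stands in for the two result files; with real files, use
# json.load(open(path)) on each path instead.
ts_results = json.loads(
    '{"create_post": {"status": "success"}, "read_post": {"title": "Test"}}'
)
cpp_results = json.loads(
    '{"read_post": {"title": "test"}, "create_post": {"status": "success"}}'
)

mismatches = diff_results(ts_results, cpp_results)
for name, (ts_value, cpp_value) in mismatches.items():
    print(f"{name}: ts={ts_value} cpp={cpp_value}")
```

A case that exists in only one file shows up with `None` on the other side, which points to a test that one implementation skipped.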
Problem: C++ daemon won't start in production
Check:
- Permissions: ls -la /var/lib/dbal/
- Ports: netstat -tlnp | grep 50051
- Logs: journalctl -u dbal-daemon
- Database connectivity: nc -zv postgres 5432
Problem: Security audit failing
Review:
- No hardcoded secrets
- All queries use parameters
- ACL checks on every operation
- Audit logs enabled
Best Practices Summary
- ✅ Schema first - Define in YAML, generate code
- ✅ Test both - TS and C++ must pass conformance tests
- ✅ Security by default - ACL on every operation
- ✅ Documentation - Update README when adding features
- ✅ Versioning - Semantic versioning for API changes
- ✅ Backward compatibility - Support N-1 versions
- ✅ Fail fast - Validate early, error clearly
- ✅ Audit everything - Log security-relevant operations
- ✅ Principle of least privilege - Minimal permissions
- ✅ Defense in depth - Multiple layers of security
Resources
- API Schema Reference: api/schema/README.md
- TypeScript Guide: ts/README.md
- C++ Guide: cpp/README.md
- Security Guide: docs/SECURITY.md
- Contributing: docs/CONTRIBUTING.md