Agent Development Guide for DBAL

This document provides guidance for AI agents and automated tools working with the DBAL codebase.

Architecture Philosophy

The DBAL is designed as a language-agnostic contract system that separates:

  1. API Definition (in YAML) - The source of truth
  2. Development Implementation (TypeScript) - Fast iteration, testing, debugging
  3. Production Implementation (C++) - Security, performance, isolation
  4. Shared Test Vectors - Guarantees behavioral consistency

Key Principles for Agents

1. API Contract is Source of Truth

Always start with the API definition when adding features:

1. Define entity in api/schema/entities/
2. Define operations in api/schema/operations/
3. Generate TypeScript types: python tools/codegen/gen_types.py
4. Generate C++ types: python tools/codegen/gen_types.py --lang=cpp
5. Implement in adapters
6. Add conformance tests

Never add fields, operations, or entities directly in TypeScript or C++ without updating the YAML schemas first.

2. TypeScript is for Development Speed

The TypeScript implementation prioritizes:

  • Fast iteration - Quick to modify and test
  • Rich ecosystem - npm packages, debugging tools
  • Easy prototyping - Try ideas quickly

Use TypeScript for:

  • New feature development
  • Schema iteration
  • Integration testing
  • Developer debugging

3. C++ is for Production Security

The C++ implementation prioritizes:

  • Security - Process isolation, sandboxing, no user code execution
  • Performance - Optimized queries, connection pooling
  • Stability - Static typing, RAII-based resource management
  • Auditability - All operations logged

The C++ daemon provides:

  • Credential protection (user code never sees DB URLs/passwords)
  • Query validation and sanitization
  • Row-level security enforcement
  • Resource limits and quotas

4. Conformance Tests Guarantee Parity

Every operation must have conformance tests that run against both implementations:

# common/contracts/conformance_cases.yaml
- name: "Post CRUD operations"
  setup:
    - create_user:
        username: "testuser"
        email: "test@example.com"
  tests:
    - create:
        entity: Post
        input: { title: "Test", author_id: "$setup.user.id" }
        expect: { status: "success" }
    - read:
        entity: Post
        input: { id: "$prev.id" }
        expect: { title: "Test" }

CI/CD runs these tests on both TypeScript and C++ implementations. If they diverge, the build fails.
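
The example case above refers to earlier results with placeholders such as "$setup.user.id" and "$prev.id". The real runner's resolution rules are not shown in this guide, so the following is only a minimal sketch of how such placeholders could be substituted. The function name resolve_refs and the shape of the context dictionary are assumptions made for illustration.

```python
import re
from typing import Any

# Matches a whole-string reference such as "$prev.id" or "$setup.user.id".
_REF = re.compile(r"^\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)$")


def resolve_refs(value: Any, context: dict) -> Any:
    """Recursively replace "$a.b.c" strings with context["a"]["b"]["c"]."""
    if isinstance(value, dict):
        return {k: resolve_refs(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, context) for v in value]
    if isinstance(value, str):
        match = _REF.match(value)
        if match:
            node: Any = context
            for part in match.group(1).split("."):
                node = node[part]  # KeyError: the case references a missing result
            return node
    return value


# Hypothetical context: the setup result and the previous test step's result.
context = {"setup": {"user": {"id": "u-1"}}, "prev": {"id": "p-9"}}
resolved = resolve_refs({"title": "Test", "author_id": "$setup.user.id"}, context)
print(resolved)  # {'title': 'Test', 'author_id': 'u-1'}
```

Raising on an unknown reference, rather than leaving the placeholder in place, keeps a typo in a test case from being sent to either implementation as literal input.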

Development Workflow for Agents

Adding a New Entity

# 1. Create entity schema
cat > api/schema/entities/comment.yaml << EOF
entity: Comment
version: "1.0"
fields:
  id: { type: uuid, primary: true, generated: true }
  content: { type: text, required: true }
  post_id: { type: uuid, required: true, foreign_key: { entity: Post, field: id } }
  author_id: { type: uuid, required: true }
  created_at: { type: datetime, generated: true }
EOF

# 2. Create operations
cat > api/schema/operations/comment.ops.yaml << EOF
operations:
  create:
    input: [content, post_id, author_id]
    output: Comment
    acl_required: ["comment:create"]
  list:
    input: [post_id]
    output: Comment[]
    acl_required: ["comment:read"]
EOF

# 3. Generate types (TypeScript, then C++)
python tools/codegen/gen_types.py
python tools/codegen/gen_types.py --lang=cpp

# 4. Implement adapters (both TS and C++)
# - ts/src/adapters/prisma/mapping.ts
# - cpp/src/adapters/prisma/prisma_adapter.cpp

# 5. Add conformance tests
cat > common/contracts/comment_tests.yaml << EOF
- name: "Comment CRUD"
  tests:
    - create:
        entity: Comment
        input: { content: "Great post!", post_id: "post_1", author_id: "user_1" }
        expect: { status: "success" }
EOF

# 6. Run conformance
python tools/conformance/run_all.py

Modifying an Existing Entity

# 1. Update YAML schema
vim api/schema/entities/user.yaml
# Add: avatar_url: { type: string, optional: true }

# 2. Regenerate types (TypeScript, then C++)
python tools/codegen/gen_types.py
python tools/codegen/gen_types.py --lang=cpp

# 3. Regenerate Prisma schema + create migration (if using Prisma)
node ../../shared/tools/codegen/gen_prisma_schema.js
npx prisma migrate dev --schema ../../prisma/schema.prisma --name add_avatar_url

# 4. Update adapters to handle new field
# Both ts/src/adapters/prisma/mapping.ts and C++ version

# 5. Add tests
# Update common/contracts/user_tests.yaml

# 6. Verify conformance
python tools/conformance/run_all.py

Adding a Backend Adapter

# 1. Define capabilities
cat > api/schema/capabilities.yaml << EOF
adapters:
  mongodb:
    transactions: true
    joins: false
    full_text_search: true
    ttl: true
EOF

# 2. Create TypeScript adapter
mkdir -p ts/src/adapters/mongodb
cat > ts/src/adapters/mongodb/index.ts << EOF
export class MongoDBAdapter implements DBALAdapter {
  async create(entity: string, data: any): Promise<any> {
    // Implementation
  }
}
EOF

# 3. Create C++ adapter
mkdir -p cpp/src/adapters/mongodb
# Implement MongoDBAdapter class

# 4. Register adapter
# Update ts/src/core/client.ts and cpp/src/client.cpp

# 5. Test conformance
python tools/conformance/run_all.py --adapter=mongodb
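
The capability flags declared in step 1 are only useful if something consults them before an operation is dispatched. The sketch below shows one way a tool could do that. The dictionary mirrors the YAML above, while supports, require, and UnsupportedCapability are hypothetical names, not part of the DBAL codebase.

```python
# Parsed form of api/schema/capabilities.yaml, inlined to stay self-contained.
CAPABILITIES = {
    "adapters": {
        "mongodb": {
            "transactions": True,
            "joins": False,
            "full_text_search": True,
            "ttl": True,
        }
    }
}


class UnsupportedCapability(Exception):
    """Hypothetical error raised when an adapter lacks a required feature."""


def supports(adapter: str, feature: str, capabilities: dict = CAPABILITIES) -> bool:
    """Return True only if the adapter explicitly declares the feature as true."""
    adapters = capabilities.get("adapters", {})
    return bool(adapters.get(adapter, {}).get(feature, False))


def require(adapter: str, feature: str) -> None:
    """Fail fast instead of sending an unsupported operation to the backend."""
    if not supports(adapter, feature):
        raise UnsupportedCapability(f"{adapter} does not support {feature}")


require("mongodb", "transactions")   # passes silently
print(supports("mongodb", "joins"))  # False
```

Treating an undeclared adapter or feature as unsupported is a deliberate choice here: a missing entry then surfaces as a clear error rather than as a backend failure.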

File Organization Rules

api/ (Language-Agnostic Contracts)

api/
├── schema/
│   ├── entities/           # One file per entity
│   │   ├── user.yaml
│   │   ├── post.yaml
│   │   └── comment.yaml
│   ├── operations/         # One file per entity
│   │   ├── user.ops.yaml
│   │   ├── post.ops.yaml
│   │   └── comment.ops.yaml
│   ├── errors.yaml         # Single file for all errors
│   └── capabilities.yaml   # Single file for all adapter capabilities

Rules:

  • One entity per file
  • Use lowercase with underscores for filenames
  • Version every entity (semantic versioning)
  • Document breaking changes in comments

ts/ (TypeScript Implementation)

ts/src/
├── core/                   # Core abstractions
│   ├── client.ts          # Main DBAL client
│   ├── types.ts           # Generated from YAML
│   └── errors.ts          # Error classes
├── adapters/              # One directory per backend
│   ├── prisma/
│   ├── sqlite/
│   └── mongodb/
├── query/                 # Query builder (backend-agnostic)
└── runtime/               # Config, secrets, telemetry

Rules:

  • Keep files under 300 lines
  • One class per file
  • Use barrel exports (index.ts)
  • No circular dependencies

cpp/ (C++ Implementation)

cpp/
├── include/dbal/          # Public headers
├── src/                   # Implementation
├── tests/                 # Tests
└── CMakeLists.txt

Rules:

  • Header guards: #ifndef DBAL_CLIENT_HPP
  • Namespace: dbal::
  • Use modern C++17 features
  • RAII for resource management

common/ (Shared Test Vectors)

common/
├── fixtures/              # Sample data
│   ├── seed/
│   └── datasets/
├── golden/                # Expected results
└── contracts/             # Conformance test definitions
    ├── user_tests.yaml
    ├── post_tests.yaml
    └── conformance_cases.yaml

Rules:

  • YAML for test definitions
  • JSON for fixtures
  • One test suite per entity
  • Include edge cases

Code Generation

Automated Type Generation

The DBAL uses Python scripts to generate TypeScript and C++ types from YAML schemas:

# tools/codegen/gen_types.py
from pathlib import Path


def generate_typescript_types(schema_dir: Path, output_file: Path):
    """Generate TypeScript interfaces from YAML schemas."""


def generate_cpp_types(schema_dir: Path, output_dir: Path):
    """Generate C++ structs from YAML schemas."""

When to regenerate:

  • After modifying any YAML in api/schema/
  • Before running tests
  • As part of CI/CD pipeline
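
To make the generation step concrete, here is a small sketch that renders one parsed entity as a TypeScript interface. It is not the contents of gen_types.py. The TS_TYPES mapping, the render_ts_interface name, and the rule that only fields marked optional: true become optional are all assumptions chosen for illustration.

```python
# Assumed mapping from schema field types to TypeScript types.
TS_TYPES = {
    "uuid": "string",
    "string": "string",
    "text": "string",
    "datetime": "Date",
}


def render_ts_interface(entity: dict) -> str:
    """Render one entity definition as a TypeScript interface."""
    lines = [f"export interface {entity['entity']} {{"]
    for name, spec in entity["fields"].items():
        ts_type = TS_TYPES[spec["type"]]  # KeyError flags an unmapped schema type
        # Assumption: a field is optional only when marked `optional: true`.
        optional = "?" if spec.get("optional") else ""
        lines.append(f"  {name}{optional}: {ts_type};")
    lines.append("}")
    return "\n".join(lines)


# Parsed form of a trimmed-down comment.yaml.
comment = {
    "entity": "Comment",
    "version": "1.0",
    "fields": {
        "id": {"type": "uuid", "primary": True, "generated": True},
        "content": {"type": "text", "required": True},
        "created_at": {"type": "datetime", "generated": True},
    },
}
print(render_ts_interface(comment))
```

Failing on an unknown field type, instead of falling back to any, keeps a schema typo from silently weakening the generated types.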

Manual Code vs Generated Code

Generated (Never edit manually):

  • ts/src/core/types.ts - Entity interfaces
  • ts/src/core/errors.ts - Error classes
  • cpp/include/dbal/types.hpp - Entity structs
  • cpp/include/dbal/errors.hpp - Error types

Manual (Safe to edit):

  • Adapter implementations
  • Query builder
  • Client facade
  • Utility functions

Testing Strategy

1. Unit Tests (Per Implementation)

# TypeScript
cd ts && npm run test:unit

# C++
cd cpp && ./build/tests/unit_tests

Test individual functions and classes in isolation.

2. Integration Tests (Per Implementation)

# TypeScript
cd ts && npm run test:integration

# C++
cd cpp && ./build/tests/integration_tests

Test adapters against real databases (with Docker).

3. Conformance Tests (Cross-Implementation)

# Both implementations
python tools/conformance/run_all.py

Critical: These must pass for both TS and C++. If they diverge, it's a bug.

4. Security Tests (C++ Only)

cd cpp && ./build/tests/security_tests

Test sandboxing, ACL enforcement, SQL injection prevention.
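
As an illustration of what an injection test asserts, the sketch below uses Python's standard sqlite3 module with an in-memory database. It is not taken from the C++ security suite, and the table and input string are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"

# Unsafe: string interpolation lets the input rewrite the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT username FROM users WHERE username = '{malicious}'"
).fetchall()

# Safe: the value is bound as a parameter and compared as a literal string.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The interpolated query returns every row, while the parameterized query returns none, which is the behavior a security test should require of every adapter.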

Security Considerations for Agents

What NOT to Do

  • Never expose database credentials to user code
  • Never allow user code to construct raw SQL queries
  • Never skip ACL checks
  • Never trust user input without validation
  • Never log sensitive data (passwords, tokens, PII)
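
The rule about logging can be enforced mechanically by redacting records before they reach the logger. This is a minimal sketch. The SENSITIVE_KEYS set and the redact function are hypothetical, and a real deployment would need its own list of sensitive field names.

```python
# Assumed list of field names that must never appear in logs.
SENSITIVE_KEYS = {"password", "token", "database_url", "email"}


def redact(record):
    """Return a copy of the record with sensitive values masked at any depth."""
    if isinstance(record, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(value)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [redact(item) for item in record]
    return record


event = {"op": "create", "user": {"username": "testuser", "password": "hunter2"}}
print(redact(event))
# {'op': 'create', 'user': {'username': 'testuser', 'password': '[REDACTED]'}}
```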

What TO Do

  • Always validate input against schema
  • Always enforce row-level security
  • Always use parameterized queries
  • Always log security-relevant operations
  • Always test with malicious input
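
Validating input against the schema can start with a structural check like the one sketched below. It uses the required and generated flags from the entity examples in this guide and does not check value types. The validate_input function and its error messages are assumptions for illustration.

```python
def validate_input(entity: dict, data: dict) -> list:
    """Return a list of problems found in `data` for the given entity schema."""
    errors = []
    fields = entity["fields"]
    for name in data:
        if name not in fields:
            errors.append(f"unknown field: {name}")
        elif fields[name].get("generated"):
            errors.append(f"field is generated and cannot be set: {name}")
    for name, spec in fields.items():
        if spec.get("required") and name not in data:
            errors.append(f"missing required field: {name}")
    return errors


# Parsed form of the Comment entity defined earlier in this guide.
comment_schema = {
    "entity": "Comment",
    "fields": {
        "id": {"type": "uuid", "primary": True, "generated": True},
        "content": {"type": "text", "required": True},
        "post_id": {"type": "uuid", "required": True},
        "author_id": {"type": "uuid", "required": True},
        "created_at": {"type": "datetime", "generated": True},
    },
}

ok = validate_input(comment_schema, {"content": "hi", "post_id": "p", "author_id": "a"})
bad = validate_input(comment_schema, {"content": "hi", "id": "x", "extra": 1})
print(ok)        # []
print(len(bad))  # 4
```

Rejecting unknown and generated fields, not just missing ones, keeps callers from setting values such as id or created_at that the DBAL is meant to own.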

Sandboxing Requirements (C++ Daemon)

The C++ daemon must:

  1. Run with minimal privileges (drop root, use dedicated user)
  2. Restrict file system access (no write outside /var/lib/dbal/)
  3. Limit network access (only to DB, no outbound internet)
  4. Enforce resource limits (CPU, memory, connections)
  5. Validate all RPC calls (schema conformance, ACL checks)

ACL Enforcement

Every operation must check:

// C++ daemon
bool DBALDaemon::authorize(const Request& req) {
    const User user = req.user();
    const std::string entity = req.entity();
    const std::string operation = req.operation();
    
    // 1. Check entity-level permission
    if (!acl_.hasPermission(user, entity, operation)) {
        return false;
    }
    
    // 2. Apply row-level filter
    if (operation == "update" || operation == "delete") {
        return acl_.canAccessRow(user, entity, req.id());
    }
    
    return true;
}

CI/CD Integration

GitHub Actions Workflow

name: DBAL CI/CD

on: [push, pull_request]

jobs:
  typescript:
    runs-on: ubuntu-latest
    # Each `run` step starts in the repository root, so set the directory per job
    defaults:
      run:
        working-directory: dbal/development
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm run test:unit
      - run: npm run test:integration

  cpp:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: dbal/production
    steps:
      - uses: actions/checkout@v3
      - run: cmake -B build && cmake --build build
      - run: ./build/tests/unit_tests
      - run: ./build/tests/integration_tests
      
  conformance:
    needs: [typescript, cpp]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: python dbal/shared/tools/conformance/run_all.py

Pre-commit Hooks

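#!/bin/bash
# .git/hooks/pre-commit
set -e
cd dbal/shared/api/schema
# Only react to staged schema files; git prints paths relative to the repo root
if git diff --cached --name-only | grep -q "api/schema/.*\.yaml$"; then
    echo "YAML schema changed, regenerating types..."
    python ../../tools/codegen/gen_types.py
    python ../../tools/codegen/gen_types.py --lang=cpp
    git add ../../ts/src/core/types.ts
    git add ../../cpp/include/dbal/types.hpp
fi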

Deployment Architecture

Development Environment

┌─────────────────┐
│  Spark App (TS) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ DBAL Client (TS)│
└────────┬────────┘
         │ (direct)
         ▼
┌─────────────────┐
│ Prisma Client   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SQLite / DB    │
└─────────────────┘

Production Environment

┌─────────────────┐
│  Spark App (TS) │
└────────┬────────┘
         │ gRPC
         ▼
┌─────────────────┐
│ DBAL Client (TS)│
└────────┬────────┘
         │ gRPC/WS
         ▼
┌─────────────────┐     ┌─────────────────┐
│ DBAL Daemon(C++)│────▶│  Network Policy │
│   [Sandboxed]   │     │  (Firewall)     │
└────────┬────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│ Prisma Client   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   PostgreSQL    │
└─────────────────┘

Docker Compose Example

version: '3.8'

services:
  dbal-daemon:
    build: ./dbal/production
    container_name: dbal-daemon
    ports:
      - "50051:50051"
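    environment:
      - DBAL_MODE=production
      - DBAL_SANDBOX=strict
      # Supply POSTGRES_PASSWORD via the shell or an .env file; do not commit it
      - DATABASE_URL=postgresql://dbal:${POSTGRES_PASSWORD}@postgres:5432/dbal
    volumes:
      - ./config:/config:ro
    security_opt:
      - no-new-privileges:true
    read_only: true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    networks:
      - default    # reachable by clients on port 50051
      - internal   # required to reach postgres

  postgres:
    image: postgres:15
    container_name: dbal-postgres
    environment:
      - POSTGRES_USER=dbal
      - POSTGRES_DB=dbal
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - internal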

networks:
  internal:
    internal: true

volumes:
  postgres-data:

Troubleshooting for Agents

Problem: Types out of sync with schema

Solution:

python tools/codegen/gen_types.py

Problem: Conformance tests failing

Diagnosis:

# Run verbose
python tools/conformance/run_all.py --verbose

# Compare outputs
diff common/golden/ts_results.json common/golden/cpp_results.json
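
A textual diff can be noisy when key order or whitespace differs between the two files. As an alternative, the sketch below compares two parsed JSON documents and reports the paths at which they differ. The diff_paths function and the sample documents are assumptions, since the real layout of the result files is not shown in this guide.

```python
import json


def diff_paths(a, b, path="$"):
    """Return the JSON paths at which two parsed documents differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                out.append(f"{path}.{key}")  # key present on one side only
            else:
                out.extend(diff_paths(a[key], b[key], f"{path}.{key}"))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = []
        for index, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_paths(x, y, f"{path}[{index}]"))
        return out
    return [] if a == b else [path]


# Hypothetical result documents from the two implementations.
ts = json.loads('{"cases": [{"name": "create", "status": "success"},'
                ' {"name": "read", "title": "Test"}]}')
cpp = json.loads('{"cases": [{"status": "success", "name": "create"},'
                 ' {"name": "read", "title": "test"}]}')
print(diff_paths(ts, cpp))  # ['$.cases[1].title']
```

In a real run, the two documents would be loaded from the files named in the diff command above with json.load.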

Problem: C++ daemon won't start in production

Check:

  1. Permissions: ls -la /var/lib/dbal/
  2. Ports: netstat -tlnp | grep 50051
  3. Logs: journalctl -u dbal-daemon
  4. Database connectivity: nc -zv postgres 5432

Problem: Security audit failing

Review:

  • No hardcoded secrets
  • All queries use parameters
  • ACL checks on every operation
  • Audit logs enabled

Best Practices Summary

  1. Schema first - Define in YAML, generate code
  2. Test both - TS and C++ must pass conformance tests
  3. Security by default - ACL on every operation
  4. Documentation - Update README when adding features
  5. Versioning - Semantic versioning for API changes
  6. Backward compatibility - Support N-1 versions
  7. Fail fast - Validate early, error clearly
  8. Audit everything - Log security-relevant operations
  9. Principle of least privilege - Minimal permissions
  10. Defense in depth - Multiple layers of security

Resources