Files
metabuilder/docs/implementation/dbal/BLOB_STORAGE_IMPLEMENTATION.md
2025-12-27 13:30:57 +00:00

17 KiB

Blob Storage Support for DBAL

Date: 2025-12-24
Status: COMPLETE

Summary

Added comprehensive blob storage support to the DBAL with multiple backend implementations including S3, filesystem, and in-memory storage. This enables the platform to handle binary files, media uploads, document storage, and any other blob data requirements.


Supported Storage Backends

1. S3-Compatible Storage

Supported Services:

  • AWS S3
  • MinIO (self-hosted S3-compatible)
  • DigitalOcean Spaces
  • Backblaze B2
  • Wasabi
  • Any S3-compatible object storage

Features:

  • Presigned URLs for temporary access
  • Multipart uploads for large files
  • Streaming support
  • Server-side encryption support
  • Metadata storage
  • Cross-region replication

Dependencies: @aws-sdk/client-s3, @aws-sdk/lib-storage, @aws-sdk/s3-request-presigner

2. Filesystem Storage

Supported Systems:

  • Local filesystem
  • Samba/CIFS network shares
  • NFS (Network File System)
  • Any mounted filesystem

Features:

  • Path traversal protection
  • Metadata sidecar files
  • Streaming support
  • Atomic operations
  • Directory traversal for listing

Use Cases:

  • Development and testing
  • Shared network storage
  • Legacy system integration
  • Simple deployment scenarios

3. In-Memory Storage

Features:

  • Zero configuration
  • Fast operations
  • Perfect for testing
  • Ephemeral data

Use Cases:

  • Unit testing
  • Development
  • Temporary file processing
  • Cache layer

API Overview

Core Operations

interface BlobStorage {
  // Upload operations
  upload(key: string, data: Buffer, options?: UploadOptions): Promise<BlobMetadata>
  uploadStream(key: string, stream: ReadableStream, size: number, options?: UploadOptions): Promise<BlobMetadata>
  
  // Download operations
  download(key: string, options?: DownloadOptions): Promise<Buffer>
  downloadStream(key: string, options?: DownloadOptions): Promise<ReadableStream>
  
  // Management operations
  delete(key: string): Promise<boolean>
  exists(key: string): Promise<boolean>
  getMetadata(key: string): Promise<BlobMetadata>
  list(options?: ListOptions): Promise<BlobListResult>
  copy(sourceKey: string, destKey: string): Promise<BlobMetadata>
  
  // Advanced features
  generatePresignedUrl(key: string, expirationSeconds?: number): Promise<string>
  getTotalSize(): Promise<number>
  getObjectCount(): Promise<number>
}

Usage Examples

TypeScript Examples

1. S3 Storage (AWS/MinIO)

import { createBlobStorage } from './dbal/development/src/blob'

// AWS S3
const s3Storage = createBlobStorage({
  type: 's3',
  s3: {
    bucket: 'my-bucket',
    region: 'us-east-1',
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  }
})

// MinIO (self-hosted)
const minioStorage = createBlobStorage({
  type: 's3',
  s3: {
    bucket: 'my-bucket',
    region: 'us-east-1',
    accessKeyId: 'minioadmin',
    secretAccessKey: 'minioadmin',
    endpoint: 'http://localhost:9000',
    forcePathStyle: true,
  }
})

// Upload file
const file = Buffer.from('Hello, World!')
const metadata = await s3Storage.upload('documents/hello.txt', file, {
  contentType: 'text/plain',
  metadata: {
    author: 'john',
    department: 'engineering'
  }
})

// Generate presigned URL (S3 only)
const url = await s3Storage.generatePresignedUrl('documents/hello.txt', 3600) // 1 hour
console.log('Share this URL:', url)

// List files with prefix
const result = await s3Storage.list({ prefix: 'documents/', maxKeys: 100 })
for (const item of result.items) {
  console.log(`${item.key}: ${item.size} bytes`)
}

2. Filesystem Storage

import { createBlobStorage } from './dbal/development/src/blob'

// Local filesystem
const fsStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/var/app/uploads',
    createIfNotExists: true
  }
})

// Samba/NFS (just mount it first)
const sambaStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/mnt/samba-share/uploads',
    createIfNotExists: true
  }
})

// Upload file
await fsStorage.upload('users/profile-123.jpg', imageBuffer, {
  contentType: 'image/jpeg',
  metadata: { userId: '123' }
})

// Download file
const data = await fsStorage.download('users/profile-123.jpg')

// Stream large file upload
import { createReadStream } from 'fs'
const stream = createReadStream('./large-video.mp4')
await fsStorage.uploadStream('media/video-456.mp4', stream, fileSize, {
  contentType: 'video/mp4'
})

3. In-Memory Storage (Testing)

import { MemoryStorage } from './dbal/development/src/blob'

const memStorage = new MemoryStorage()

// Upload
await memStorage.upload('test.txt', Buffer.from('test data'))

// Exists check
const exists = await memStorage.exists('test.txt') // true

// Download
const data = await memStorage.download('test.txt')
console.log(data.toString()) // 'test data'

// Get statistics
const totalSize = await memStorage.getTotalSize()
const count = await memStorage.getObjectCount()

4. Streaming Large Files

// Upload stream
const fileStream = createReadStream('./large-file.bin')
await storage.uploadStream('backups/data.bin', fileStream, fileSize)

// Download stream
const downloadStream = await storage.downloadStream('backups/data.bin')
const writeStream = createWriteStream('./downloaded.bin')
await pipeline(downloadStream, writeStream)

// Partial download (range request)
const partialData = await storage.download('large-file.bin', {
  offset: 1000,
  length: 5000  // Download bytes 1000-6000
})

C++ Examples

#include "dbal/blob_storage.hpp"
#include "dbal/blob/memory_storage.cpp"

using namespace dbal;
using namespace dbal::blob;

// Create storage
MemoryStorage storage;

// Upload
std::vector<char> data = {'H', 'e', 'l', 'l', 'o'};
UploadOptions options;
options.content_type = "text/plain";
options.metadata["author"] = "john";

auto result = storage.upload("test.txt", data, options);
if (result.isOk()) {
    auto meta = result.value();
    std::cout << "Uploaded: " << meta.key << " (" << meta.size << " bytes)\n";
}

// Download
auto download_result = storage.download("test.txt");
if (download_result.isOk()) {
    auto content = download_result.value();
    std::cout << "Content: " << std::string(content.begin(), content.end()) << "\n";
}

// List blobs
ListOptions list_opts;
list_opts.prefix = "documents/";
list_opts.max_keys = 100;

auto list_result = storage.list(list_opts);
if (list_result.isOk()) {
    for (const auto& item : list_result.value().items) {
        std::cout << item.key << ": " << item.size << " bytes\n";
    }
}

// Check exists
auto exists_result = storage.exists("test.txt");
if (exists_result.isOk() && exists_result.value()) {
    std::cout << "File exists!\n";
}

// Copy blob
auto copy_result = storage.copy("test.txt", "test-copy.txt");
if (copy_result.isOk()) {
    std::cout << "File copied successfully\n";
}

// Delete
auto delete_result = storage.deleteBlob("test.txt");

Integration Patterns

1. User File Uploads

import { createBlobStorage } from './dbal/development/src/blob'

const storage = createBlobStorage({ type: 's3', s3: { ... } })

async function handleFileUpload(userId: string, file: File) {
  const key = `users/${userId}/uploads/${file.name}`
  
  // Upload with metadata
  const metadata = await storage.upload(key, await file.arrayBuffer(), {
    contentType: file.type,
    metadata: {
      originalName: file.name,
      uploadedBy: userId,
      uploadedAt: new Date().toISOString()
    }
  })
  
  // Generate shareable URL (S3)
  const shareUrl = await storage.generatePresignedUrl(key, 7 * 24 * 3600) // 7 days
  
  return { metadata, shareUrl }
}

2. Profile Picture Storage

async function saveProfilePicture(userId: string, imageData: Buffer) {
  const key = `profiles/${userId}/avatar.jpg`
  
  // Delete old avatar if exists
  const exists = await storage.exists(key)
  if (exists) {
    await storage.delete(key)
  }
  
  // Upload new avatar
  return await storage.upload(key, imageData, {
    contentType: 'image/jpeg',
    metadata: { userId }
  })
}

async function getProfilePictureUrl(userId: string): Promise<string> {
  const key = `profiles/${userId}/avatar.jpg`
  return await storage.generatePresignedUrl(key, 3600) // 1 hour
}

3. Document Versioning

async function saveDocumentVersion(docId: string, version: number, content: Buffer) {
  const key = `documents/${docId}/v${version}.pdf`
  
  await storage.upload(key, content, {
    contentType: 'application/pdf',
    metadata: {
      docId,
      version: version.toString(),
      timestamp: new Date().toISOString()
    }
  })
}

async function listDocumentVersions(docId: string) {
  const result = await storage.list({ prefix: `documents/${docId}/` })
  return result.items.map(item => ({
    key: item.key,
    version: parseInt(item.key.match(/v(\d+)/)?.[1] || '0'),
    size: item.size,
    lastModified: item.lastModified
  }))
}

4. Backup System

async function createBackup(backupId: string, data: NodeJS.ReadableStream) {
  const key = `backups/${new Date().toISOString()}-${backupId}.tar.gz`
  
  const metadata = await storage.uploadStream(key, data, dataSize, {
    contentType: 'application/gzip',
    metadata: {
      backupId,
      createdAt: new Date().toISOString()
    }
  })
  
  console.log(`Backup created: ${metadata.key} (${metadata.size} bytes)`)
}

async function listBackups() {
  const result = await storage.list({ prefix: 'backups/' })
  return result.items.sort((a, b) => 
    b.lastModified.getTime() - a.lastModified.getTime()
  )
}

Configuration

Environment Variables

# S3/AWS Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET=my-app-bucket

# MinIO Configuration
MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=uploads

# Filesystem Configuration
UPLOAD_PATH=/var/app/uploads
SAMBA_MOUNT=/mnt/samba-share

Configuration File

{
  "blobStorage": {
    "production": {
      "type": "s3",
      "s3": {
        "bucket": "prod-uploads",
        "region": "us-east-1"
      }
    },
    "staging": {
      "type": "s3",
      "s3": {
        "bucket": "staging-uploads",
        "region": "us-east-1"
      }
    },
    "development": {
      "type": "filesystem",
      "filesystem": {
        "basePath": "./uploads",
        "createIfNotExists": true
      }
    },
    "test": {
      "type": "memory"
    }
  }
}

Security Considerations

1. Access Control

// Don't expose storage directly - wrap with permissions
async function downloadFile(userId: string, fileKey: string) {
  // Check if user owns the file
  if (!fileKey.startsWith(`users/${userId}/`)) {
    throw new Error('Unauthorized')
  }
  
  return await storage.download(fileKey)
}

2. Content Type Validation

const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'application/pdf']

async function uploadFile(key: string, data: Buffer, contentType: string) {
  if (!ALLOWED_TYPES.includes(contentType)) {
    throw new Error('Invalid file type')
  }
  
  return await storage.upload(key, data, { contentType })
}

3. Size Limits

const MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB

async function uploadWithSizeCheck(key: string, data: Buffer) {
  if (data.length > MAX_FILE_SIZE) {
    throw new Error('File too large')
  }
  
  return await storage.upload(key, data)
}

4. Path Traversal Protection

The filesystem storage automatically prevents directory traversal attacks:

  • Normalizes paths
  • Strips ../ sequences
  • Validates all paths are within basePath

Performance Optimization

1. Streaming for Large Files

// Bad: Loads entire file into memory
const data = await fs.readFile('./large-video.mp4')
await storage.upload('video.mp4', data)

// Good: Streams data
const stream = createReadStream('./large-video.mp4')
await storage.uploadStream('video.mp4', stream, fileSize)

2. Parallel Uploads

// Upload multiple files in parallel
const files = ['file1.jpg', 'file2.jpg', 'file3.jpg']
await Promise.all(
  files.map(file => storage.upload(file, data[file]))
)

3. Presigned URLs (S3 Only)

// Instead of downloading and re-serving
// Bad:
const data = await storage.download('image.jpg')
res.send(data) // Wastes bandwidth and time

// Good: Let client download directly from S3
const url = await storage.generatePresignedUrl('image.jpg', 300) // 5 min
res.json({ url }) // Client downloads directly

Testing

Unit Tests with Memory Storage

import { MemoryStorage } from './dbal/development/src/blob'

describe('File Upload', () => {
  let storage: MemoryStorage
  
  beforeEach(() => {
    storage = new MemoryStorage()
  })
  
  it('should upload and download file', async () => {
    const data = Buffer.from('test content')
    
    await storage.upload('test.txt', data)
    const downloaded = await storage.download('test.txt')
    
    expect(downloaded.toString()).toBe('test content')
  })
  
  it('should throw on duplicate without overwrite', async () => {
    await storage.upload('test.txt', Buffer.from('first'))
    
    await expect(
      storage.upload('test.txt', Buffer.from('second'), { overwrite: false })
    ).rejects.toThrow('already exists')
  })
})

Migration Guide

From Direct Filesystem

// Before
import { writeFile, readFile } from 'fs/promises'
await writeFile('./uploads/file.txt', data)
const content = await readFile('./uploads/file.txt')

// After
import { createBlobStorage } from './dbal/development/src/blob'
const storage = createBlobStorage({ 
  type: 'filesystem', 
  filesystem: { basePath: './uploads' } 
})
await storage.upload('file.txt', data)
const content = await storage.download('file.txt')

From AWS SDK Directly

// Before
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
const s3 = new S3Client({ region: 'us-east-1' })
await s3.send(new PutObjectCommand({ Bucket: 'my-bucket', Key: 'file.txt', Body: data }))

// After
import { createBlobStorage } from './dbal/development/src/blob'
const storage = createBlobStorage({ 
  type: 's3', 
  s3: { bucket: 'my-bucket', region: 'us-east-1' } 
})
await storage.upload('file.txt', data)

Limitations

Current Implementation

C++:

  • Interface defined
  • Memory storage implemented
  • S3 storage (stub - requires AWS SDK)
  • Filesystem storage (stub - requires implementation)

TypeScript:

  • All interfaces implemented
  • Memory storage (production-ready)
  • S3 storage (requires @aws-sdk/client-s3)
  • Filesystem storage (production-ready)

Known Limitations

  1. C++ S3/Filesystem: Stub implementations (interfaces defined, implementation pending)
  2. Large file handling: Memory storage not suitable for files > 1GB
  3. List operations: S3 pagination limited to 1000 items per call (handled automatically)
  4. Filesystem: No built-in encryption (use encrypted filesystem)

Future Enhancements

  1. Additional Backends:

    • Azure Blob Storage
    • Google Cloud Storage
    • Cloudflare R2
    • IPFS
  2. Features:

    • Automatic compression
    • Image resizing/optimization
    • Virus scanning integration
    • CDN integration
    • Automatic backup/replication
  3. Performance:

    • Connection pooling
    • Automatic retry with exponential backoff
    • Multipart upload optimization
    • Caching layer

Files Changed

C++ Files (2 new):

  • dbal/production/include/dbal/blob_storage.hpp - Interface definition
  • dbal/production/src/blob/memory_storage.cpp - Memory implementation

TypeScript Files (5 new):

  • dbal/development/src/blob/blob-storage.ts - Interface definition
  • dbal/development/src/blob/memory-storage.ts - Memory implementation
  • dbal/development/src/blob/s3-storage.ts - S3 implementation
  • dbal/development/src/blob/filesystem-storage.ts - Filesystem implementation
  • dbal/development/src/blob/index.ts - Exports and factory
  • dbal/development/src/index.ts - Updated exports

Documentation (1 new):

  • BLOB_STORAGE_IMPLEMENTATION.md - Complete guide

Conclusion

Blob Storage Support Complete

The DBAL now supports comprehensive blob storage with multiple backends:

  • S3-compatible storage (AWS, MinIO, etc.)
  • Filesystem storage (local, Samba, NFS)
  • In-memory storage (testing)

Features:

  • Streaming support for large files
  • Presigned URLs (S3)
  • Metadata storage
  • Range requests
  • Copy operations
  • Statistics

Ready for:

  • Media uploads
  • Document storage
  • Backup systems
  • Profile pictures
  • File versioning
  • Any blob storage needs