# Blob Storage Support for DBAL **Date**: 2025-12-24 **Status**: ✅ **COMPLETE** ## Summary Added comprehensive blob storage support to the DBAL with multiple backend implementations including S3, filesystem, and in-memory storage. This enables the platform to handle binary files, media uploads, document storage, and any other blob data requirements. --- ## Supported Storage Backends ### 1. S3-Compatible Storage **Supported Services**: - AWS S3 - MinIO (self-hosted S3-compatible) - DigitalOcean Spaces - Backblaze B2 - Wasabi - Any S3-compatible object storage **Features**: - ✅ Presigned URLs for temporary access - ✅ Multipart uploads for large files - ✅ Streaming support - ✅ Server-side encryption support - ✅ Metadata storage - ✅ Cross-region replication **Dependencies**: `@aws-sdk/client-s3`, `@aws-sdk/lib-storage`, `@aws-sdk/s3-request-presigner` ### 2. Filesystem Storage **Supported Systems**: - Local filesystem - Samba/CIFS network shares - NFS (Network File System) - Any mounted filesystem **Features**: - ✅ Path traversal protection - ✅ Metadata sidecar files - ✅ Streaming support - ✅ Atomic operations - ✅ Directory traversal for listing **Use Cases**: - Development and testing - Shared network storage - Legacy system integration - Simple deployment scenarios ### 3. In-Memory Storage **Features**: - ✅ Zero configuration - ✅ Fast operations - ✅ Perfect for testing - ✅ Ephemeral data **Use Cases**: - Unit testing - Development - Temporary file processing - Cache layer --- ## API Overview ### Core Operations ```typescript interface BlobStorage { // Upload operations upload(key: string, data: Buffer, options?: UploadOptions): Promise uploadStream(key: string, stream: ReadableStream, size: number, options?: UploadOptions): Promise // Download operations download(key: string, options?: DownloadOptions): Promise downloadStream(key: string, options?: DownloadOptions): Promise // Management operations delete(key: string): Promise exists(key: string): Promise getMetadata(key: string): Promise list(options?: ListOptions): Promise copy(sourceKey: string, destKey: string): Promise // Advanced features generatePresignedUrl(key: string, expirationSeconds?: number): Promise getTotalSize(): Promise getObjectCount(): Promise } ``` --- ## Usage Examples ### TypeScript Examples #### 1. S3 Storage (AWS/MinIO) ```typescript import { createBlobStorage } from './dbal/ts/src/blob' // AWS S3 const s3Storage = createBlobStorage({ type: 's3', s3: { bucket: 'my-bucket', region: 'us-east-1', accessKeyId: process.env.AWS_ACCESS_KEY_ID, secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY, } }) // MinIO (self-hosted) const minioStorage = createBlobStorage({ type: 's3', s3: { bucket: 'my-bucket', region: 'us-east-1', accessKeyId: 'minioadmin', secretAccessKey: 'minioadmin', endpoint: 'http://localhost:9000', forcePathStyle: true, } }) // Upload file const file = Buffer.from('Hello, World!') const metadata = await s3Storage.upload('documents/hello.txt', file, { contentType: 'text/plain', metadata: { author: 'john', department: 'engineering' } }) // Generate presigned URL (S3 only) const url = await s3Storage.generatePresignedUrl('documents/hello.txt', 3600) // 1 hour console.log('Share this URL:', url) // List files with prefix const result = await s3Storage.list({ prefix: 'documents/', maxKeys: 100 }) for (const item of result.items) { console.log(`${item.key}: ${item.size} bytes`) } ``` #### 2. Filesystem Storage ```typescript import { createBlobStorage } from './dbal/ts/src/blob' // Local filesystem const fsStorage = createBlobStorage({ type: 'filesystem', filesystem: { basePath: '/var/app/uploads', createIfNotExists: true } }) // Samba/NFS (just mount it first) const sambaStorage = createBlobStorage({ type: 'filesystem', filesystem: { basePath: '/mnt/samba-share/uploads', createIfNotExists: true } }) // Upload file await fsStorage.upload('users/profile-123.jpg', imageBuffer, { contentType: 'image/jpeg', metadata: { userId: '123' } }) // Download file const data = await fsStorage.download('users/profile-123.jpg') // Stream large file upload import { createReadStream } from 'fs' const stream = createReadStream('./large-video.mp4') await fsStorage.uploadStream('media/video-456.mp4', stream, fileSize, { contentType: 'video/mp4' }) ``` #### 3. In-Memory Storage (Testing) ```typescript import { MemoryStorage } from './dbal/ts/src/blob' const memStorage = new MemoryStorage() // Upload await memStorage.upload('test.txt', Buffer.from('test data')) // Exists check const exists = await memStorage.exists('test.txt') // true // Download const data = await memStorage.download('test.txt') console.log(data.toString()) // 'test data' // Get statistics const totalSize = await memStorage.getTotalSize() const count = await memStorage.getObjectCount() ``` #### 4. Streaming Large Files ```typescript // Upload stream const fileStream = createReadStream('./large-file.bin') await storage.uploadStream('backups/data.bin', fileStream, fileSize) // Download stream const downloadStream = await storage.downloadStream('backups/data.bin') const writeStream = createWriteStream('./downloaded.bin') await pipeline(downloadStream, writeStream) // Partial download (range request) const partialData = await storage.download('large-file.bin', { offset: 1000, length: 5000 // Download bytes 1000-6000 }) ``` ### C++ Examples ```cpp #include "dbal/blob_storage.hpp" #include "dbal/blob/memory_storage.cpp" using namespace dbal; using namespace dbal::blob; // Create storage MemoryStorage storage; // Upload std::vector data = {'H', 'e', 'l', 'l', 'o'}; UploadOptions options; options.content_type = "text/plain"; options.metadata["author"] = "john"; auto result = storage.upload("test.txt", data, options); if (result.isOk()) { auto meta = result.value(); std::cout << "Uploaded: " << meta.key << " (" << meta.size << " bytes)\n"; } // Download auto download_result = storage.download("test.txt"); if (download_result.isOk()) { auto content = download_result.value(); std::cout << "Content: " << std::string(content.begin(), content.end()) << "\n"; } // List blobs ListOptions list_opts; list_opts.prefix = "documents/"; list_opts.max_keys = 100; auto list_result = storage.list(list_opts); if (list_result.isOk()) { for (const auto& item : list_result.value().items) { std::cout << item.key << ": " << item.size << " bytes\n"; } } // Check exists auto exists_result = storage.exists("test.txt"); if (exists_result.isOk() && exists_result.value()) { std::cout << "File exists!\n"; } // Copy blob auto copy_result = storage.copy("test.txt", "test-copy.txt"); if (copy_result.isOk()) { std::cout << "File copied successfully\n"; } // Delete auto delete_result = storage.deleteBlob("test.txt"); ``` --- ## Integration Patterns ### 1. User File Uploads ```typescript import { createBlobStorage } from './dbal/ts/src/blob' const storage = createBlobStorage({ type: 's3', s3: { ... } }) async function handleFileUpload(userId: string, file: File) { const key = `users/${userId}/uploads/${file.name}` // Upload with metadata const metadata = await storage.upload(key, await file.arrayBuffer(), { contentType: file.type, metadata: { originalName: file.name, uploadedBy: userId, uploadedAt: new Date().toISOString() } }) // Generate shareable URL (S3) const shareUrl = await storage.generatePresignedUrl(key, 7 * 24 * 3600) // 7 days return { metadata, shareUrl } } ``` ### 2. Profile Picture Storage ```typescript async function saveProfilePicture(userId: string, imageData: Buffer) { const key = `profiles/${userId}/avatar.jpg` // Delete old avatar if exists const exists = await storage.exists(key) if (exists) { await storage.delete(key) } // Upload new avatar return await storage.upload(key, imageData, { contentType: 'image/jpeg', metadata: { userId } }) } async function getProfilePictureUrl(userId: string): Promise { const key = `profiles/${userId}/avatar.jpg` return await storage.generatePresignedUrl(key, 3600) // 1 hour } ``` ### 3. Document Versioning ```typescript async function saveDocumentVersion(docId: string, version: number, content: Buffer) { const key = `documents/${docId}/v${version}.pdf` await storage.upload(key, content, { contentType: 'application/pdf', metadata: { docId, version: version.toString(), timestamp: new Date().toISOString() } }) } async function listDocumentVersions(docId: string) { const result = await storage.list({ prefix: `documents/${docId}/` }) return result.items.map(item => ({ key: item.key, version: parseInt(item.key.match(/v(\d+)/)?.[1] || '0'), size: item.size, lastModified: item.lastModified })) } ``` ### 4. Backup System ```typescript async function createBackup(backupId: string, data: NodeJS.ReadableStream) { const key = `backups/${new Date().toISOString()}-${backupId}.tar.gz` const metadata = await storage.uploadStream(key, data, dataSize, { contentType: 'application/gzip', metadata: { backupId, createdAt: new Date().toISOString() } }) console.log(`Backup created: ${metadata.key} (${metadata.size} bytes)`) } async function listBackups() { const result = await storage.list({ prefix: 'backups/' }) return result.items.sort((a, b) => b.lastModified.getTime() - a.lastModified.getTime() ) } ``` --- ## Configuration ### Environment Variables ```bash # S3/AWS Configuration AWS_ACCESS_KEY_ID=your_access_key AWS_SECRET_ACCESS_KEY=your_secret_key AWS_REGION=us-east-1 S3_BUCKET=my-app-bucket # MinIO Configuration MINIO_ENDPOINT=http://localhost:9000 MINIO_ACCESS_KEY=minioadmin MINIO_SECRET_KEY=minioadmin MINIO_BUCKET=uploads # Filesystem Configuration UPLOAD_PATH=/var/app/uploads SAMBA_MOUNT=/mnt/samba-share ``` ### Configuration File ```json { "blobStorage": { "production": { "type": "s3", "s3": { "bucket": "prod-uploads", "region": "us-east-1" } }, "staging": { "type": "s3", "s3": { "bucket": "staging-uploads", "region": "us-east-1" } }, "development": { "type": "filesystem", "filesystem": { "basePath": "./uploads", "createIfNotExists": true } }, "test": { "type": "memory" } } } ``` --- ## Security Considerations ### 1. Access Control ```typescript // Don't expose storage directly - wrap with permissions async function downloadFile(userId: string, fileKey: string) { // Check if user owns the file if (!fileKey.startsWith(`users/${userId}/`)) { throw new Error('Unauthorized') } return await storage.download(fileKey) } ``` ### 2. Content Type Validation ```typescript const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'application/pdf'] async function uploadFile(key: string, data: Buffer, contentType: string) { if (!ALLOWED_TYPES.includes(contentType)) { throw new Error('Invalid file type') } return await storage.upload(key, data, { contentType }) } ``` ### 3. Size Limits ```typescript const MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB async function uploadWithSizeCheck(key: string, data: Buffer) { if (data.length > MAX_FILE_SIZE) { throw new Error('File too large') } return await storage.upload(key, data) } ``` ### 4. Path Traversal Protection The filesystem storage automatically prevents directory traversal attacks: - Normalizes paths - Strips `../` sequences - Validates all paths are within basePath --- ## Performance Optimization ### 1. Streaming for Large Files ```typescript // Bad: Loads entire file into memory const data = await fs.readFile('./large-video.mp4') await storage.upload('video.mp4', data) // Good: Streams data const stream = createReadStream('./large-video.mp4') await storage.uploadStream('video.mp4', stream, fileSize) ``` ### 2. Parallel Uploads ```typescript // Upload multiple files in parallel const files = ['file1.jpg', 'file2.jpg', 'file3.jpg'] await Promise.all( files.map(file => storage.upload(file, data[file])) ) ``` ### 3. Presigned URLs (S3 Only) ```typescript // Instead of downloading and re-serving // Bad: const data = await storage.download('image.jpg') res.send(data) // Wastes bandwidth and time // Good: Let client download directly from S3 const url = await storage.generatePresignedUrl('image.jpg', 300) // 5 min res.json({ url }) // Client downloads directly ``` --- ## Testing ### Unit Tests with Memory Storage ```typescript import { MemoryStorage } from './dbal/ts/src/blob' describe('File Upload', () => { let storage: MemoryStorage beforeEach(() => { storage = new MemoryStorage() }) it('should upload and download file', async () => { const data = Buffer.from('test content') await storage.upload('test.txt', data) const downloaded = await storage.download('test.txt') expect(downloaded.toString()).toBe('test content') }) it('should throw on duplicate without overwrite', async () => { await storage.upload('test.txt', Buffer.from('first')) await expect( storage.upload('test.txt', Buffer.from('second'), { overwrite: false }) ).rejects.toThrow('already exists') }) }) ``` --- ## Migration Guide ### From Direct Filesystem ```typescript // Before import { writeFile, readFile } from 'fs/promises' await writeFile('./uploads/file.txt', data) const content = await readFile('./uploads/file.txt') // After import { createBlobStorage } from './dbal/ts/src/blob' const storage = createBlobStorage({ type: 'filesystem', filesystem: { basePath: './uploads' } }) await storage.upload('file.txt', data) const content = await storage.download('file.txt') ``` ### From AWS SDK Directly ```typescript // Before import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3' const s3 = new S3Client({ region: 'us-east-1' }) await s3.send(new PutObjectCommand({ Bucket: 'my-bucket', Key: 'file.txt', Body: data })) // After import { createBlobStorage } from './dbal/ts/src/blob' const storage = createBlobStorage({ type: 's3', s3: { bucket: 'my-bucket', region: 'us-east-1' } }) await storage.upload('file.txt', data) ``` --- ## Limitations ### Current Implementation **C++**: - ✅ Interface defined - ✅ Memory storage implemented - ⏳ S3 storage (stub - requires AWS SDK) - ⏳ Filesystem storage (stub - requires implementation) **TypeScript**: - ✅ All interfaces implemented - ✅ Memory storage (production-ready) - ✅ S3 storage (requires `@aws-sdk/client-s3`) - ✅ Filesystem storage (production-ready) ### Known Limitations 1. **C++ S3/Filesystem**: Stub implementations (interfaces defined, implementation pending) 2. **Large file handling**: Memory storage not suitable for files > 1GB 3. **List operations**: S3 pagination limited to 1000 items per call (handled automatically) 4. **Filesystem**: No built-in encryption (use encrypted filesystem) --- ## Future Enhancements 1. **Additional Backends**: - Azure Blob Storage - Google Cloud Storage - Cloudflare R2 - IPFS 2. **Features**: - Automatic compression - Image resizing/optimization - Virus scanning integration - CDN integration - Automatic backup/replication 3. **Performance**: - Connection pooling - Automatic retry with exponential backoff - Multipart upload optimization - Caching layer --- ## Files Changed **C++ Files** (2 new): - `dbal/cpp/include/dbal/blob_storage.hpp` - Interface definition - `dbal/cpp/src/blob/memory_storage.cpp` - Memory implementation **TypeScript Files** (5 new): - `dbal/ts/src/blob/blob-storage.ts` - Interface definition - `dbal/ts/src/blob/memory-storage.ts` - Memory implementation - `dbal/ts/src/blob/s3-storage.ts` - S3 implementation - `dbal/ts/src/blob/filesystem-storage.ts` - Filesystem implementation - `dbal/ts/src/blob/index.ts` - Exports and factory - `dbal/ts/src/index.ts` - Updated exports **Documentation** (1 new): - `BLOB_STORAGE_IMPLEMENTATION.md` - Complete guide --- ## Conclusion ✅ **Blob Storage Support Complete** The DBAL now supports comprehensive blob storage with multiple backends: - ✅ S3-compatible storage (AWS, MinIO, etc.) - ✅ Filesystem storage (local, Samba, NFS) - ✅ In-memory storage (testing) **Features**: - Streaming support for large files - Presigned URLs (S3) - Metadata storage - Range requests - Copy operations - Statistics **Ready for**: - Media uploads - Document storage - Backup systems - Profile pictures - File versioning - Any blob storage needs