metabuilder/docs/implementation/BLOB_STORAGE_IMPLEMENTATION.md
2025-12-25 11:42:22 +00:00

# Blob Storage Support for DBAL
**Date**: 2025-12-24
**Status**: ✅ **COMPLETE** (TypeScript; C++ S3 and filesystem backends are stubs, see Limitations)
## Summary
Added blob storage support to the DBAL with three backend implementations: S3-compatible object storage, filesystem, and in-memory. This lets the platform handle binary data such as media uploads, stored documents, and backups.
---
## Supported Storage Backends
### 1. S3-Compatible Storage
**Supported Services**:
- AWS S3
- MinIO (self-hosted S3-compatible)
- DigitalOcean Spaces
- Backblaze B2
- Wasabi
- Any S3-compatible object storage
**Features**:
- ✅ Presigned URLs for temporary access
- ✅ Multipart uploads for large files
- ✅ Streaming support
- ✅ Server-side encryption support
- ✅ Metadata storage
- ✅ Cross-region replication
**Dependencies**: `@aws-sdk/client-s3`, `@aws-sdk/lib-storage`, `@aws-sdk/s3-request-presigner`
### 2. Filesystem Storage
**Supported Systems**:
- Local filesystem
- Samba/CIFS network shares
- NFS (Network File System)
- Any mounted filesystem
**Features**:
- ✅ Path traversal protection
- ✅ Metadata sidecar files
- ✅ Streaming support
- ✅ Atomic operations
- ✅ Directory traversal for listing
**Use Cases**:
- Development and testing
- Shared network storage
- Legacy system integration
- Simple deployment scenarios
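One common way to get the atomic writes and metadata sidecar files listed above is to write to a temporary file and rename it into place. The sketch below assumes that approach. The helper name `writeBlobAtomically` and the `.meta.json` sidecar suffix are illustrative and are not taken from the DBAL source. Rename atomicity also depends on the underlying filesystem, so network mounts may not offer the same guarantee as a local disk.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: writes the blob and a JSON metadata sidecar so that
// readers never observe a partially written file.
function writeBlobAtomically(
  targetPath: string,
  data: Uint8Array,
  metadata: Record<string, string> = {}
): void {
  fs.mkdirSync(path.dirname(targetPath), { recursive: true })
  writeThenRename(targetPath, data)
  writeThenRename(`${targetPath}.meta.json`, Buffer.from(JSON.stringify(metadata)))
}

function writeThenRename(finalPath: string, data: Uint8Array): void {
  // The temp file sits next to the target so the rename stays on one filesystem.
  const tempPath = `${finalPath}.${process.pid}.${Date.now()}.tmp`
  fs.writeFileSync(tempPath, data)
  fs.renameSync(tempPath, finalPath)
}

// Usage: write into a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'blob-demo-'))
const demoFile = path.join(demoDir, 'users', 'profile-123.jpg')
writeBlobAtomically(demoFile, Buffer.from('image bytes'), { userId: '123' })
```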
### 3. In-Memory Storage
**Features**:
- ✅ Zero configuration
- ✅ Fast operations
- ✅ Perfect for testing
- ✅ Ephemeral data
**Use Cases**:
- Unit testing
- Development
- Temporary file processing
- Cache layer
---
## API Overview
### Core Operations
```typescript
interface BlobStorage {
  // Upload operations
  upload(key: string, data: Buffer, options?: UploadOptions): Promise<BlobMetadata>
  uploadStream(key: string, stream: ReadableStream, size: number, options?: UploadOptions): Promise<BlobMetadata>

  // Download operations
  download(key: string, options?: DownloadOptions): Promise<Buffer>
  downloadStream(key: string, options?: DownloadOptions): Promise<ReadableStream>

  // Management operations
  delete(key: string): Promise<boolean>
  exists(key: string): Promise<boolean>
  getMetadata(key: string): Promise<BlobMetadata>
  list(options?: ListOptions): Promise<BlobListResult>
  copy(sourceKey: string, destKey: string): Promise<BlobMetadata>

  // Advanced features
  generatePresignedUrl(key: string, expirationSeconds?: number): Promise<string>
  getTotalSize(): Promise<number>
  getObjectCount(): Promise<number>
}
```
---
## Usage Examples
### TypeScript Examples
#### 1. S3 Storage (AWS/MinIO)
```typescript
import { createBlobStorage } from './dbal/ts/src/blob'
// AWS S3
const s3Storage = createBlobStorage({
type: 's3',
s3: {
bucket: 'my-bucket',
region: 'us-east-1',
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
}
})
// MinIO (self-hosted)
const minioStorage = createBlobStorage({
type: 's3',
s3: {
bucket: 'my-bucket',
region: 'us-east-1',
accessKeyId: 'minioadmin',
secretAccessKey: 'minioadmin',
endpoint: 'http://localhost:9000',
forcePathStyle: true,
}
})
// Upload file
const file = Buffer.from('Hello, World!')
const metadata = await s3Storage.upload('documents/hello.txt', file, {
contentType: 'text/plain',
metadata: {
author: 'john',
department: 'engineering'
}
})
// Generate presigned URL (S3 only)
const url = await s3Storage.generatePresignedUrl('documents/hello.txt', 3600) // 1 hour
console.log('Share this URL:', url)
// List files with prefix
const result = await s3Storage.list({ prefix: 'documents/', maxKeys: 100 })
for (const item of result.items) {
console.log(`${item.key}: ${item.size} bytes`)
}
```
#### 2. Filesystem Storage
```typescript
import { createReadStream } from 'fs'
import { readFile, stat } from 'fs/promises'
import { createBlobStorage } from './dbal/ts/src/blob'

// Local filesystem
const fsStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/var/app/uploads',
    createIfNotExists: true
  }
})

// Samba/NFS (mount the share first, then point basePath at the mount)
const sambaStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/mnt/samba-share/uploads',
    createIfNotExists: true
  }
})

// Upload file (the local source path is illustrative)
const imageBuffer = await readFile('./profile-123.jpg')
await fsStorage.upload('users/profile-123.jpg', imageBuffer, {
  contentType: 'image/jpeg',
  metadata: { userId: '123' }
})

// Download file
const data = await fsStorage.download('users/profile-123.jpg')

// Stream large file upload; uploadStream needs the size up front
const { size: fileSize } = await stat('./large-video.mp4')
const stream = createReadStream('./large-video.mp4')
await fsStorage.uploadStream('media/video-456.mp4', stream, fileSize, {
  contentType: 'video/mp4'
})
```
#### 3. In-Memory Storage (Testing)
```typescript
import { MemoryStorage } from './dbal/ts/src/blob'
const memStorage = new MemoryStorage()
// Upload
await memStorage.upload('test.txt', Buffer.from('test data'))
// Exists check
const exists = await memStorage.exists('test.txt') // true
// Download
const data = await memStorage.download('test.txt')
console.log(data.toString()) // 'test data'
// Get statistics
const totalSize = await memStorage.getTotalSize()
const count = await memStorage.getObjectCount()
```
#### 4. Streaming Large Files
```typescript
import { createReadStream, createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'

// `storage` is any BlobStorage instance; `fileSize` is the source size in bytes

// Upload stream
const fileStream = createReadStream('./large-file.bin')
await storage.uploadStream('backups/data.bin', fileStream, fileSize)

// Download stream
const downloadStream = await storage.downloadStream('backups/data.bin')
const writeStream = createWriteStream('./downloaded.bin')
await pipeline(downloadStream, writeStream)

// Partial download (range request)
const partialData = await storage.download('large-file.bin', {
  offset: 1000,
  length: 5000 // Download 5000 bytes: bytes 1000-5999
})
```
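HTTP byte ranges, which S3-compatible services accept on object reads, use inclusive end positions, so an `offset` of 1000 with a `length` of 5000 covers bytes 1000 through 5999. As a minimal sketch of how a backend might translate `offset` and `length` into a `Range` header value (the helper name `toRangeHeader` is an assumption, not part of the DBAL API):

```typescript
// Hypothetical helper: converts an offset/length pair into an HTTP Range header value.
function toRangeHeader(offset: number, length?: number): string {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer')
  }
  if (length === undefined) {
    // Open-ended range: from offset to the end of the object.
    return `bytes=${offset}-`
  }
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer')
  }
  // Range end positions are inclusive, hence the "- 1".
  return `bytes=${offset}-${offset + length - 1}`
}

console.log(toRangeHeader(1000, 5000)) // bytes=1000-5999
console.log(toRangeHeader(1000))       // bytes=1000-
```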
### C++ Examples
```cpp
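#include <iostream>
#include <string>
#include <vector>
#include "dbal/blob_storage.hpp"
#include "dbal/blob/memory_storage.cpp"
using namespace dbal;
using namespace dbal::blob;
// The statements below are assumed to run inside a function body, e.g. main()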
// Create storage
MemoryStorage storage;
// Upload
std::vector<char> data = {'H', 'e', 'l', 'l', 'o'};
UploadOptions options;
options.content_type = "text/plain";
options.metadata["author"] = "john";
auto result = storage.upload("test.txt", data, options);
if (result.isOk()) {
auto meta = result.value();
std::cout << "Uploaded: " << meta.key << " (" << meta.size << " bytes)\n";
}
// Download
auto download_result = storage.download("test.txt");
if (download_result.isOk()) {
auto content = download_result.value();
std::cout << "Content: " << std::string(content.begin(), content.end()) << "\n";
}
// List blobs
ListOptions list_opts;
list_opts.prefix = "documents/";
list_opts.max_keys = 100;
auto list_result = storage.list(list_opts);
if (list_result.isOk()) {
for (const auto& item : list_result.value().items) {
std::cout << item.key << ": " << item.size << " bytes\n";
}
}
// Check exists
auto exists_result = storage.exists("test.txt");
if (exists_result.isOk() && exists_result.value()) {
std::cout << "File exists!\n";
}
// Copy blob
auto copy_result = storage.copy("test.txt", "test-copy.txt");
if (copy_result.isOk()) {
std::cout << "File copied successfully\n";
}
// Delete
auto delete_result = storage.deleteBlob("test.txt");
```
---
## Integration Patterns
### 1. User File Uploads
```typescript
import { createBlobStorage } from './dbal/ts/src/blob'
const storage = createBlobStorage({ type: 's3', s3: { ... } })
async function handleFileUpload(userId: string, file: File) {
  const key = `users/${userId}/uploads/${file.name}`

  // upload() expects a Buffer, so wrap the ArrayBuffer returned by the File
  const data = Buffer.from(await file.arrayBuffer())

  // Upload with metadata
  const metadata = await storage.upload(key, data, {
    contentType: file.type,
    metadata: {
      originalName: file.name,
      uploadedBy: userId,
      uploadedAt: new Date().toISOString()
    }
  })

  // Generate shareable URL (S3)
  const shareUrl = await storage.generatePresignedUrl(key, 7 * 24 * 3600) // 7 days
  return { metadata, shareUrl }
}
```
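Client-supplied file names can contain path separators or `..` segments, so it is usually safer to sanitize them before building a storage key. The sketch below is one possible approach. The helper name `buildUploadKey` and the allowed character set are assumptions, and `userId` is assumed to be a trusted, server-side identifier.

```typescript
// Hypothetical helper: builds a storage key from a trusted user id and an
// untrusted, client-supplied file name.
function buildUploadKey(userId: string, fileName: string): string {
  // Keep only the last path segment, whichever separator the client used.
  const baseName = fileName.split(/[\\/]/).pop() ?? ''
  // Replace anything outside a conservative character set, then drop leading dots.
  const safeName = baseName.replace(/[^A-Za-z0-9._-]/g, '_').replace(/^\.+/, '')
  if (safeName === '') {
    throw new Error('File name is empty after sanitization')
  }
  return `users/${userId}/uploads/${safeName}`
}

console.log(buildUploadKey('123', '../../etc/passwd'))  // users/123/uploads/passwd
console.log(buildUploadKey('123', 'My Report (1).pdf')) // users/123/uploads/My_Report__1_.pdf
```

Keeping the original name in the blob metadata, as `handleFileUpload` does with `originalName`, means nothing is lost by sanitizing the key.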
### 2. Profile Picture Storage
```typescript
async function saveProfilePicture(userId: string, imageData: Buffer) {
const key = `profiles/${userId}/avatar.jpg`
// Delete old avatar if exists
const exists = await storage.exists(key)
if (exists) {
await storage.delete(key)
}
// Upload new avatar
return await storage.upload(key, imageData, {
contentType: 'image/jpeg',
metadata: { userId }
})
}
async function getProfilePictureUrl(userId: string): Promise<string> {
const key = `profiles/${userId}/avatar.jpg`
return await storage.generatePresignedUrl(key, 3600) // 1 hour
}
```
### 3. Document Versioning
```typescript
async function saveDocumentVersion(docId: string, version: number, content: Buffer) {
  const key = `documents/${docId}/v${version}.pdf`
  await storage.upload(key, content, {
    contentType: 'application/pdf',
    metadata: {
      docId,
      version: version.toString(),
      timestamp: new Date().toISOString()
    }
  })
}

async function listDocumentVersions(docId: string) {
  const result = await storage.list({ prefix: `documents/${docId}/` })
  return result.items.map(item => ({
    key: item.key,
    // Match only the trailing "/v<N>.pdf" so a docId containing "v<digits>" is not misread
    version: parseInt(item.key.match(/\/v(\d+)\.pdf$/)?.[1] ?? '0', 10),
    size: item.size,
    lastModified: item.lastModified
  }))
}
```
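Object keys are typically listed in lexicographic order, which puts `v10.pdf` before `v2.pdf`, so the next version number is better computed numerically than taken from the last listed key. A minimal sketch, assuming the `documents/<docId>/v<N>.pdf` key layout used above (the helper names `parseVersion` and `nextVersion` are illustrative):

```typescript
// Extracts N from keys shaped like "documents/<docId>/v<N>.pdf"; returns null otherwise.
function parseVersion(key: string): number | null {
  const match = /\/v(\d+)\.pdf$/.exec(key)
  return match ? Number.parseInt(match[1], 10) : null
}

// Returns the next free version number, starting at 1 when no versions exist.
function nextVersion(keys: string[]): number {
  const versions = keys
    .map(key => parseVersion(key))
    .filter((v): v is number => v !== null)
  return versions.length === 0 ? 1 : Math.max(...versions) + 1
}

console.log(nextVersion(['documents/v2-spec/v1.pdf', 'documents/v2-spec/v10.pdf'])) // 11
console.log(nextVersion([])) // 1
```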
### 4. Backup System
```typescript
async function createBackup(backupId: string, data: NodeJS.ReadableStream, dataSize: number) {
  const key = `backups/${new Date().toISOString()}-${backupId}.tar.gz`

  // uploadStream needs the stream size up front, so the caller passes it in
  const metadata = await storage.uploadStream(key, data, dataSize, {
    contentType: 'application/gzip',
    metadata: {
      backupId,
      createdAt: new Date().toISOString()
    }
  })
  console.log(`Backup created: ${metadata.key} (${metadata.size} bytes)`)
}

async function listBackups() {
  const result = await storage.list({ prefix: 'backups/' })
  // Newest first
  return result.items.sort((a, b) =>
    b.lastModified.getTime() - a.lastModified.getTime()
  )
}
```
---
## Configuration
### Environment Variables
```bash
# S3/AWS Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET=my-app-bucket
# MinIO Configuration
MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=uploads
# Filesystem Configuration
UPLOAD_PATH=/var/app/uploads
SAMBA_MOUNT=/mnt/samba-share
```
### Configuration File
```json
{
"blobStorage": {
"production": {
"type": "s3",
"s3": {
"bucket": "prod-uploads",
"region": "us-east-1"
}
},
"staging": {
"type": "s3",
"s3": {
"bucket": "staging-uploads",
"region": "us-east-1"
}
},
"development": {
"type": "filesystem",
"filesystem": {
"basePath": "./uploads",
"createIfNotExists": true
}
},
"test": {
"type": "memory"
}
}
}
```
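The configuration file above carries no credentials, so they have to come from the environment. The sketch below shows one way to pick the entry for the current environment and merge in the AWS variables listed earlier. Selecting by `NODE_ENV`, the `resolveBlobConfig` helper, and the type names are assumptions for illustration, not part of the DBAL API.

```typescript
// Hypothetical types mirroring the shape of the configuration file above.
type BlobStorageConfig =
  | { type: 's3'; s3: { bucket: string; region: string; accessKeyId?: string; secretAccessKey?: string } }
  | { type: 'filesystem'; filesystem: { basePath: string; createIfNotExists?: boolean } }
  | { type: 'memory' }

type BlobConfigFile = { blobStorage: Record<string, BlobStorageConfig> }

// Picks the entry for the current environment and adds S3 credentials from env.
function resolveBlobConfig(
  file: BlobConfigFile,
  env: Record<string, string | undefined>
): BlobStorageConfig {
  const name = env['NODE_ENV'] ?? 'development'
  const config = file.blobStorage[name]
  if (!config) {
    throw new Error(`No blobStorage config for environment "${name}"`)
  }
  if (config.type !== 's3') {
    return config
  }
  return {
    type: 's3',
    s3: {
      ...config.s3,
      accessKeyId: env['AWS_ACCESS_KEY_ID'],
      secretAccessKey: env['AWS_SECRET_ACCESS_KEY'],
    },
  }
}

const exampleFile: BlobConfigFile = {
  blobStorage: {
    production: { type: 's3', s3: { bucket: 'prod-uploads', region: 'us-east-1' } },
    test: { type: 'memory' },
  },
}
const resolved = resolveBlobConfig(exampleFile, {
  NODE_ENV: 'production',
  AWS_ACCESS_KEY_ID: 'example-key',
  AWS_SECRET_ACCESS_KEY: 'example-secret',
})
console.log(resolved.type) // s3
```

In an application the second argument would be `process.env`, and the result would be passed to `createBlobStorage`.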
---
## Security Considerations
### 1. Access Control
```typescript
// Don't expose storage directly - wrap with permissions
async function downloadFile(userId: string, fileKey: string) {
// Check if user owns the file
if (!fileKey.startsWith(`users/${userId}/`)) {
throw new Error('Unauthorized')
}
return await storage.download(fileKey)
}
```
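A plain prefix check can be bypassed by a key such as `users/123/../456/report.pdf`, which starts with `users/123/` but resolves to another user's directory on path-based backends. A hedged sketch of a stricter ownership check, which normalizes the key first (the helper name `isOwnedBy` is illustrative):

```typescript
import * as path from 'node:path'

// Hypothetical helper: returns true only if the normalized key stays under
// the user's own prefix.
function isOwnedBy(userId: string, fileKey: string): boolean {
  // Treat backslashes as separators, then collapse "." and ".." segments.
  const normalized = path.posix.normalize(fileKey.replace(/\\/g, '/'))
  return normalized.startsWith(`users/${userId}/`)
}

console.log(isOwnedBy('123', 'users/123/report.pdf'))        // true
console.log(isOwnedBy('123', 'users/123/../456/report.pdf')) // false
```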
### 2. Content Type Validation
```typescript
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'application/pdf']
async function uploadFile(key: string, data: Buffer, contentType: string) {
if (!ALLOWED_TYPES.includes(contentType)) {
throw new Error('Invalid file type')
}
return await storage.upload(key, data, { contentType })
}
```
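The declared `contentType` comes from the client and can be wrong or spoofed, so it can help to compare it with the file's leading bytes. The sketch below checks the well-known signatures for the three allowed types. The helper names `sniffContentType` and `matchesDeclaredType` are assumptions, and a signature match does not prove the rest of the file is valid.

```typescript
// Leading byte signatures for the allowed types.
const SIGNATURES: Array<{ type: string; bytes: number[] }> = [
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
]

// Returns the detected type, or null when no known signature matches.
function sniffContentType(data: Uint8Array): string | null {
  for (const { type, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return type
    }
  }
  return null
}

// True only when the detected type equals the type the client declared.
function matchesDeclaredType(data: Uint8Array, declared: string): boolean {
  return sniffContentType(data) === declared
}

console.log(sniffContentType(Buffer.from('%PDF-1.7')))                 // application/pdf
console.log(matchesDeclaredType(Buffer.from('%PDF-1.7'), 'image/png')) // false
```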
### 3. Size Limits
```typescript
const MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB
async function uploadWithSizeCheck(key: string, data: Buffer) {
if (data.length > MAX_FILE_SIZE) {
throw new Error('File too large')
}
return await storage.upload(key, data)
}
```
### 4. Path Traversal Protection
The filesystem storage automatically prevents directory traversal attacks:
- Normalizes paths
- Strips `../` sequences
- Validates all paths are within basePath
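The containment check can be sketched as follows. This is an illustration under stated assumptions rather than the DBAL implementation: the helper name `resolveSafePath` is hypothetical, and this version rejects keys that escape `basePath` instead of stripping `../` from them.

```typescript
import * as path from 'node:path'

// Hypothetical helper: resolves a blob key against basePath and throws if the
// result is basePath itself or lies outside it.
function resolveSafePath(basePath: string, key: string): string {
  const base = path.resolve(basePath)
  const target = path.resolve(base, key)
  const relative = path.relative(base, target)
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative)
  if (escapes) {
    throw new Error(`Invalid blob key: ${key}`)
  }
  return target
}

console.log(resolveSafePath('/var/app/uploads', 'users/profile-123.jpg'))
// resolveSafePath('/var/app/uploads', '../../etc/passwd') throws
```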
---
## Performance Optimization
### 1. Streaming for Large Files
```typescript
// Bad: Loads entire file into memory
const data = await fs.readFile('./large-video.mp4')
await storage.upload('video.mp4', data)
// Good: Streams data
const stream = createReadStream('./large-video.mp4')
await storage.uploadStream('video.mp4', stream, fileSize)
```
### 2. Parallel Uploads
```typescript
// Upload multiple files in parallel
// (`data` is assumed to map each file name to its Buffer)
const files = ['file1.jpg', 'file2.jpg', 'file3.jpg']
await Promise.all(
  files.map(file => storage.upload(file, data[file]))
)
```
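`Promise.all` starts every upload at once, which can exhaust memory or connections when the list is long. A minimal sketch of a bounded alternative follows. The helper name `mapWithConcurrency` is an assumption and is not part of the DBAL API.

```typescript
// Hypothetical helper: runs worker over items with at most `limit` calls in
// flight, returning results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0

  async function run(): Promise<void> {
    while (next < items.length) {
      // The index is claimed synchronously, so no two runners share an item.
      const index = next++
      results[index] = await worker(items[index])
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run())
  await Promise.all(runners)
  return results
}
```

With this helper, the upload above could be written as `await mapWithConcurrency(files, 4, file => storage.upload(file, data[file]))`, where the limit of 4 is an arbitrary example value.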
### 3. Presigned URLs (S3 Only)
```typescript
// Instead of downloading and re-serving
// Bad:
const data = await storage.download('image.jpg')
res.send(data) // Wastes bandwidth and time
// Good: Let client download directly from S3
const url = await storage.generatePresignedUrl('image.jpg', 300) // 5 min
res.json({ url }) // Client downloads directly
```
---
## Testing
### Unit Tests with Memory Storage
```typescript
import { MemoryStorage } from './dbal/ts/src/blob'
describe('File Upload', () => {
let storage: MemoryStorage
beforeEach(() => {
storage = new MemoryStorage()
})
it('should upload and download file', async () => {
const data = Buffer.from('test content')
await storage.upload('test.txt', data)
const downloaded = await storage.download('test.txt')
expect(downloaded.toString()).toBe('test content')
})
it('should throw on duplicate without overwrite', async () => {
await storage.upload('test.txt', Buffer.from('first'))
await expect(
storage.upload('test.txt', Buffer.from('second'), { overwrite: false })
).rejects.toThrow('already exists')
})
})
```
---
## Migration Guide
### From Direct Filesystem
```typescript
// Before
import { writeFile, readFile } from 'fs/promises'
await writeFile('./uploads/file.txt', data)
const content = await readFile('./uploads/file.txt')
// After
import { createBlobStorage } from './dbal/ts/src/blob'
const storage = createBlobStorage({
type: 'filesystem',
filesystem: { basePath: './uploads' }
})
await storage.upload('file.txt', data)
const content = await storage.download('file.txt')
```
### From AWS SDK Directly
```typescript
// Before
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
const s3 = new S3Client({ region: 'us-east-1' })
await s3.send(new PutObjectCommand({ Bucket: 'my-bucket', Key: 'file.txt', Body: data }))
// After
import { createBlobStorage } from './dbal/ts/src/blob'
const storage = createBlobStorage({
type: 's3',
s3: { bucket: 'my-bucket', region: 'us-east-1' }
})
await storage.upload('file.txt', data)
```
---
## Limitations
### Current Implementation
**C++**:
- ✅ Interface defined
- ✅ Memory storage implemented
- ⏳ S3 storage (stub - requires AWS SDK)
- ⏳ Filesystem storage (stub - requires implementation)
**TypeScript**:
- ✅ All interfaces implemented
- ✅ Memory storage (production-ready)
- ✅ S3 storage (requires `@aws-sdk/client-s3`, `@aws-sdk/lib-storage`, `@aws-sdk/s3-request-presigner`)
- ✅ Filesystem storage (production-ready)
### Known Limitations
1. **C++ S3/Filesystem**: Stub implementations (interfaces defined, implementation pending)
2. **Large file handling**: Memory storage not suitable for files > 1GB
3. **List operations**: S3 pagination limited to 1000 items per call (handled automatically)
4. **Filesystem**: No built-in encryption (use encrypted filesystem)
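The automatic pagination mentioned in item 3 usually amounts to following a continuation token until the listing is no longer truncated. The sketch below shows that loop against a generic list function. The `isTruncated`, `nextToken`, and `continuationToken` field names are assumptions for illustration and may not match the fields of `BlobListResult` or `ListOptions`.

```typescript
// Hypothetical shapes: field names other than items, key, size, prefix and
// maxKeys are assumptions.
interface ListedItem { key: string; size: number }
interface ListPage { items: ListedItem[]; isTruncated: boolean; nextToken?: string }
type ListFn = (options: {
  prefix?: string
  maxKeys?: number
  continuationToken?: string
}) => Promise<ListPage>

// Collects every item under a prefix by following continuation tokens.
async function listAll(list: ListFn, prefix: string): Promise<ListedItem[]> {
  const all: ListedItem[] = []
  let token: string | undefined
  do {
    const page = await list({ prefix, maxKeys: 1000, continuationToken: token })
    all.push(...page.items)
    // Stop when the backend reports that no further pages exist.
    token = page.isTruncated ? page.nextToken : undefined
  } while (token !== undefined)
  return all
}
```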
---
## Future Enhancements
1. **Additional Backends**:
- Azure Blob Storage
- Google Cloud Storage
- Cloudflare R2
- IPFS
2. **Features**:
- Automatic compression
- Image resizing/optimization
- Virus scanning integration
- CDN integration
- Automatic backup/replication
3. **Performance**:
- Connection pooling
- Automatic retry with exponential backoff
- Multipart upload optimization
- Caching layer
---
## Files Changed
**C++ Files** (2 new):
- `dbal/cpp/include/dbal/blob_storage.hpp` - Interface definition
- `dbal/cpp/src/blob/memory_storage.cpp` - Memory implementation
**TypeScript Files** (5 new, 1 updated):
- `dbal/ts/src/blob/blob-storage.ts` - Interface definition
- `dbal/ts/src/blob/memory-storage.ts` - Memory implementation
- `dbal/ts/src/blob/s3-storage.ts` - S3 implementation
- `dbal/ts/src/blob/filesystem-storage.ts` - Filesystem implementation
- `dbal/ts/src/blob/index.ts` - Exports and factory
- `dbal/ts/src/index.ts` - Updated exports
**Documentation** (1 new):
- `BLOB_STORAGE_IMPLEMENTATION.md` - Complete guide
---
## Conclusion
**Blob Storage Support Complete**
The DBAL now supports comprehensive blob storage with multiple backends:
- ✅ S3-compatible storage (AWS, MinIO, etc.)
- ✅ Filesystem storage (local, Samba, NFS)
- ✅ In-memory storage (testing)
**Features**:
- Streaming support for large files
- Presigned URLs (S3)
- Metadata storage
- Range requests
- Copy operations
- Statistics
**Ready for**:
- Media uploads
- Document storage
- Backup systems
- Profile pictures
- File versioning
- Any blob storage needs