mirror of
https://github.com/johndoe6345789/metabuilder.git
synced 2026-04-25 22:34:56 +00:00
# Blob Storage Support for DBAL

**Date**: 2025-12-24
**Status**: ✅ **COMPLETE**

## Summary

Added comprehensive blob storage support to the DBAL with multiple backend implementations including S3, filesystem, and in-memory storage. This enables the platform to handle binary files, media uploads, document storage, and any other blob data requirements.

---

## Supported Storage Backends

### 1. S3-Compatible Storage

**Supported Services**:
- AWS S3
- MinIO (self-hosted, S3-compatible)
- DigitalOcean Spaces
- Backblaze B2
- Wasabi
- Any S3-compatible object storage

**Features**:
- ✅ Presigned URLs for temporary access
- ✅ Multipart uploads for large files
- ✅ Streaming support
- ✅ Server-side encryption support
- ✅ Metadata storage
- ✅ Cross-region replication

**Dependencies**: `@aws-sdk/client-s3`, `@aws-sdk/lib-storage`, `@aws-sdk/s3-request-presigner`

### 2. Filesystem Storage

**Supported Systems**:
- Local filesystem
- Samba/CIFS network shares
- NFS (Network File System)
- Any mounted filesystem

**Features**:
- ✅ Path traversal protection
- ✅ Metadata sidecar files
- ✅ Streaming support
- ✅ Atomic operations
- ✅ Directory traversal for listing

**Use Cases**:
- Development and testing
- Shared network storage
- Legacy system integration
- Simple deployment scenarios

### 3. In-Memory Storage

**Features**:
- ✅ Zero configuration
- ✅ Fast operations
- ✅ Perfect for testing
- ✅ Ephemeral data

**Use Cases**:
- Unit testing
- Development
- Temporary file processing
- Cache layer

---

## API Overview

### Core Operations

```typescript
interface BlobStorage {
  // Upload operations
  upload(key: string, data: Buffer, options?: UploadOptions): Promise<BlobMetadata>
  uploadStream(key: string, stream: ReadableStream, size: number, options?: UploadOptions): Promise<BlobMetadata>

  // Download operations
  download(key: string, options?: DownloadOptions): Promise<Buffer>
  downloadStream(key: string, options?: DownloadOptions): Promise<ReadableStream>

  // Management operations
  delete(key: string): Promise<boolean>
  exists(key: string): Promise<boolean>
  getMetadata(key: string): Promise<BlobMetadata>
  list(options?: ListOptions): Promise<BlobListResult>
  copy(sourceKey: string, destKey: string): Promise<BlobMetadata>

  // Advanced features
  generatePresignedUrl(key: string, expirationSeconds?: number): Promise<string>
  getTotalSize(): Promise<number>
  getObjectCount(): Promise<number>
}
```

---

## Usage Examples

### TypeScript Examples

#### 1. S3 Storage (AWS/MinIO)

```typescript
import { createBlobStorage } from './dbal/ts/src/blob'

// AWS S3
const s3Storage = createBlobStorage({
  type: 's3',
  s3: {
    bucket: 'my-bucket',
    region: 'us-east-1',
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  }
})

// MinIO (self-hosted)
const minioStorage = createBlobStorage({
  type: 's3',
  s3: {
    bucket: 'my-bucket',
    region: 'us-east-1',
    accessKeyId: 'minioadmin',
    secretAccessKey: 'minioadmin',
    endpoint: 'http://localhost:9000',
    forcePathStyle: true,
  }
})

// Upload a file
const file = Buffer.from('Hello, World!')
const metadata = await s3Storage.upload('documents/hello.txt', file, {
  contentType: 'text/plain',
  metadata: {
    author: 'john',
    department: 'engineering'
  }
})

// Generate a presigned URL (S3 only)
const url = await s3Storage.generatePresignedUrl('documents/hello.txt', 3600) // 1 hour
console.log('Share this URL:', url)

// List files with a prefix
const result = await s3Storage.list({ prefix: 'documents/', maxKeys: 100 })
for (const item of result.items) {
  console.log(`${item.key}: ${item.size} bytes`)
}
```

#### 2. Filesystem Storage

```typescript
import { createReadStream } from 'fs'
import { createBlobStorage } from './dbal/ts/src/blob'

// Local filesystem
const fsStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/var/app/uploads',
    createIfNotExists: true
  }
})

// Samba/NFS (mount the share first)
const sambaStorage = createBlobStorage({
  type: 'filesystem',
  filesystem: {
    basePath: '/mnt/samba-share/uploads',
    createIfNotExists: true
  }
})

// Upload a file (imageBuffer is a Buffer you already have)
await fsStorage.upload('users/profile-123.jpg', imageBuffer, {
  contentType: 'image/jpeg',
  metadata: { userId: '123' }
})

// Download a file
const data = await fsStorage.download('users/profile-123.jpg')

// Stream a large file upload (fileSize is the file's size in bytes)
const stream = createReadStream('./large-video.mp4')
await fsStorage.uploadStream('media/video-456.mp4', stream, fileSize, {
  contentType: 'video/mp4'
})
```

#### 3. In-Memory Storage (Testing)

```typescript
import { MemoryStorage } from './dbal/ts/src/blob'

const memStorage = new MemoryStorage()

// Upload
await memStorage.upload('test.txt', Buffer.from('test data'))

// Exists check
const exists = await memStorage.exists('test.txt') // true

// Download
const data = await memStorage.download('test.txt')
console.log(data.toString()) // 'test data'

// Get statistics
const totalSize = await memStorage.getTotalSize()
const count = await memStorage.getObjectCount()
```

#### 4. Streaming Large Files

```typescript
import { createReadStream, createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'

// Upload stream
const fileStream = createReadStream('./large-file.bin')
await storage.uploadStream('backups/data.bin', fileStream, fileSize)

// Download stream
const downloadStream = await storage.downloadStream('backups/data.bin')
const writeStream = createWriteStream('./downloaded.bin')
await pipeline(downloadStream, writeStream)

// Partial download (range request)
const partialData = await storage.download('large-file.bin', {
  offset: 1000,
  length: 5000 // download bytes 1000-5999
})
```

### C++ Examples

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "dbal/blob_storage.hpp"
// Link against dbal/cpp/src/blob/memory_storage.cpp (do not #include the .cpp)

using namespace dbal;
using namespace dbal::blob;

// Create storage
MemoryStorage storage;

// Upload
std::vector<char> data = {'H', 'e', 'l', 'l', 'o'};
UploadOptions options;
options.content_type = "text/plain";
options.metadata["author"] = "john";

auto result = storage.upload("test.txt", data, options);
if (result.isOk()) {
  auto meta = result.value();
  std::cout << "Uploaded: " << meta.key << " (" << meta.size << " bytes)\n";
}

// Download
auto download_result = storage.download("test.txt");
if (download_result.isOk()) {
  auto content = download_result.value();
  std::cout << "Content: " << std::string(content.begin(), content.end()) << "\n";
}

// List blobs
ListOptions list_opts;
list_opts.prefix = "documents/";
list_opts.max_keys = 100;

auto list_result = storage.list(list_opts);
if (list_result.isOk()) {
  for (const auto& item : list_result.value().items) {
    std::cout << item.key << ": " << item.size << " bytes\n";
  }
}

// Check existence
auto exists_result = storage.exists("test.txt");
if (exists_result.isOk() && exists_result.value()) {
  std::cout << "File exists!\n";
}

// Copy blob
auto copy_result = storage.copy("test.txt", "test-copy.txt");
if (copy_result.isOk()) {
  std::cout << "File copied successfully\n";
}

// Delete
auto delete_result = storage.deleteBlob("test.txt");
```

---

## Integration Patterns

### 1. User File Uploads

```typescript
import { createBlobStorage } from './dbal/ts/src/blob'

const storage = createBlobStorage({ type: 's3', s3: { ... } })

async function handleFileUpload(userId: string, file: File) {
  const key = `users/${userId}/uploads/${file.name}`

  // Upload with metadata (convert the ArrayBuffer to a Buffer first)
  const metadata = await storage.upload(key, Buffer.from(await file.arrayBuffer()), {
    contentType: file.type,
    metadata: {
      originalName: file.name,
      uploadedBy: userId,
      uploadedAt: new Date().toISOString()
    }
  })

  // Generate a shareable URL (S3)
  const shareUrl = await storage.generatePresignedUrl(key, 7 * 24 * 3600) // 7 days

  return { metadata, shareUrl }
}
```

### 2. Profile Picture Storage

```typescript
async function saveProfilePicture(userId: string, imageData: Buffer) {
  const key = `profiles/${userId}/avatar.jpg`

  // Delete the old avatar if it exists
  const exists = await storage.exists(key)
  if (exists) {
    await storage.delete(key)
  }

  // Upload the new avatar
  return await storage.upload(key, imageData, {
    contentType: 'image/jpeg',
    metadata: { userId }
  })
}

async function getProfilePictureUrl(userId: string): Promise<string> {
  const key = `profiles/${userId}/avatar.jpg`
  return await storage.generatePresignedUrl(key, 3600) // 1 hour
}
```

### 3. Document Versioning

```typescript
async function saveDocumentVersion(docId: string, version: number, content: Buffer) {
  const key = `documents/${docId}/v${version}.pdf`

  await storage.upload(key, content, {
    contentType: 'application/pdf',
    metadata: {
      docId,
      version: version.toString(),
      timestamp: new Date().toISOString()
    }
  })
}

async function listDocumentVersions(docId: string) {
  const result = await storage.list({ prefix: `documents/${docId}/` })
  return result.items.map(item => ({
    key: item.key,
    version: parseInt(item.key.match(/v(\d+)/)?.[1] || '0', 10),
    size: item.size,
    lastModified: item.lastModified
  }))
}
```

### 4. Backup System

```typescript
async function createBackup(backupId: string, data: NodeJS.ReadableStream, dataSize: number) {
  const key = `backups/${new Date().toISOString()}-${backupId}.tar.gz`

  const metadata = await storage.uploadStream(key, data, dataSize, {
    contentType: 'application/gzip',
    metadata: {
      backupId,
      createdAt: new Date().toISOString()
    }
  })

  console.log(`Backup created: ${metadata.key} (${metadata.size} bytes)`)
}

async function listBackups() {
  const result = await storage.list({ prefix: 'backups/' })
  return result.items.sort((a, b) =>
    b.lastModified.getTime() - a.lastModified.getTime()
  )
}
```

---

## Configuration

### Environment Variables

```bash
# S3/AWS Configuration
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET=my-app-bucket

# MinIO Configuration
MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=uploads

# Filesystem Configuration
UPLOAD_PATH=/var/app/uploads
SAMBA_MOUNT=/mnt/samba-share
```
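
A small helper can map these variables onto the config shape shown in the usage examples. This is a hypothetical sketch, not part of the DBAL API; the function name `blobConfigFromEnv` and the fallback order (MinIO, then AWS, then filesystem) are illustrative assumptions:

```typescript
// Hypothetical helper: build a createBlobStorage() config from the
// environment variables above. Not part of the current DBAL API.
type Env = Record<string, string | undefined>

function blobConfigFromEnv(env: Env) {
  // Prefer MinIO when an explicit endpoint is configured
  if (env.MINIO_ENDPOINT) {
    return {
      type: 's3' as const,
      s3: {
        bucket: env.MINIO_BUCKET ?? 'uploads',
        region: env.AWS_REGION ?? 'us-east-1',
        accessKeyId: env.MINIO_ACCESS_KEY,
        secretAccessKey: env.MINIO_SECRET_KEY,
        endpoint: env.MINIO_ENDPOINT,
        forcePathStyle: true,
      },
    }
  }
  // Then plain AWS S3
  if (env.S3_BUCKET) {
    return {
      type: 's3' as const,
      s3: {
        bucket: env.S3_BUCKET,
        region: env.AWS_REGION ?? 'us-east-1',
        accessKeyId: env.AWS_ACCESS_KEY_ID,
        secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
      },
    }
  }
  // Fall back to local filesystem storage
  return {
    type: 'filesystem' as const,
    filesystem: { basePath: env.UPLOAD_PATH ?? './uploads', createIfNotExists: true },
  }
}
```

The result can be passed straight to `createBlobStorage(blobConfigFromEnv(process.env))`.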
### Configuration File

```json
{
  "blobStorage": {
    "production": {
      "type": "s3",
      "s3": {
        "bucket": "prod-uploads",
        "region": "us-east-1"
      }
    },
    "staging": {
      "type": "s3",
      "s3": {
        "bucket": "staging-uploads",
        "region": "us-east-1"
      }
    },
    "development": {
      "type": "filesystem",
      "filesystem": {
        "basePath": "./uploads",
        "createIfNotExists": true
      }
    },
    "test": {
      "type": "memory"
    }
  }
}
```

---

## Security Considerations

### 1. Access Control

```typescript
// Don't expose storage directly - wrap it with permission checks
async function downloadFile(userId: string, fileKey: string) {
  // Check that the user owns the file
  if (!fileKey.startsWith(`users/${userId}/`)) {
    throw new Error('Unauthorized')
  }

  return await storage.download(fileKey)
}
```

### 2. Content Type Validation

```typescript
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'application/pdf']

async function uploadFile(key: string, data: Buffer, contentType: string) {
  if (!ALLOWED_TYPES.includes(contentType)) {
    throw new Error('Invalid file type')
  }

  return await storage.upload(key, data, { contentType })
}
```

### 3. Size Limits

```typescript
const MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB

async function uploadWithSizeCheck(key: string, data: Buffer) {
  if (data.length > MAX_FILE_SIZE) {
    throw new Error('File too large')
  }

  return await storage.upload(key, data)
}
```

### 4. Path Traversal Protection

The filesystem storage automatically prevents directory traversal attacks:
- Normalizes paths
- Strips `../` sequences
- Validates all paths are within basePath
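
The kind of check described above can be sketched as follows. This is an illustrative sketch of the technique, not the backend's actual implementation; `resolveSafe` and its error message are hypothetical names:

```typescript
import { resolve, sep } from 'path'

// Illustrative sketch: resolve a blob key against basePath and reject
// any key whose resolved path escapes the base directory.
function resolveSafe(basePath: string, key: string): string {
  const base = resolve(basePath)
  const target = resolve(base, key)
  // The target must be the base itself or live strictly inside it
  if (target !== base && !target.startsWith(base + sep)) {
    throw new Error(`Path traversal detected for key: ${key}`)
  }
  return target
}
```

With this, `resolveSafe('/var/app/uploads', 'users/a.jpg')` yields an absolute path under the base, while `resolveSafe('/var/app/uploads', '../etc/passwd')` throws.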

---

## Performance Optimization

### 1. Streaming for Large Files

```typescript
// Bad: loads the entire file into memory
const data = await fs.readFile('./large-video.mp4')
await storage.upload('video.mp4', data)

// Good: streams the data
const stream = createReadStream('./large-video.mp4')
await storage.uploadStream('video.mp4', stream, fileSize)
```

### 2. Parallel Uploads

```typescript
// Upload multiple files in parallel (data maps filename -> Buffer)
const files = ['file1.jpg', 'file2.jpg', 'file3.jpg']
await Promise.all(
  files.map(file => storage.upload(file, data[file]))
)
```

### 3. Presigned URLs (S3 Only)

```typescript
// Instead of downloading and re-serving:
// Bad:
const data = await storage.download('image.jpg')
res.send(data) // wastes bandwidth and time

// Good: let the client download directly from S3
const url = await storage.generatePresignedUrl('image.jpg', 300) // 5 min
res.json({ url }) // client downloads directly
```

---

## Testing

### Unit Tests with Memory Storage

```typescript
import { MemoryStorage } from './dbal/ts/src/blob'

describe('File Upload', () => {
  let storage: MemoryStorage

  beforeEach(() => {
    storage = new MemoryStorage()
  })

  it('should upload and download a file', async () => {
    const data = Buffer.from('test content')

    await storage.upload('test.txt', data)
    const downloaded = await storage.download('test.txt')

    expect(downloaded.toString()).toBe('test content')
  })

  it('should throw on duplicate without overwrite', async () => {
    await storage.upload('test.txt', Buffer.from('first'))

    await expect(
      storage.upload('test.txt', Buffer.from('second'), { overwrite: false })
    ).rejects.toThrow('already exists')
  })
})
```

---

## Migration Guide

### From Direct Filesystem

```typescript
// Before
import { writeFile, readFile } from 'fs/promises'
await writeFile('./uploads/file.txt', data)
const content = await readFile('./uploads/file.txt')

// After
import { createBlobStorage } from './dbal/ts/src/blob'
const storage = createBlobStorage({
  type: 'filesystem',
  filesystem: { basePath: './uploads' }
})
await storage.upload('file.txt', data)
const content = await storage.download('file.txt')
```

### From AWS SDK Directly

```typescript
// Before
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
const s3 = new S3Client({ region: 'us-east-1' })
await s3.send(new PutObjectCommand({ Bucket: 'my-bucket', Key: 'file.txt', Body: data }))

// After
import { createBlobStorage } from './dbal/ts/src/blob'
const storage = createBlobStorage({
  type: 's3',
  s3: { bucket: 'my-bucket', region: 'us-east-1' }
})
await storage.upload('file.txt', data)
```

---

## Limitations

### Current Implementation

**C++**:
- ✅ Interface defined
- ✅ Memory storage implemented
- ⏳ S3 storage (stub - requires AWS SDK)
- ⏳ Filesystem storage (stub - requires implementation)

**TypeScript**:
- ✅ All interfaces implemented
- ✅ Memory storage (production-ready)
- ✅ S3 storage (requires `@aws-sdk/client-s3`)
- ✅ Filesystem storage (production-ready)

### Known Limitations

1. **C++ S3/Filesystem**: Stub implementations (interfaces defined, implementation pending)
2. **Large file handling**: Memory storage is not suitable for files > 1 GB
3. **List operations**: S3 pagination is limited to 1000 items per call (handled automatically)
4. **Filesystem**: No built-in encryption (use an encrypted filesystem)

---

## Future Enhancements

1. **Additional Backends**:
   - Azure Blob Storage
   - Google Cloud Storage
   - Cloudflare R2
   - IPFS

2. **Features**:
   - Automatic compression
   - Image resizing/optimization
   - Virus scanning integration
   - CDN integration
   - Automatic backup/replication

3. **Performance**:
   - Connection pooling
   - Automatic retry with exponential backoff
   - Multipart upload optimization
   - Caching layer
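
The retry enhancement above could be sketched as a generic wrapper around any storage operation. This is a hypothetical helper illustrating the planned behavior, not part of the current API:

```typescript
// Hypothetical sketch: retry an async storage operation with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op()
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      }
    }
  }
  throw lastError
}

// Usage: await withRetry(() => storage.upload('file.txt', data))
```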

---

## Files Changed

**C++ Files** (2 new):
- `dbal/cpp/include/dbal/blob_storage.hpp` - Interface definition
- `dbal/cpp/src/blob/memory_storage.cpp` - Memory implementation

**TypeScript Files** (5 new, 1 updated):
- `dbal/ts/src/blob/blob-storage.ts` - Interface definition
- `dbal/ts/src/blob/memory-storage.ts` - Memory implementation
- `dbal/ts/src/blob/s3-storage.ts` - S3 implementation
- `dbal/ts/src/blob/filesystem-storage.ts` - Filesystem implementation
- `dbal/ts/src/blob/index.ts` - Exports and factory
- `dbal/ts/src/index.ts` - Updated exports

**Documentation** (1 new):
- `BLOB_STORAGE_IMPLEMENTATION.md` - Complete guide

---

## Conclusion

✅ **Blob Storage Support Complete**

The DBAL now supports comprehensive blob storage with multiple backends:
- ✅ S3-compatible storage (AWS, MinIO, etc.)
- ✅ Filesystem storage (local, Samba, NFS)
- ✅ In-memory storage (testing)

**Features**:
- Streaming support for large files
- Presigned URLs (S3)
- Metadata storage
- Range requests
- Copy operations
- Statistics

**Ready for**:
- Media uploads
- Document storage
- Backup systems
- Profile pictures
- File versioning
- Any blob storage needs