mirror of https://github.com/johndoe6345789/metabuilder.git synced 2026-04-26 06:44:58 +00:00

Email Parser Plugin - Phase 6

RFC 5322 compliant email parsing with MIME multipart support, HTML sanitization, and comprehensive attachment extraction.

Features

RFC 5322 Compliance

Full RFC 5322 (Internet Message Format) header parsing
Header folding and continuation line support
Multiple header values handling
RFC 2047 encoded header decoding (charset, base64, quoted-printable)

MIME Message Support

RFC 2045-2049 multipart message handling
multipart/alternative (prefer HTML over plain text)
multipart/mixed (content + attachments)
multipart/related (content + inline resources)
Nested multipart structures
Content-Type parameter parsing (charset, boundary)
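
As an illustration of Content-Type parameter parsing, here is a minimal sketch. `parseContentType` and `splitOutsideQuotes` are hypothetical helpers written for this example, not functions exported by the plugin, and the real parser may handle more edge cases (for example RFC 2231 continuations).

```typescript
// Hypothetical helper, not the plugin's API: splits a Content-Type header
// value into its media type and its parameters (charset, boundary, ...).
interface ContentType {
  mediaType: string;               // e.g. "multipart/alternative"
  params: Record<string, string>;  // e.g. { boundary: "boundary123" }
}

// Splits on `sep`, ignoring separators inside double-quoted strings.
function splitOutsideQuotes(input: string, sep: string): string[] {
  const parts: string[] = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === '\\' && inQuotes && i + 1 < input.length) {
      current += ch + input.charAt(i + 1); // keep the escaped pair verbatim
      i++;
    } else if (ch === '"') {
      inQuotes = !inQuotes;
      current += ch;
    } else if (ch === sep && !inQuotes) {
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

function parseContentType(value: string): ContentType {
  const [rawType = '', ...rawParams] = splitOutsideQuotes(value, ';');
  const params: Record<string, string> = {};
  for (const raw of rawParams) {
    const eq = raw.indexOf('=');
    if (eq === -1) continue; // skip malformed parameters
    const name = raw.slice(0, eq).trim().toLowerCase();
    let val = raw.slice(eq + 1).trim();
    // Strip surrounding quotes and unescape quoted-pairs.
    if (val.length >= 2 && val.startsWith('"') && val.endsWith('"')) {
      val = val.slice(1, -1).replace(/\\(.)/g, '$1');
    }
    if (name) params[name] = val;
  }
  return { mediaType: rawType.trim().toLowerCase(), params };
}
```

Parameter names are lower-cased because they are case-insensitive, while values such as the boundary are kept verbatim because they are case-sensitive.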

Attachment Handling

Attachment metadata extraction (filename, size, MIME type)
Inline vs attachment classification (Content-Disposition)
Content-ID for embedded resources
Content encoding detection (base64, quoted-printable, 7bit, 8bit, binary)
Size limits with configurable thresholds
Selective content extraction (metadata only or base64 encoded)
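
The classification and size-limit steps above can be sketched as follows. `classifyPart`, the lower-cased header map, and the rule that a part without an explicit `inline` disposition counts as an attachment are all assumptions made for this example; the plugin's actual rules may differ.

```typescript
// Hypothetical sketch: classifies one MIME part from its headers.
type Disposition = 'inline' | 'attachment';

interface PartInfo {
  disposition: Disposition;
  filename: string | undefined;
  contentId: string | undefined; // without the surrounding angle brackets
  oversized: boolean;            // true when the size limit is exceeded
}

function classifyPart(
  headers: Record<string, string | undefined>, // header names assumed lower-cased
  decodedSize: number,                         // size in bytes after decoding
  maxAttachmentSize: number = 25 * 1024 * 1024
): PartInfo {
  const dispositionHeader = headers['content-disposition'] ?? '';
  const type = (dispositionHeader.split(';')[0] ?? '').trim().toLowerCase();

  // filename="quoted value" or filename=token
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(dispositionHeader);
  const filename = match?.[1] ?? match?.[2];

  const rawId = headers['content-id'];
  const contentId = rawId ? rawId.trim().replace(/^<|>$/g, '') : undefined;

  return {
    // Assumption: only an explicit "inline" disposition is treated as inline.
    disposition: type === 'inline' ? 'inline' : 'attachment',
    filename,
    contentId,
    oversized: decodedSize > maxAttachmentSize,
  };
}
```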

Security (XSS Protection)

Dangerous tag removal: <script>, <iframe>, <object>, <embed>, etc.
Event handler sanitization: onclick, onerror, onload, etc.
Attribute filtering on dangerous event handlers and URLs
Configurable sanitization (enable/disable)

Content Encoding

Base64 decoding
Quoted-printable decoding
7bit/8bit/binary pass-through
Automatic charset handling
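
A minimal sketch of how these decoding steps might fit together is shown below. The function names are hypothetical, the pass-through branch assumes the input string holds one byte per character, and charset handling is delegated to the standard `TextDecoder`; the plugin's own decoder may work differently.

```typescript
// Hypothetical helpers illustrating Content-Transfer-Encoding decoding.
type TransferEncoding = 'base64' | 'quoted-printable' | '7bit' | '8bit' | 'binary';

function decodeQuotedPrintable(input: string): Uint8Array {
  // Remove soft line breaks ("=" at end of line), then expand =XX escapes.
  const joined = input.replace(/=\r?\n/g, '');
  const bytes: number[] = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined.charAt(i) === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(joined.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

function decodeBase64(input: string): Uint8Array {
  // Line breaks inside the base64 body are not part of the data.
  const binary = atob(input.replace(/\s+/g, ''));
  return Uint8Array.from(binary, (c) => c.charCodeAt(0));
}

function decodeTransferEncoding(data: string, encoding: TransferEncoding): Uint8Array {
  switch (encoding) {
    case 'base64':
      return decodeBase64(data);
    case 'quoted-printable':
      return decodeQuotedPrintable(data);
    default:
      // 7bit / 8bit / binary: pass through unchanged.
      return Uint8Array.from(data, (c) => c.charCodeAt(0) & 0xff);
  }
}

function decodeToText(data: string, encoding: TransferEncoding, charset: string = 'utf-8'): string {
  return new TextDecoder(charset).decode(decodeTransferEncoding(data, encoding));
}
```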

Installation

npm install @metabuilder/workflow-plugin-email-parser

Usage

Basic Email Parsing

import { emailParserExecutor, EmailParserConfig } from '@metabuilder/workflow-plugin-email-parser';

const config: EmailParserConfig = {
  rawMessage: `From: sender@example.com
To: recipient@example.com
Subject: Test Email
Date: Fri, 23 Jan 2026 14:30:45 +0000

Hello, this is a test email.`,
  tenantId: 'tenant-123',
  sanitizeHtml: true,
  maxAttachmentSize: 25 * 1024 * 1024, // 25MB
  extractAttachmentContent: false
};

const node = {
  id: 'parse-email',
  type: 'email-parser',
  parameters: config
};

const result = await emailParserExecutor.execute(node, context, state);

if (result.status === 'success' || result.status === 'partial') {
  const message = result.output.message;
  console.log(`From: ${message.from}`);
  console.log(`To: ${message.to.join(', ')}`);
  console.log(`Subject: ${message.subject}`);
  console.log(`Body: ${message.textBody || message.htmlBody}`);
  console.log(`Attachments: ${message.attachmentCount}`);
}

Workflow Configuration

{
  "id": "email-parse-node",
  "type": "email-parser",
  "parameters": {
    "rawMessage": "{{ $json.rawEmailData }}",
    "tenantId": "{{ $context.tenantId }}",
    "sanitizeHtml": true,
    "maxAttachmentSize": 26214400,
    "extractAttachmentContent": false
  },
  "connections": ["imap-sync"]
}

Configuration Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `rawMessage` | string | required | Raw email message in RFC 5322 format |
| `tenantId` | string | required | Tenant ID for multi-tenant context |
| `sanitizeHtml` | boolean | `true` | Remove dangerous HTML tags/attributes |
| `extractAttachmentContent` | boolean | `false` | Include base64 content for attachments |
| `maxAttachmentSize` | number | 25MB | Maximum attachment size in bytes |
| `maxBodyLength` | number | 1MB | Maximum body text length in characters |
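
To make the defaults concrete, the sketch below resolves a partial configuration into a complete one. The field names and default values come from the options listed above; the `ParserOptionsSketch` type and the `resolveOptions` helper are hypothetical and are not part of the plugin's exports.

```typescript
// Hypothetical sketch: applies the documented defaults to a configuration.
interface ParserOptionsSketch {
  rawMessage: string;
  tenantId: string;
  sanitizeHtml?: boolean;
  extractAttachmentContent?: boolean;
  maxAttachmentSize?: number; // bytes
  maxBodyLength?: number;     // characters
}

function resolveOptions(input: ParserOptionsSketch): Required<ParserOptionsSketch> {
  // The two required parameters have no default.
  if (!input.rawMessage) throw new Error('rawMessage is required');
  if (!input.tenantId) throw new Error('tenantId is required');
  return {
    rawMessage: input.rawMessage,
    tenantId: input.tenantId,
    sanitizeHtml: input.sanitizeHtml ?? true,
    extractAttachmentContent: input.extractAttachmentContent ?? false,
    maxAttachmentSize: input.maxAttachmentSize ?? 25 * 1024 * 1024, // 25MB
    maxBodyLength: input.maxBodyLength ?? 1024 * 1024,              // 1MB
  };
}
```

Using `??` rather than `||` keeps an explicit `false` for `sanitizeHtml` instead of silently replacing it with the default.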

Output Format

ParsedEmailMessage

{
  messageId: string;                    // RFC 5322 Message-ID
  from: string;                         // Sender email address
  to: string[];                         // Recipients
  cc?: string[];                        // CC recipients
  bcc?: string[];                       // BCC recipients
  replyTo?: string;                     // Reply-To header
  subject: string;                      // Email subject
  textBody?: string;                    // Plain text version
  htmlBody?: string;                    // HTML version (sanitized)
  headers: Record<string, string[]>;    // All headers
  receivedAt: string;                   // ISO 8601 timestamp
  attachmentCount: number;              // Total attachments
  attachments: EmailAttachmentMetadata[]; // Attachment list
  size: number;                         // Message size in bytes
  priority?: 'high' | 'normal' | 'low'; // Priority from X-Priority
  mimeType: string;                     // Content-Type
}
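
The `priority` field is derived from the `X-Priority` header. That header is not standardized; by common convention its value starts with a digit from 1 (highest) to 5 (lowest). The mapping below is one plausible sketch under that assumption; `priorityFromHeader` is a hypothetical name and the plugin may map values differently.

```typescript
// Hypothetical sketch: maps an X-Priority value such as "1 (Highest)"
// onto the three priority levels of ParsedEmailMessage.
type Priority = 'high' | 'normal' | 'low';

function priorityFromHeader(value: string | undefined): Priority | undefined {
  if (value === undefined) return undefined;
  const match = /^\s*([1-5])/.exec(value);
  if (!match) return undefined; // unrecognized value: leave priority unset
  const level = Number(match[1]);
  if (level <= 2) return 'high';
  if (level === 3) return 'normal';
  return 'low';
}
```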

EmailAttachmentMetadata

{
  filename: string;              // Original filename
  mimeType: string;              // e.g., "image/png", "application/pdf"
  size: number;                  // Size in bytes
  contentId?: string;            // For embedded resources
  isInline: boolean;             // Inline vs attachment
  content?: string;              // Base64 encoded (if extracted)
  contentEncoding: string;       // Encoding type (base64, quoted-printable, etc.)
}

Execution Result

{
  status: 'success' | 'partial' | 'error';
  output: {
    message?: ParsedEmailMessage;      // Parsed email (if successful)
    errors: ParserError[];              // Parse errors
    warnings: string[];                 // Non-fatal warnings
    metrics: {
      parseDurationMs: number;          // Parse time
      headerCount: number;              // Headers parsed
      partCount: number;                // MIME parts
      attachmentCount: number;          // Attachments found
      attachmentSizeBytes: number;      // Total attachment size
      sanitizationWarnings: number;     // HTML sanitization removals
    }
  };
  duration: number;
}

Error Handling

Error Codes

| Code | Description | Recoverable |
| --- | --- | --- |
| `MISSING_FROM` | No From header found | No |
| `MISSING_TO` | No valid To header | No |
| `INVALID_HEADERS` | Malformed header section | Yes |
| `INVALID_MIME` | Malformed MIME structure | Yes |
| `PARSE_ERROR` | Generic parse failure | No |
| `PARSE_EXCEPTION` | Unexpected exception | No |

Partial Parsing

When status === 'partial':

The message was extracted successfully
Some non-critical errors or warnings occurred
The errors array contains details of each issue
The message can still be processed; typical causes are attachment or encoding problems

Example:

if (result.status === 'partial') {
  console.log('Errors:', result.output.errors);
  console.log('Warnings:', result.output.warnings);
  // Still process message
  const message = result.output.message;
}
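
One way to picture how the error codes relate to the three status values is the sketch below. The recoverable flags are taken from the error code list above, but the derivation rule itself (any non-recoverable error or a missing message yields `error`) is an assumption made for illustration; `deriveStatus` is a hypothetical function, not the plugin's implementation.

```typescript
// Hypothetical sketch: derives a result status from the collected errors.
type ParserErrorCode =
  | 'MISSING_FROM'
  | 'MISSING_TO'
  | 'INVALID_HEADERS'
  | 'INVALID_MIME'
  | 'PARSE_ERROR'
  | 'PARSE_EXCEPTION';

// Recoverable flags as documented in the error code list.
const RECOVERABLE: Record<ParserErrorCode, boolean> = {
  MISSING_FROM: false,
  MISSING_TO: false,
  INVALID_HEADERS: true,
  INVALID_MIME: true,
  PARSE_ERROR: false,
  PARSE_EXCEPTION: false,
};

type Status = 'success' | 'partial' | 'error';

function deriveStatus(errors: { code: ParserErrorCode }[], hasMessage: boolean): Status {
  const fatal = errors.some((e) => !RECOVERABLE[e.code]);
  if (fatal || !hasMessage) return 'error';
  return errors.length > 0 ? 'partial' : 'success';
}
```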

RFC Standards Implemented

RFC 5322 - Internet Message Format

Header parsing with folding support
Address list parsing
Date/time parsing
Comment handling
Quoted strings

RFC 2045-2049 - MIME

Content-Type parameter parsing
Multipart boundary detection
Content-Transfer-Encoding support
Content-Disposition handling

RFC 2047 - MIME Header Extensions

Encoded-word syntax: =?charset?encoding?text?=
Base64 and Quoted-Printable decoding
Multiple encoded words in single header
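
The encoded-word handling above can be sketched as follows. `decodeEncodedWords` and its helpers are hypothetical and written only for this example; they rely on the standard `atob` and `TextDecoder` globals, and they leave a token untouched when the charset is unknown or the payload is invalid.

```typescript
// Hypothetical sketch of RFC 2047 encoded-word decoding.

// "Q" encoding: "_" means space, "=XX" is a hex-encoded byte.
function fromQ(text: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const hex = text.slice(i + 1, i + 3);
    if (ch === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(ch === '_' ? 0x20 : text.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// "B" encoding: plain base64.
function fromB(text: string): Uint8Array {
  return Uint8Array.from(atob(text), (c) => c.charCodeAt(0));
}

function decodeEncodedWords(header: string): string {
  // Whitespace between two adjacent encoded-words is not part of the text.
  const collapsed = header.replace(/(\?=)\s+(?==\?)/g, '$1');
  return collapsed.replace(
    /=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/g,
    (whole: string, charset: string, enc: string, text: string) => {
      try {
        const bytes = enc.toUpperCase() === 'B' ? fromB(text) : fromQ(text);
        return new TextDecoder(charset).decode(bytes);
      } catch {
        return whole; // unknown charset or bad payload: keep the raw token
      }
    }
  );
}
```

For example, `decodeEncodedWords('=?UTF-8?Q?Caf=C3=A9_menu?=')` yields `Café menu`, and text outside encoded-words passes through unchanged.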

RFC 3501 - IMAP4rev1

MIME integration with IMAP flags
Content structure compatibility

Security Considerations

XSS Prevention

The parser automatically sanitizes HTML content by:

Removing dangerous tags: <script>, <iframe>, <object>, <embed>, <applet>, <meta>, <link>, <style>, <form>, <svg>, etc.
Removing event handlers: onclick, onerror, onload, onmouseover, onchange, onsubmit, etc.
Filtering URL-bearing attributes (href, src, action, formaction) that carry dangerous URLs
Counting sanitization actions: metrics.sanitizationWarnings tracks removed elements
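
The steps above can be illustrated with a deliberately simplified, regex-based sketch. It is not the plugin's sanitizer and it is not robust against adversarial markup (obfuscated or nested tags can defeat regex filters), so treat it only as a picture of the steps and of how a removal counter like `metrics.sanitizationWarnings` could be maintained. `sanitizeHtmlSketch` and the shortened tag list are assumptions.

```typescript
// Hypothetical, simplified illustration. Do not use as a real sanitizer;
// a production implementation should work on a parsed HTML tree.
const DANGEROUS_TAGS = ['script', 'iframe', 'object', 'embed', 'applet',
  'meta', 'link', 'style', 'form', 'svg'];

function sanitizeHtmlSketch(html: string): { html: string; removed: number } {
  let removed = 0;
  let out = html;
  const drop = (): string => { removed++; return ''; };

  for (const tag of DANGEROUS_TAGS) {
    // 1. Paired elements including their content, then stray single tags.
    const paired = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}\\s*>`, 'gi');
    const single = new RegExp(`<\\/?${tag}\\b[^>]*>`, 'gi');
    out = out.replace(paired, drop).replace(single, drop);
  }

  // 2. Event-handler attributes such as onclick="..." or onerror=alert(1).
  out = out.replace(/\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, drop);

  // 3. URL-bearing attributes whose value is a javascript: URL.
  out = out.replace(
    /\s+(?:href|src|action|formaction)\s*=\s*(?:"\s*javascript:[^"]*"|'\s*javascript:[^']*'|javascript:[^\s>]+)/gi,
    drop
  );

  return { html: out, removed };
}
```

One known limitation of this sketch: the attribute patterns are applied to the whole string, so they can also match look-alike text outside of tags.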

Best Practices

// Always enable HTML sanitization for untrusted email sources
const config: EmailParserConfig = {
  rawMessage: emailFromImap,
  tenantId: userTenantId,
  sanitizeHtml: true,  // ✓ Always true for user emails
  extractAttachmentContent: false, // ✓ Avoid extracting large files to memory

  // Size limits prevent memory exhaustion
  maxBodyLength: 1024 * 1024,           // 1MB
  maxAttachmentSize: 25 * 1024 * 1024   // 25MB per file
};

// Large attachments should be stored separately
// (message is the ParsedEmailMessage from result.output.message)
for (const attachment of message.attachments) {
  if (attachment.size > 10 * 1024 * 1024) {
    // Store in S3 instead of database
  }
}

No Code Execution

The parser:

Does NOT execute JavaScript or any code
Does NOT make external HTTP requests
Does NOT modify files on disk
Does NOT load external resources
Is fully synchronous and isolated

Examples

Simple Text Email

From: alice@example.com
To: bob@example.com
Subject: Hello
Message-ID: <msg@example.com>
Date: Fri, 23 Jan 2026 10:00:00 +0000

Hi Bob, how are you?

Multipart Alternative (Text + HTML)

From: sender@example.com
To: recipient@example.com
Subject: Test
Content-Type: multipart/alternative; boundary="boundary123"

--boundary123
Content-Type: text/plain

Plain text version

--boundary123
Content-Type: text/html

<html><body>HTML version</body></html>

--boundary123--
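
To show how a multipart/alternative body like the one above could be processed, here is a minimal sketch that splits the body on its boundary and prefers the HTML part. `splitMultipart`, `toPart`, and `pickPreferredBody` are hypothetical helpers; they ignore folded part headers, nested multiparts, and transfer encodings, all of which a full parser has to handle.

```typescript
// Hypothetical sketch: split a multipart body and pick the preferred part.
interface SimplePart {
  contentType: string; // lower-cased media type, "text/plain" if absent
  body: string;
}

function toPart(lines: string[]): SimplePart {
  // Part headers end at the first empty line.
  const blank = lines.indexOf('');
  const headerLines = blank === -1 ? lines : lines.slice(0, blank);
  const bodyLines = blank === -1 ? [] : lines.slice(blank + 1);
  const ctLine = headerLines.find((l) => /^content-type:/i.test(l)) ?? '';
  const mediaType = (ctLine.slice('content-type:'.length).split(';')[0] ?? '')
    .trim()
    .toLowerCase();
  return { contentType: mediaType || 'text/plain', body: bodyLines.join('\n').trim() };
}

function splitMultipart(body: string, boundary: string): SimplePart[] {
  const delimiter = `--${boundary}`;
  const parts: SimplePart[] = [];
  let current: string[] | null = null; // null while in the preamble or epilogue
  for (const line of body.split(/\r?\n/)) {
    const trimmed = line.replace(/\s+$/, '');
    if (trimmed === `${delimiter}--`) {   // closing delimiter
      if (current) parts.push(toPart(current));
      current = null;
      break;
    }
    if (trimmed === delimiter) {          // start of the next part
      if (current) parts.push(toPart(current));
      current = [];
      continue;
    }
    if (current) current.push(line);
  }
  if (current) parts.push(toPart(current)); // tolerate a missing closing delimiter
  return parts;
}

// multipart/alternative: prefer HTML, fall back to plain text.
function pickPreferredBody(parts: SimplePart[]): SimplePart | undefined {
  return parts.find((p) => p.contentType === 'text/html')
    ?? parts.find((p) => p.contentType === 'text/plain');
}
```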

Email with Attachment

From: sender@example.com
To: recipient@example.com
Subject: Document
Content-Type: multipart/mixed; boundary="boundary456"

--boundary456
Content-Type: text/plain

See attachment.

--boundary456
Content-Type: application/pdf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="report.pdf"

JVBERi0xLjQKJeLj...

--boundary456--

Email with Inline Image

Content-Type: multipart/related; boundary="boundary789"

--boundary789
Content-Type: text/html

<html><img src="cid:logo@company.com"/></html>

--boundary789
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="logo.png"
Content-ID: <logo@company.com>

iVBORw0KGgoAAAANSUhEUgA...

--boundary789--
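
After parsing a message like this one, a client typically has to connect each `cid:` reference in the HTML to the attachment with the matching Content-ID. The sketch below shows one way to do that by rewriting the references as data URLs. `resolveCidReferences` is a hypothetical helper, the attachment shape borrows the `contentId`, `mimeType`, and `content` fields from EmailAttachmentMetadata, and it assumes attachment content was extracted (`extractAttachmentContent: true`).

```typescript
// Hypothetical sketch: replace cid: references with data URLs.
interface InlineAttachmentSketch {
  contentId?: string; // with or without angle brackets
  mimeType: string;
  content?: string;   // base64, present only when content was extracted
}

function resolveCidReferences(html: string, attachments: InlineAttachmentSketch[]): string {
  const byCid = new Map<string, InlineAttachmentSketch>();
  for (const attachment of attachments) {
    if (attachment.contentId) {
      byCid.set(attachment.contentId.replace(/^<|>$/g, ''), attachment);
    }
  }
  return html.replace(/cid:([^"'\s>]+)/gi, (whole: string, cid: string) => {
    const hit = byCid.get(cid);
    // Leave the reference untouched when no content is available.
    return hit && hit.content ? `data:${hit.mimeType};base64,${hit.content}` : whole;
  });
}
```

Embedding data URLs keeps the sketch self-contained; for large images a client might instead rewrite the reference to a URL served from its own attachment store.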

Testing

Run the test suite:

npm test             # Run all tests
npm run test:watch   # Watch mode
npm run type-check   # TypeScript validation
npm run build        # Build plugin

Test coverage includes:

RFC 5322 header parsing (simple, folded, multiple)
MIME multipart handling (alternative, mixed, nested)
Content encoding (base64, quoted-printable)
HTML sanitization (script, iframe, events)
Attachment extraction and cataloging
Error handling and recovery
Real-world complex emails
Metrics collection

Integration with Email Client

The parser is designed to work within the email client architecture:

IMAP Sync (imap-sync) - Fetches raw messages from IMAP server
Email Parser (email-parser) - Parses RFC 5322 format [THIS PLUGIN]
DBAL Storage - Stores parsed message in EmailMessage/EmailAttachment entities
Email Search (imap-search) - Full-text search on parsed content

Workflow Example

{
  "id": "email-sync-flow",
  "nodes": [
    {
      "id": "sync-node",
      "type": "imap-sync",
      "parameters": {
        "imapId": "{{ $context.imapClientId }}",
        "folderId": "{{ $json.folderId }}",
        "maxMessages": 100
      }
    },
    {
      "id": "parse-node",
      "type": "email-parser",
      "parameters": {
        "rawMessage": "{{ $json.messageContent }}",
        "tenantId": "{{ $context.tenantId }}",
        "sanitizeHtml": true
      },
      "connections": ["sync-node"]
    },
    {
      "id": "store-node",
      "type": "dbal-write",
      "parameters": {
        "entity": "EmailMessage",
        "data": "{{ $json.parsedMessage }}"
      },
      "connections": ["parse-node"]
    }
  ]
}

Performance

Benchmarks

Typical parsing times on modern hardware:

| Message Type | Size | Time |
| --- | --- | --- |
| Simple text | 2KB | <1ms |
| Text + HTML multipart | 50KB | 2-5ms |
| With small attachment | 500KB | 5-10ms |
| Large HTML with images | 5MB | 50-100ms |

Memory Usage

Per message parsing: ~10-20MB (includes decoded content)
Streaming not supported (loads entire message into memory)
Large attachments should be extracted to disk

Optimization Tips

// Don't extract large attachment content
extractAttachmentContent: false,  // ✓ Metadata only

// Limit body length for huge messages
maxBodyLength: 1024 * 1024,       // ✓ 1MB limit

// Set reasonable attachment size limit
maxAttachmentSize: 25 * 1024 * 1024, // ✓ 25MB

// Disable HTML sanitization if not needed (rare)
sanitizeHtml: false,              // ✗ Usually want sanitization

Limitations

No streaming: Entire message loaded into memory
Synchronous: No async I/O (parsing only)
No external resources: Links and images not fetched
Limited charset support: UTF-8, ASCII, ISO-8859-1 primarily
No S/MIME or PGP: Encrypted messages not decrypted
No authentication: Messages are parsed only; sender identity and signatures are not verified

Architecture Notes

Header Parsing Strategy

Headers are case-insensitive and may have folding:

Subject: This is a very long
 subject that continues
 on next line
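
A folded header like the one above can be handled by unfolding before splitting names from values. The sketch below is one minimal way to do this; `parseHeaders` is a hypothetical helper, not the plugin's internal function. It lower-cases header names for case-insensitive lookup and keeps every value of a repeated header.

```typescript
// Hypothetical sketch: unfold headers and collect them case-insensitively.
function parseHeaders(headerBlock: string): Map<string, string[]> {
  // Unfold: a line break followed by a space or tab continues the previous
  // line. Only the line break is removed; the whitespace stays.
  const unfolded = headerBlock.replace(/\r?\n(?=[ \t])/g, '');

  const headers = new Map<string, string[]>();
  for (const line of unfolded.split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // not a "Name: value" line
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    const existing = headers.get(name);
    if (existing) {
      existing.push(value); // repeated header, e.g. Received
    } else {
      headers.set(name, [value]);
    }
  }
  return headers;
}
```

Applied to the example above, `parseHeaders(...).get('subject')` returns a single value: `This is a very long subject that continues on next line`.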