# PodcastItLater MVP Implementation Prompt

You are implementing a two-service MVP system called "PodcastItLater" that converts web articles to podcast episodes via email submission. This follows a monorepo namespace structure where all files live under `Biz/PodcastItLater/`.

## Code Organization & Structure

- **Primary files**:
  - `Biz/PodcastItLater/Web.py` - web service (ludic app, routes, webhook)
  - `Biz/PodcastItLater/Worker.py` - background processor
  - `Biz/PodcastItLater/Models.py` - database schema and data access
- **Keep code in as few files as possible, following monorepo conventions**
- **Namespaces are always capitalized** (this is a Python project, but it follows the Haskell-style namespace hierarchy)

## Technical Requirements

### Core Libraries

```python
# Required dependencies
import ludic        # web framework (see provided docs)
import trafilatura  # content extraction
import openai       # TTS API
import boto3        # S3 uploads
import feedgen      # RSS generation
import sqlite3      # database
import pydub        # audio manipulation, if needed
```

### Database Schema

```sql
-- Queue table for job processing
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);

-- Episodes table for completed podcasts
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content_length INTEGER,
    audio_url TEXT NOT NULL,
    duration INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## Service 1: Web Frontend (`Biz/PodcastItLater/Web.py`)

### Responsibilities

- Serve the ludic + htmx web interface
- Handle the Mailgun webhook for email submissions
- Provide a manual article submission form
- Display processing queue status
- Serve the RSS podcast feed
- Basic podcast player for testing

### Required Routes

```python
@app.route("/")
def index():
    # Simple form to submit article URL
    # Display recent episodes and queue status
    # Use htmx for dynamic updates

@app.route("/submit", methods=["POST"])
def submit_article():
    # Handle manual form submission
    # Insert into queue table
    # Return htmx response with status

@app.route("/webhook/mailgun", methods=["POST"])
def mailgun_webhook():
    # Parse email, extract URLs from body
    # Insert into queue table
    # Verify webhook signature for security

@app.route("/feed.xml")
def rss_feed():
    # Generate RSS from episodes table
    # Use feedgen library

@app.route("/status")
def queue_status():
    # HTMX endpoint for live queue updates
    # Return current queue + recent episodes
```

### RSS Feed Metadata (hardcoded)

```python
RSS_CONFIG = {
    "title": "Ben's Article Podcast",
    "description": "Web articles converted to audio",
    "author": "Ben Sima",
    "language": "en-US",
    "base_url": "https://your-domain.com",  # configure via env var
}
```

## Service 2: Background Worker (`Biz/PodcastItLater/Worker.py`)

### Responsibilities

- Poll the queue table every 30 seconds
- Extract article content using trafilatura
- Convert text to speech via the OpenAI TTS API
- Upload audio files to S3-compatible storage
- Update the episodes table with completed episodes
- Handle errors with retry logic (3 attempts max)

### Processing Pipeline

```python
def process_article(queue_item):
    """Complete article processing pipeline."""
    try:
        # 1. Extract content with trafilatura
        content = extract_article_content(queue_item.url)
        # 2. Generate audio with OpenAI TTS
        audio_file = text_to_speech(content)
        # 3. Upload to S3
        audio_url = upload_to_s3(audio_file)
        # 4. Create episode record
        create_episode(title, audio_url, duration)
        # 5. Mark queue item as complete
        mark_complete(queue_item.id)
    except Exception as e:
        handle_error(queue_item.id, str(e))
```

### Configuration via Environment Variables

```python
# Required environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
S3_ENDPOINT = os.getenv("S3_ENDPOINT")  # Digital Ocean Spaces
S3_BUCKET = os.getenv("S3_BUCKET")
S3_ACCESS_KEY = os.getenv("S3_ACCESS_KEY")
S3_SECRET_KEY = os.getenv("S3_SECRET_KEY")
MAILGUN_WEBHOOK_KEY = os.getenv("MAILGUN_WEBHOOK_KEY")
```

## Email Processing Logic

- Parse the email body for the first HTTP/HTTPS URL found
- If no URL is found, treat the entire email body as article content
- Store the original email in the queue record for debugging

## Error Handling Strategy

- Log all errors but continue processing
- Mark failed jobs with 'error' status and an error message
- Retry logic: 3 attempts with exponential backoff
- Graceful degradation when external services fail

## Audio Configuration

- **Format**: MP3, 128 kbps
- **TTS Voice**: OpenAI default voice (voice selection can be added later)
- **File naming**: `episode_{timestamp}_{id}.mp3`

## HTMX Frontend Behavior

- Auto-refresh queue status every 30 seconds
- Form submission without page reload
- Simple progress indicators for processing jobs
- Basic audio player for testing episodes

## Testing Requirements

Create tests covering:

- Article content extraction accuracy
- TTS API integration (with mocking)
- S3 upload/download functionality
- RSS feed generation and XML validation
- Email webhook parsing and security
- Database operations and data integrity
- End-to-end submission workflow

## Success Criteria

The MVP should successfully:

1. Receive article submissions via email webhook
2. Extract clean article content
3. Convert text to high-quality audio
4. Store audio in S3-compatible storage
5. Generate a valid RSS podcast feed
6. Provide a basic web interface for monitoring
7. Handle errors gracefully without crashing

## Implementation Notes

- Start with the Web.py service first, then Worker.py
- Use simple polling rather than a complex job queue
- Focus on reliability over performance for the MVP
- Keep the total code under 300-400 lines
- Use reasonable defaults everywhere possible
- Prioritize working code over perfect code

Implement this as a robust, deployable MVP that can handle real-world article processing workloads while maintaining simplicity.