# PodcastItLater MVP Implementation Prompt

You are implementing a two-service MVP system called "PodcastItLater" that converts web articles to podcast episodes via email submission. This follows a monorepo namespace structure where all files live under `Biz/PodcastItLater/`.

## Code Organization & Structure
- **Primary files**:
  - `Biz/PodcastItLater/Web.py` - web service (ludic app, routes, webhook)
  - `Biz/PodcastItLater/Worker.py` - background processor
  - `Biz/PodcastItLater/Models.py` - database schema and data access
- **Keep code in as few files as possible, following monorepo conventions**
- **Namespaces are always capitalized** (this is a Python project but follows the Haskell-style namespace hierarchy)

## Technical Requirements

### Core Libraries
```python
# Required dependencies
import ludic        # web framework (see provided docs)
import trafilatura  # content extraction
import openai       # TTS API
import boto3        # S3 uploads
import feedgen      # RSS generation
import sqlite3      # database
import pydub        # audio manipulation, if needed
```

### Database Schema
```sql
-- Queue table for job processing
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);

-- Episodes table for completed podcasts
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content_length INTEGER,
    audio_url TEXT NOT NULL,
    duration INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## Service 1: Web Frontend (`Biz/PodcastItLater/Web.py`)

### Responsibilities
- Serve the ludic + htmx web interface
- Handle the Mailgun webhook for email submissions
- Provide a manual article submission form
- Display processing queue status
- Serve the RSS podcast feed
- Basic podcast player for testing

### Required Routes
```python
@app.route("/")
def index():
    # Simple form to submit an article URL
    # Display recent episodes and queue status
    # Use htmx for dynamic updates
    ...

@app.route("/submit", methods=["POST"])
def submit_article():
    # Handle manual form submission
    # Insert into queue table
    # Return htmx response with status
    ...

@app.route("/webhook/mailgun", methods=["POST"])
def mailgun_webhook():
    # Parse email, extract URLs from body
    # Insert into queue table
    # Verify webhook signature for security
    ...

@app.route("/feed.xml")
def rss_feed():
    # Generate RSS from episodes table
    # Use feedgen library
    ...

@app.route("/status")
def queue_status():
    # htmx endpoint for live queue updates
    # Return current queue + recent episodes
    ...
```

### RSS Feed Metadata (hardcoded)
```python
RSS_CONFIG = {
    "title": "Ben's Article Podcast",
    "description": "Web articles converted to audio",
    "author": "Ben Sima",
    "language": "en-US",
    "base_url": "https://your-domain.com",  # configure via env var
}
```

## Service 2: Background Worker (`Biz/PodcastItLater/Worker.py`)

### Responsibilities
- Poll the queue table every 30 seconds
- Extract article content using trafilatura
- Convert text to speech via OpenAI TTS
- Upload audio files to S3-compatible storage
- Update the episodes table with completed episodes
- Handle errors with retry logic (3 attempts max)

### Processing Pipeline
```python
def process_article(queue_item):
    """Complete article processing pipeline."""
    try:
        # 1. Extract title and content with trafilatura
        title, content = extract_article_content(queue_item.url)

        # 2. Generate audio with OpenAI TTS
        audio_file, duration = text_to_speech(content)

        # 3. Upload to S3
        audio_url = upload_to_s3(audio_file)

        # 4. Create episode record
        create_episode(title, audio_url, duration)

        # 5. Mark queue item as complete
        mark_complete(queue_item.id)

    except Exception as e:
        handle_error(queue_item.id, str(e))
```

### Configuration via Environment Variables
```python
import os

# Required environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
S3_ENDPOINT = os.getenv("S3_ENDPOINT")  # DigitalOcean Spaces
S3_BUCKET = os.getenv("S3_BUCKET")
S3_ACCESS_KEY = os.getenv("S3_ACCESS_KEY")
S3_SECRET_KEY = os.getenv("S3_SECRET_KEY")
MAILGUN_WEBHOOK_KEY = os.getenv("MAILGUN_WEBHOOK_KEY")
```

## Email Processing Logic
- Parse the email body for the first HTTP/HTTPS URL found
- If no URL is found, treat the entire email body as article content
- Store the original email in the queue record for debugging

## Error Handling Strategy
- Log all errors but continue processing
- Mark failed jobs with 'error' status and an error message
- Retry logic: 3 attempts with exponential backoff
- Graceful degradation when external services fail

## Audio Configuration
- **Format**: MP3, 128 kbps
- **TTS voice**: OpenAI default voice (voice selection can be added later)
- **File naming**: `episode_{timestamp}_{id}.mp3`

## HTMX Frontend Behavior
- Auto-refresh queue status every 30 seconds
- Form submission without a page reload
- Simple progress indicators for processing jobs
- Basic audio player for testing episodes

## Testing Requirements

Create tests covering:
- Article content extraction accuracy
- TTS API integration (with mocking)
- S3 upload/download functionality
- RSS feed generation and XML validation
- Email webhook parsing and security
- Database operations and data integrity
- End-to-end submission workflow

## Success Criteria
The MVP should successfully:
1. Receive article submissions via email webhook
2. Extract clean article content
3. Convert text to high-quality audio
4. Store audio in S3-compatible storage
5. Generate a valid RSS podcast feed
6. Provide a basic web interface for monitoring
7. Handle errors gracefully without crashing

## Implementation Notes
- Start with the Web.py service first, then Worker.py
- Use simple polling rather than a complex job queue
- Focus on reliability over performance for the MVP
- Keep total code to roughly 300-400 lines
- Use reasonable defaults everywhere possible
- Prioritize working code over perfect code

Implement this as a robust, deployable MVP that can handle real-world article processing workloads while maintaining simplicity.
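
## Appendix: Implementation Sketches

The queue operations implied by `Models.py` and the worker's polling loop (enqueue, claim the oldest pending job, mark complete or failed) map directly onto the queue schema above. A minimal sketch using stdlib `sqlite3`; the function names are illustrative, not part of the spec:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);
"""

def connect(path: str = "podcastitlater.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # access columns by name
    conn.executescript(SCHEMA)
    return conn

def enqueue(conn: sqlite3.Connection, url: str, email: str) -> int:
    cur = conn.execute("INSERT INTO queue (url, email) VALUES (?, ?)", (url, email))
    conn.commit()
    return cur.lastrowid

def claim_next(conn: sqlite3.Connection):
    """Fetch the oldest pending job and mark it processing, or return None."""
    row = conn.execute(
        "SELECT * FROM queue WHERE status = 'pending' ORDER BY created_at, id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE queue SET status = 'processing' WHERE id = ?", (row["id"],))
    conn.commit()
    return row

def mark_complete(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("UPDATE queue SET status = 'complete' WHERE id = ?", (job_id,))
    conn.commit()

def mark_error(conn: sqlite3.Connection, job_id: int, message: str) -> None:
    conn.execute(
        "UPDATE queue SET status = 'error', error_message = ? WHERE id = ?",
        (message, job_id),
    )
    conn.commit()
```

Claiming a job inside a single connection is sufficient for one worker; multiple workers would need a transactional `UPDATE ... RETURNING` or similar to avoid double-claiming.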
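The signature check in the `mailgun_webhook` route needs only the standard library. Mailgun signs each webhook POST with an HMAC-SHA256 over `timestamp + token` using your webhook signing key; a sketch, assuming the key arrives via `MAILGUN_WEBHOOK_KEY` as configured above:

```python
import hashlib
import hmac

def verify_mailgun_signature(
    signing_key: str, timestamp: str, token: str, signature: str
) -> bool:
    """Check a Mailgun webhook signature: HMAC-SHA256 over timestamp + token."""
    expected = hmac.new(
        key=signing_key.encode(),
        msg=(timestamp + token).encode(),
        digestmod=hashlib.sha256,
    ).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)
```

Rejecting requests whose `timestamp` is more than a few minutes old is also worth adding, to block replay attacks.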
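The email-processing rule above (take the first HTTP/HTTPS URL, otherwise treat the whole body as article content) is a pure function and easy to pin down. A sketch; the helper name `extract_submission` is mine, not part of the spec:

```python
import re

# First http(s) URL in the text; trailing prose punctuation is trimmed below.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

def extract_submission(body: str) -> dict:
    """Return {'url': ...} for the first link found, else {'content': body}."""
    match = URL_RE.search(body)
    if match:
        # Strip punctuation that often trails a URL in an email sentence.
        return {"url": match.group(0).rstrip(".,;:)")}
    return {"content": body}
```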
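The retry policy (3 attempts with exponential backoff) reduces to a pure delay schedule. A sketch; the 30-second base matches the worker's poll interval but is my assumption, not something the spec fixes:

```python
def backoff_delay(attempt: int, base_seconds: int = 30) -> int:
    """Seconds to wait before retry N (1-based): base * 2^(attempt - 1)."""
    return base_seconds * 2 ** (attempt - 1)

# Attempts 1, 2, 3 wait 30s, 60s, 120s; after the third failure the
# job is marked with 'error' status per the Error Handling Strategy.
```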
