# PodcastItLater MVP Implementation Prompt

You are implementing a two-service MVP system called "PodcastItLater" that converts web articles to podcast episodes via email submission. This follows a monorepo namespace structure where all files live under `Biz/PodcastItLater/`.

## Code Organization & Structure

- **Primary files**:
  - `Biz/PodcastItLater/Web.py` - web service (ludic app, routes, webhook)
  - `Biz/PodcastItLater/Worker.py` - background processor
  - `Biz/PodcastItLater/Models.py` - database schema and data access
- **Keep code in as few files as possible, following monorepo conventions**
- **Namespaces are always capitalized** (this is a Python project, but it follows the Haskell-style namespace hierarchy)

## Technical Requirements

### Core Libraries

```python
# Required dependencies
import ludic        # web framework (see provided docs)
import trafilatura  # content extraction
import openai       # TTS API
import boto3        # S3 uploads
import feedgen      # RSS generation
import sqlite3      # database
import pydub        # audio manipulation, if needed
```

### Database Schema

```sql
-- Queue table for job processing
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);

-- Episodes table for completed podcasts
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content_length INTEGER,
    audio_url TEXT NOT NULL,
    duration INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## Service 1: Web Frontend (`Biz/PodcastItLater/Web.py`)

### Responsibilities

- Serve the ludic + htmx web interface
- Handle the Mailgun webhook for email submissions
- Provide a manual article submission form
- Display processing queue status
- Serve the RSS podcast feed
- Basic podcast player for testing

### Required Routes

```python
@app.route("/")
def index():
    # Simple form to submit article URL
    # Display recent episodes and queue status
    # Use htmx for dynamic updates

@app.route("/submit", methods=["POST"])
def submit_article():
    # Handle manual form submission
    # Insert into queue table
    # Return htmx response with status

@app.route("/webhook/mailgun", methods=["POST"])
def mailgun_webhook():
    # Parse email, extract URLs from body
    # Insert into queue table
    # Verify webhook signature for security

@app.route("/feed.xml")
def rss_feed():
    # Generate RSS from episodes table
    # Use feedgen library

@app.route("/status")
def queue_status():
    # HTMX endpoint for live queue updates
    # Return current queue + recent episodes
```

### RSS Feed Metadata (hardcoded)

```python
RSS_CONFIG = {
    "title": "Ben's Article Podcast",
    "description": "Web articles converted to audio",
    "author": "Ben Sima",
    "language": "en-US",
    "base_url": "https://your-domain.com",  # configure via env var
}
```

## Service 2: Background Worker (`Biz/PodcastItLater/Worker.py`)

### Responsibilities

- Poll the queue table every 30 seconds
- Extract article content using trafilatura
- Convert text to speech via the OpenAI TTS API
- Upload audio files to S3-compatible storage
- Update the episodes table with completed episodes
- Handle errors with retry logic (3 attempts max)

### Processing Pipeline

```python
def process_article(queue_item):
    """Complete article processing pipeline."""
    try:
        # 1. Extract content with trafilatura
        content = extract_article_content(queue_item.url)
        # 2. Generate audio with OpenAI TTS
        audio_file = text_to_speech(content)
        # 3. Upload to S3
        audio_url = upload_to_s3(audio_file)
        # 4. Create episode record
        create_episode(title, audio_url, duration)
        # 5. Mark queue item as complete
        mark_complete(queue_item.id)
    except Exception as e:
        handle_error(queue_item.id, str(e))
```

### Configuration via Environment Variables

```python
# Required environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
S3_ENDPOINT = os.getenv("S3_ENDPOINT")  # Digital Ocean Spaces
S3_BUCKET = os.getenv("S3_BUCKET")
S3_ACCESS_KEY = os.getenv("S3_ACCESS_KEY")
S3_SECRET_KEY = os.getenv("S3_SECRET_KEY")
MAILGUN_WEBHOOK_KEY = os.getenv("MAILGUN_WEBHOOK_KEY")
```

## Email Processing Logic

- Parse the email body for the first HTTP/HTTPS URL found
- If no URL is found, treat the entire email body as article content
- Store the original email in the queue record for debugging

## Error Handling Strategy

- Log all errors but continue processing
- Mark failed jobs with 'error' status and an error message
- Retry logic: 3 attempts with exponential backoff
- Graceful degradation when external services fail

## Audio Configuration

- **Format**: MP3, 128 kbps
- **TTS Voice**: OpenAI default voice (voice selection can be added later)
- **File naming**: `episode_{timestamp}_{id}.mp3`

## HTMX Frontend Behavior

- Auto-refresh queue status every 30 seconds
- Form submission without page reload
- Simple progress indicators for processing jobs
- Basic audio player for testing episodes

## Testing Requirements

Create tests covering:

- Article content extraction accuracy
- TTS API integration (with mocking)
- S3 upload/download functionality
- RSS feed generation and XML validation
- Email webhook parsing and security
- Database operations and data integrity
- End-to-end submission workflow

## Success Criteria

The MVP should successfully:

1. Receive article submissions via email webhook
2. Extract clean article content
3. Convert text to high-quality audio
4. Store audio in S3-compatible storage
5. Generate a valid RSS podcast feed
6. Provide a basic web interface for monitoring
7. Handle errors gracefully without crashing

## Implementation Notes

- Start with the Web.py service first, then Worker.py
- Use simple polling rather than a complex job queue
- Focus on reliability over performance for the MVP
- Keep the total code under 300-400 lines
- Use reasonable defaults everywhere possible
- Prioritize working code over perfect code

Implement this as a robust, deployable MVP that can handle real-world article processing workloads while maintaining simplicity.