# PodcastItLater MVP Implementation Prompt

You are implementing a two-service MVP system called "PodcastItLater" that converts web articles to podcast episodes via email submission. This follows a monorepo namespace structure where all files live under `Biz/PodcastItLater/`.

## Code Organization & Structure
- **Primary files**:
  - `Biz/PodcastItLater/Web.py` - web service (ludic app, routes, webhook)
  - `Biz/PodcastItLater/Worker.py` - background processor
  - `Biz/PodcastItLater/Models.py` - database schema and data access
- **Keep code in as few files as possible following monorepo conventions**
- **Namespaces are always capitalized** (this is a Python project but follows the Haskell-style namespace hierarchy)

## Technical Requirements

### Core Libraries
```python
# Required dependencies (all third-party except sqlite3, which ships with Python)
import ludic        # web framework (see provided docs)
import trafilatura  # article content extraction
import openai       # OpenAI TTS API
import boto3        # S3 uploads
import feedgen      # RSS generation
import sqlite3      # database (standard library)
import pydub        # audio manipulation, if needed
```

### Database Schema
```sql
-- Queue table for job processing
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);

-- Episodes table for completed podcasts
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content_length INTEGER,
    audio_url TEXT NOT NULL,
    duration INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
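A minimal sketch of how `Models.py` might initialize this schema and expose queue helpers; the function names (`connect`, `enqueue`, `next_pending`) are illustrative, not a fixed contract:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT,
    email TEXT,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    error_message TEXT
);
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content_length INTEGER,
    audio_url TEXT NOT NULL,
    duration INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""

def connect(path="podcastitlater.db"):
    """Open a connection and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows accessible by column name
    conn.executescript(SCHEMA)
    return conn

def enqueue(conn, url, email):
    """Insert a pending job and return its id."""
    cur = conn.execute(
        "INSERT INTO queue (url, email) VALUES (?, ?)", (url, email)
    )
    conn.commit()
    return cur.lastrowid

def next_pending(conn):
    """Oldest pending job, or None when the queue is empty."""
    return conn.execute(
        "SELECT * FROM queue WHERE status = 'pending' "
        "ORDER BY created_at LIMIT 1"
    ).fetchone()
```

Both services can share these helpers, with the worker polling `next_pending` and the web service calling `enqueue`.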

## Service 1: Web Frontend (`Biz/PodcastItLater/Web.py`)

### Responsibilities
- Serve ludic + htmx web interface
- Handle mailgun webhook for email submissions
- Provide manual article submission form
- Display processing queue status
- Serve RSS podcast feed
- Basic podcast player for testing

### Required Routes
```python
@app.route("/")
def index():
    """Simple form to submit an article URL.

    Displays recent episodes and queue status; uses htmx for dynamic updates.
    """

@app.route("/submit", methods=["POST"])
def submit_article():
    """Handle manual form submission.

    Inserts into the queue table and returns an htmx response with status.
    """

@app.route("/webhook/mailgun", methods=["POST"])
def mailgun_webhook():
    """Parse the email and extract URLs from the body.

    Verifies the webhook signature for security before inserting
    into the queue table.
    """

@app.route("/feed.xml")
def rss_feed():
    """Generate RSS from the episodes table using the feedgen library."""

@app.route("/status")
def queue_status():
    """HTMX endpoint for live queue updates.

    Returns the current queue plus recent episodes.
    """
```
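The webhook route needs the signature check before trusting the payload. Mailgun signs each webhook by computing an HMAC-SHA256 of `timestamp + token` under your signing key (the `MAILGUN_WEBHOOK_KEY` environment variable below); a sketch of the verification:

```python
import hashlib
import hmac

def verify_mailgun_signature(signing_key, timestamp, token, signature):
    """Return True when the signature matches Mailgun's HMAC scheme.

    Mailgun sends timestamp, token, and signature in the webhook payload;
    signature is hex(HMAC-SHA256(signing_key, timestamp + token)).
    """
    expected = hmac.new(
        key=signing_key.encode(),
        msg=(timestamp + token).encode(),
        digestmod=hashlib.sha256,
    ).hexdigest()
    # compare_digest avoids timing side channels
    return hmac.compare_digest(expected, signature)
```

Reject the request with a 403 when this returns False; also consider rejecting stale timestamps to prevent replay.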

### RSS Feed Metadata
```python
import os

RSS_CONFIG = {
    "title": "Ben's Article Podcast",
    "description": "Web articles converted to audio",
    "author": "Ben Sima",
    "language": "en-US",
    # everything above is hardcoded; the base URL comes from the environment
    "base_url": os.getenv("BASE_URL", "https://your-domain.com"),
}
```

## Service 2: Background Worker (`Biz/PodcastItLater/Worker.py`)

### Responsibilities
- Poll queue table every 30 seconds
- Extract article content using trafilatura
- Convert text to speech via OpenAI TTS
- Upload audio files to S3-compatible storage
- Update episodes table with completed episodes
- Handle errors with retry logic (3 attempts max)

### Processing Pipeline
```python
def process_article(queue_item):
    """Complete article processing pipeline."""
    try:
        # 1. Extract title and body text with trafilatura
        title, content = extract_article_content(queue_item.url)

        # 2. Generate audio (and its duration) with OpenAI TTS
        audio_file, duration = text_to_speech(content)

        # 3. Upload to S3
        audio_url = upload_to_s3(audio_file)

        # 4. Create episode record
        create_episode(title, audio_url, duration)

        # 5. Mark queue item as complete
        mark_complete(queue_item.id)

    except Exception as e:
        handle_error(queue_item.id, str(e))
```
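One wrinkle in step 2: OpenAI's TTS endpoint caps input length (4096 characters at the time of writing), so `text_to_speech` will need to split long articles into chunks and concatenate the resulting clips (pydub, already in the dependency list, can do the joining). A sketch of the splitting step, which prefers sentence boundaries:

```python
def chunk_text(text, limit=4096):
    """Split text into chunks of at most `limit` characters.

    Breaks at the last sentence end ('. ') before the limit when one
    exists, otherwise cuts hard at the limit.
    """
    chunks = []
    while len(text) > limit:
        cut = text.rfind(". ", 0, limit)
        if cut == -1:
            cut = limit          # no sentence boundary; hard cut
        else:
            cut += 1             # keep the period with its chunk
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk goes through the TTS call separately; the per-chunk MP3s are then concatenated in order before upload.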

### Configuration via Environment Variables
```python
import os

# Required environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
S3_ENDPOINT = os.getenv("S3_ENDPOINT")  # e.g. Digital Ocean Spaces
S3_BUCKET = os.getenv("S3_BUCKET")
S3_ACCESS_KEY = os.getenv("S3_ACCESS_KEY")
S3_SECRET_KEY = os.getenv("S3_SECRET_KEY")
MAILGUN_WEBHOOK_KEY = os.getenv("MAILGUN_WEBHOOK_KEY")
```
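Since `os.getenv` silently returns `None` for anything missing, a small fail-fast wrapper (the helper name is illustrative) surfaces misconfiguration at startup instead of mid-job:

```python
import os

def require_env(name, default=None):
    """Read an environment variable, raising at startup when it's missing."""
    value = os.getenv(name, default)
    if value is None:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value
```

Call it once for each required variable at import time so a misdeployed worker dies immediately with a clear message.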

## Email Processing Logic
- Parse email body for first HTTP/HTTPS URL found
- If no URL found, treat entire email body as article content
- Store original email in queue record for debugging
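The first-URL rule can be a one-liner over a conservative regex; a sketch (real email bodies may still need trailing-punctuation trimming, which this deliberately leaves out):

```python
import re

# stops at whitespace, quotes, and angle brackets, which commonly wrap URLs
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def extract_first_url(body):
    """Return the first http(s) URL in the email body, or None."""
    match = URL_RE.search(body)
    return match.group(0) if match else None
```

When this returns `None`, fall through to treating the whole body as article content, per the rule above.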

## Error Handling Strategy
- Log all errors but continue processing
- Failed jobs marked with 'error' status and error message
- Retry logic: 3 attempts with exponential backoff
- Graceful degradation when external services fail
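The 3-attempt policy can be a small pure helper plus a wrapper; the 30-second base delay below matches the worker's poll interval but is an assumption, not a requirement from this spec:

```python
import time

MAX_ATTEMPTS = 3

def backoff_delay(attempt, base=30, cap=600):
    """Seconds to wait before retry `attempt` (1-based): 30s, 60s, 120s, capped."""
    return min(base * (2 ** (attempt - 1)), cap)

def with_retries(fn, *args):
    """Run fn, retrying up to MAX_ATTEMPTS with exponential backoff."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise  # caller marks the job 'error' with the message
            time.sleep(backoff_delay(attempt))
```

Wrap each pipeline stage (or the whole `process_article`) in `with_retries`; when it finally raises, record the error message on the queue row and move on.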

## Audio Configuration
- **Format**: MP3, 128kbps
- **TTS Voice**: OpenAI default voice (can add voice selection later)
- **File naming**: `episode_{timestamp}_{id}.mp3`

## HTMX Frontend Behavior
- Auto-refresh queue status every 30 seconds
- Form submission without page reload
- Simple progress indicators for processing jobs
- Basic audio player for testing episodes

## Testing Requirements

Create tests covering:
- Article content extraction accuracy
- TTS API integration (with mocking)
- S3 upload/download functionality
- RSS feed generation and XML validation
- Email webhook parsing and security
- Database operations and data integrity
- End-to-end submission workflow

## Success Criteria
The MVP should successfully:
1. Receive article submissions via email webhook
2. Extract clean article content
3. Convert text to high-quality audio
4. Store audio in S3-compatible storage
5. Generate valid RSS podcast feed
6. Provide basic web interface for monitoring
7. Handle errors gracefully without crashing

## Implementation Notes
- Start with Web.py service first, then Worker.py
- Use simple polling rather than complex job queues
- Focus on reliability over performance for MVP
- Keep total code to roughly 300-400 lines
- Use reasonable defaults everywhere possible
- Prioritize working code over perfect code

Implement this as a robust, deployable MVP that can handle real-world article processing workloads while maintaining simplicity.