summaryrefslogtreecommitdiff
path: root/Omni/Agent/Telegram/Media.hs
AgeCommit message (Collapse)Author
6 daysfix: use OpenAI Whisper for voice transcriptionBen Sima
OpenRouter's chat completion API doesn't properly pass audio to models. Switched to calling OpenAI's /v1/audio/transcriptions endpoint directly with the whisper-1 model. Requires OPENAI_API_KEY environment variable.
6 daystelegram: fix audio transcription model and prompt orderBen Sima
- Switch from gemini-2.0-flash-001 to gemini-2.5-flash - Put audio content before text prompt (model was ignoring audio) - Strengthen prompt to return only transcription
7 daystelegram: fix parsing, add webpage reader, use geminiBen Sima
- Fix Provider.hs to strip leading whitespace from OpenRouter responses - Fix FunctionCall parser to handle missing 'arguments' field - Use eitherDecode for better error messages on parse failures - Switch to claude-sonnet-4.5 for main agent - Use gemini-2.0-flash for conversation summarization (cheaper) - Add read_webpage tool for fetching and summarizing URLs - Add tagsoup to Haskell deps (unused, kept for future)
7 daystelegram bot: refactor + multimedia + reply supportBen Sima
Refactor Telegram.hs into submodules to reduce file size: - Types.hs: data types, JSON parsing - Media.hs: file downloads, image/voice analysis - Reminders.hs: reminder loop, user chat persistence Multimedia improvements: - Vision uses third-person to avoid LLM confusion - Better message framing for embedded descriptions - Size validation (10MB images, 20MB voice) - MIME type validation for voice messages New features: - Reply support: bot sees context when users reply - Web search: default 5->10, max 10->20 results - Guardrails: duplicate tool limit 3->10 for research - Timezone: todos parse/display in Eastern time (ET)