# Omni Agent Infrastructure Plan

**Status**: Draft
**Author**: Ben (with AI assistance)
**Date**: 2025-12-11

## Vision

A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, telegram bot, etc.) with:

- Shared tools, memory, and model backends
- LoRA fine-tuning with model snapshots
- Evals to prevent regression
- Configurable LLM providers (local Ollama or OpenRouter)

---

## 0. Scope & Task Tracking

**Building now**: Infrastructure and library primitives
**First concrete agent**: Telegram Bot (validates the infrastructure)
**Building later**: Researcher, Planner, and other agents

### Active Tasks (in dependency order)

| Task ID | Title | Status | Blocks |
|---------|-------|--------|--------|
| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
| t-248 | Memory System | Open (blocked by t-247) | t-251 |
| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
| t-250 | Evals Framework | Open (blocked by t-247) | - |
| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |

Run `jr task show <task-id>` for full implementation details on each task.

---

## 1. Architecture Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                           Agent Layer                             │
├──────────┬──────────┬──────────┬──────────┬──────────────────────┤
│ Jr/Coder │Researcher│ Planner  │ Telegram │  Future Agents...    │
└────┬─────┴────┬─────┴────┬─────┴────┬─────┴──────────────────────┘
     │          │          │          │
┌────▼──────────▼──────────▼──────────▼────────────────────────────┐
│                         Omni.Agent.Core                           │
│  - Agent protocol (system prompt, tool execution loop)            │
│  - Model backend abstraction (Ollama | OpenRouter | Amp)          │
│  - Conversation/session management                                 │
└────┬───────────────────────────────────────────────────────────────┘
     │
┌────▼───────────────────────────────────────────────────────────────┐
│                       Shared Infrastructure                         │
├─────────────────┬──────────────────┬─────────────────────────────┤
│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals            │
│ - read_file     │ - Vector DB      │ - Regression tests          │
│ - edit_file     │ - Fact retrieval │ - Quality metrics           │
│ - run_bash      │ - Session history│ - Model comparison          │
│ - search        │                  │                             │
│ - web_search    │                  │                             │
│ - (pluggable)   │                  │                             │
├─────────────────┴──────────────────┴─────────────────────────────┤
│                        Omni.Agent.Training                        │
│ - LoRA fine-tuning orchestration                                  │
│ - Model snapshotting                                              │
│ - Training data collection                                        │
└───────────────────────────────────────────────────────────────────┘
```

---

## 2. Immediate Work Items

### 2.1 Add Amp Backend Support (`--engine` flag)

**Problem**: The custom engine works, but Amp is better suited to complex coding tasks.

**Solution**: Add an `--engine` flag to `jr work`:

```bash
jr work                  # Uses native Engine (default)
jr work --engine=amp     # Uses Amp via subprocess
jr work --engine=ollama  # Uses local Ollama
```

**Implementation**:

1. Add `EngineBackend` type: `Native | Amp | Ollama Text` (sketched below)
2. Modify `Omni.Agent.Worker.start` to accept backend selection
3. For Amp: spawn `amp --prompt-file` subprocess, capture output
4. For Ollama: call local API instead of OpenRouter

**Files to modify**:

- `Omni/Jr.hs` - CLI parsing
- `Omni/Agent/Worker.hs` - Backend dispatch
- `Omni/Agent/Engine.hs` - Add Ollama provider
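To make steps 1-2 concrete, here is a minimal sketch of the backend type and dispatch. It is illustrative only: `runWithBackend` and the three `run*` helpers are placeholders (stubbed so the sketch stands alone), not existing functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Hypothetical sketch of the --engine dispatch. EngineBackend mirrors the type
-- proposed in step 1; the run* helpers stand in for the existing native engine,
-- an `amp` subprocess, and the local Ollama API.
data EngineBackend
  = Native       -- existing Omni.Agent.Engine loop (default)
  | Amp          -- shell out to the `amp` CLI with a prompt file
  | Ollama Text  -- local Ollama, parameterized by model name

runWithBackend :: EngineBackend -> Text -> IO Text
runWithBackend backend prompt =
  case backend of
    Native -> runNative prompt             -- placeholder for the native engine loop
    Amp -> runAmpSubprocess prompt         -- placeholder: spawn amp --prompt-file
    Ollama model -> runOllama model prompt -- placeholder: call local Ollama API
  where
    -- Stubs only, so the sketch compiles standalone; the real code dispatches
    -- into Worker/Engine and the Provider module described in section 2.2.
    runNative = pure
    runAmpSubprocess = pure
    runOllama _ = pure
```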
### 2.2 Abstract LLM Provider

**Current state**: `Engine.hs` hardcodes OpenRouter.

**Target state**: Pluggable `LLMProvider` interface.

```haskell
-- Omni/Agent/Provider.hs
data Provider
  = OpenRouter { apiKey :: Text, model :: Text }
  | Ollama { baseUrl :: Text, model :: Text }
  | AmpCLI { promptFile :: FilePath }

chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
```

### 2.3 Memory / Vector DB Integration

**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.

**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact - that's project-scoped, not user-scoped).

**Key requirements**:

- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
- Multi-user: Each family member has private memories (identified by Telegram ID initially)
- Embeddings via Ollama `/api/embeddings` endpoint with nomic-embed-text model

See task t-248 for full implementation details.

### 2.4 Pluggable Tool System

**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.

**Target**: Registry pattern allowing agents to declare their tool sets.

```haskell
-- Each agent specifies its tools
coderTools :: [Tool]
coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]

researcherTools :: [Tool]
researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]

plannerTools :: [Tool]
plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]

telegramTools :: [Tool]
telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
```

---

## 3. Agent Specifications

### 3.1 Jr/Coder (existing)

**Purpose**: Autonomous coding agent for task completion.

**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read

**System prompt**: Task-focused, code conventions, test requirements.

### 3.2 Researcher (new)

**Purpose**: Information gathering, analysis, summarization.

**Tools**:

- `web_search` - Search the web
- `read_web_page` - Fetch and parse web content
- `extract_facts` - Store learned facts in knowledge base
- `read_file` - Read local documents
- `query_facts` - Retrieve from knowledge base

**System prompt**: Focus on accuracy, citation, verification.

### 3.3 Project Planner (new)

**Purpose**: Break down high-level goals into actionable tasks.

**Tools**:

- `task_create` - Create new tasks
- `task_list` - Query existing tasks
- `task_update` - Modify task status/content
- `fact_query` - Get project context
- `dependency_graph` - Visualize task dependencies

**System prompt**: Project management, task decomposition, dependency analysis.

### 3.4 Telegram Bot (FIRST AGENT TO BUILD)

**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.

**Tools**:

- `remember` - Store facts about the user (from Memory module)
- `recall` - Query user's memories (from Memory module)
- `web_search` - Answer questions requiring web lookup (from Registry)

**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.

**User identification**: Telegram user ID → creates/retrieves User record in memory.db

See task t-251 for full implementation details.
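As a sketch of that user-identification step (illustrative only: `getOrCreateUser` does not exist yet, and the real module should reuse the `withDb`/migration patterns from `Omni/Task/Core.hs`; the code below talks to sqlite-simple directly):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Int (Int64)
import Data.Text (Text)
import qualified Data.UUID as UUID
import qualified Data.UUID.V4 as UUID
import Database.SQLite.Simple (Connection, Only (..), execute, query)

-- | Look up a user by Telegram ID, creating the record on first message.
-- Hypothetical helper; returns the internal user id.
getOrCreateUser :: Connection -> Int64 -> Text -> IO Text
getOrCreateUser conn telegramId displayName = do
  rows <- query conn "SELECT id FROM users WHERE telegram_id = ?" (Only telegramId)
  case rows of
    (Only existingId : _) -> pure existingId
    [] -> do
      newId <- UUID.toText <$> UUID.nextRandom
      execute conn
        "INSERT INTO users (id, telegram_id, name) VALUES (?, ?, ?)"
        (newId, telegramId, displayName)
      pure newId
```

Only the Telegram ID and display name live on the user row; everything else the bot learns becomes a memory, per the memory design below.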
---

## 4. Shared Infrastructure

### 4.1 Model Backend Configuration

```haskell
-- ~/.config/omni/models.yaml or environment variables
data ModelConfig = ModelConfig
  { defaultProvider :: Provider
  , modelOverrides :: Map Text Provider  -- per-agent overrides
  }

-- Example config:
-- default_provider: openrouter
-- openrouter:
--   api_key: $OPENROUTER_API_KEY
--   default_model: anthropic/claude-sonnet-4.5
-- ollama:
--   base_url: http://localhost:11434
--   default_model: llama3.1:70b
-- agents:
--   telegram: { provider: ollama, model: llama3.1:8b }  # cheaper for chat
--   coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
```

### 4.2 Evals Framework

**Purpose**: Prevent regression when changing prompts, tools, or models.

**Components**:

1. **Test Cases**: Known task + expected outcome pairs
2. **Runner**: Execute agent on test cases, capture results
3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
4. **Dashboard**: Track scores over time

**Implementation**:

```haskell
-- Omni/Agent/Eval.hs
data EvalCase = EvalCase
  { evalId :: Text
  , evalPrompt :: Text
  , evalExpectedBehavior :: Text  -- or structured criteria
  , evalTools :: [Tool]
  }

runEval :: AgentConfig -> EvalCase -> IO EvalResult
```
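`EvalResult` is not pinned down above; one possible shape, together with the simplest scorer (exact substring match), might look like the sketch below. Field names are assumptions; semantic-similarity and human scoring would be additional scorers producing the same 0.0-1.0 range.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Illustrative EvalResult shape for the runner above.
data EvalResult = EvalResult
  { resultEvalId :: Text
  , resultOutput :: Text   -- what the agent actually produced
  , resultScore :: Double  -- 0.0 (fail) .. 1.0 (pass)
  , resultNotes :: Text
  }

-- | Cheapest possible scorer: did the expected behavior text appear verbatim?
scoreExactMatch :: Text -> Text -> Double
scoreExactMatch expected actual =
  if expected `T.isInfixOf` actual then 1.0 else 0.0
```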
### 4.3 Shared Memory System (Omni.Agent.Memory)

**Critical requirement**: Cross-agent memory sharing with multi-user support.

**Example**: User tells Telegram bot "I'm an AI engineer" → Research agent later searching for papers should recall this context.

#### Why not Omni.Fact?

Current `Omni.Fact` limitations:

- Project-scoped, not user-scoped
- No user/identity concept
- No embeddings for semantic retrieval
- Tied to task system

#### Memory Design

```haskell
-- Omni/Agent/Memory.hs

-- | A memory is a piece of information about a user, learned by any agent
data Memory = Memory
  { memoryId :: UUID
  , memoryUserId :: UserId           -- Who this memory is about
  , memoryContent :: Text            -- The actual information
  , memoryEmbedding :: Maybe Vector  -- For semantic search
  , memorySource :: MemorySource     -- Which agent learned this
  , memoryConfidence :: Double       -- 0.0-1.0
  , memoryCreatedAt :: UTCTime
  , memoryLastAccessedAt :: UTCTime  -- For relevance decay
  , memoryTags :: [Text]             -- Optional categorization
  }

data MemorySource = MemorySource
  { sourceAgent :: Text    -- "telegram", "researcher", "coder", etc.
  , sourceSession :: UUID  -- Session ID where this was learned
  , sourceContext :: Text  -- Brief context of how it was learned
  }

data User = User
  { userId :: UUID
  , userTelegramId :: Maybe Int64  -- Primary identifier initially
  , userEmail :: Maybe Text        -- Added later when email interface exists
  , userName :: Text               -- Display name ("Ben", "Alice", etc.)
  , userCreatedAt :: UTCTime
  }

-- Users are identified by Telegram ID initially.
-- The agent learns more about users over time and stores it in memories,
-- e.g., "Ben is an AI engineer" becomes a memory, not a user field.

-- | Core operations
storeMemory :: UserId -> Text -> MemorySource -> IO Memory
recallMemories :: UserId -> Text -> Int -> IO [Memory]  -- semantic search
forgetMemory :: UUID -> IO ()

-- | Embedding integration (via Ollama or other provider)
embedText :: Text -> IO Vector
similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
```

#### Multi-User Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      Memory Store                       │
├─────────────────────────────────────────────────────────┤
│  users table:                                           │
│    id TEXT PRIMARY KEY                                  │
│    name TEXT                                            │
│    created_at TIMESTAMP                                 │
├─────────────────────────────────────────────────────────┤
│  memories table:                                        │
│    id TEXT PRIMARY KEY                                  │
│    user_id TEXT REFERENCES users(id)                    │
│    content TEXT                                         │
│    embedding BLOB       -- serialized float vector      │
│    source_agent TEXT                                    │
│    source_session TEXT                                  │
│    source_context TEXT                                  │
│    confidence REAL                                      │
│    created_at TIMESTAMP                                 │
│    last_accessed_at TIMESTAMP                           │
│    tags TEXT            -- JSON array                   │
└─────────────────────────────────────────────────────────┘
```

#### Memory Retrieval in Agent Loop

When any agent runs, it:

1. Identifies the current user (from context/session)
2. Extracts key concepts from the user's request
3. Calls `recallMemories userId query 10` to get relevant memories
4. Injects memories into system prompt as context
5. After completion, extracts new learnings and calls `storeMemory`

```haskell
-- In agent loop
runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
runAgentWithMemory userId config prompt = do
  -- Recall relevant memories
  memories <- recallMemories userId prompt 10
  let memoryContext = formatMemoriesForPrompt memories

  -- Inject into system prompt
  let enhancedPrompt =
        agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext

  -- Run agent
  result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt

  -- Extract and store new memories (could be done by the agent via tool)
  pure result
```

#### Memory Extraction Tool

Agents can explicitly store memories:

```haskell
storeMemoryTool :: Tool
storeMemoryTool = Tool
  { toolName = "remember"
  , toolDescription = "Store a piece of information about the user for future reference"
  , toolExecute = \args -> do
      -- currentUserId and currentSource come from the surrounding agent session
      let content = args .: "content"
          tags = args .:? "tags" .!= []
      memory <- storeMemory currentUserId content currentSource
      pure (toJSON memory)
  }
```
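The `embedText` and `similaritySearch` signatures above could be filled in roughly as follows. This is a sketch only: it assumes `http-conduit`'s `Network.HTTP.Simple` (the same HTTP stack `Engine.hs` already uses) and Ollama's documented `/api/embeddings` request/response shape; the real module would also handle errors and serialize vectors into the `embedding BLOB` column.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), object, withObject, (.:), (.=))
import Data.Text (Text)
import qualified Data.Vector.Unboxed as V
import Network.HTTP.Simple
  (getResponseBody, httpJSON, parseRequest, setRequestBodyJSON, setRequestMethod)

newtype EmbeddingResponse = EmbeddingResponse {embedding :: [Double]}

instance FromJSON EmbeddingResponse where
  parseJSON = withObject "EmbeddingResponse" (\o -> EmbeddingResponse <$> o .: "embedding")

-- | Embed a piece of text with the local nomic-embed-text model.
embedText :: Text -> IO (V.Vector Double)
embedText txt = do
  req0 <- parseRequest "http://localhost:11434/api/embeddings"
  let body = object ["model" .= ("nomic-embed-text" :: Text), "prompt" .= txt]
      req = setRequestMethod "POST" (setRequestBodyJSON body req0)
  resp <- httpJSON req
  pure (V.fromList (embedding (getResponseBody resp)))

-- | Cosine similarity, the usual scoring function for semantic recall.
cosine :: V.Vector Double -> V.Vector Double -> Double
cosine a b = V.sum (V.zipWith (*) a b) / (norm a * norm b)
  where
    norm v = sqrt (V.sum (V.map (^ (2 :: Int)) v))
```

Brute-force cosine over one user's memories is fine at family scale; the sqlite-vss decision in section 6 takes over once the table grows.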
### 4.4 LoRA Fine-tuning Service

**Purpose**: Custom-tune models on successful task completions.

**Workflow**:

1. Collect successful agent sessions (prompt + tool calls + result)
2. Format as training data (instruction, input, output)
3. Run LoRA training via Ollama or external service
4. Snapshot trained model with version tag
5. A/B test against base model via evals

**Storage**:

- Training data: `_/training//.jsonl`
- Models: Ollama model registry with tags

---

## 5. Infrastructure Build Plan

Focus: Library primitives first, agents later.

### Phase 1: Provider Abstraction (1-2 days)

- [ ] Create `Omni.Agent.Provider` module with unified interface
- [ ] Extract OpenRouter logic from `Engine.hs`
- [ ] Add Ollama provider implementation
- [ ] Add `--engine` flag to `jr work`
- [ ] Test with local Llama model

### Phase 2: Amp Re-integration (1 day)

- [ ] Add Amp subprocess backend to Provider
- [ ] Handle Amp's streaming output
- [ ] Parse Amp thread URL for linking

### Phase 3: Memory System (3-4 days)

- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
- [ ] Design schema: users, memories tables
- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
- [ ] Add embedding support via Ollama `/api/embeddings`
- [ ] Implement similarity search
- [ ] Create `remember` tool for agents
- [ ] Add `runAgentWithMemory` wrapper

### Phase 4: Tool Registry (1-2 days)

- [ ] Create `Omni.Agent.Registry` for tool management
- [ ] Define tool categories (coding, web, memory, task)
- [ ] Allow agents to declare tool requirements
- [ ] Add web tools (web_search, read_web_page)

### Phase 5: Evals Framework (2-3 days)

- [ ] Create `Omni.Agent.Eval` module
- [ ] Define `EvalCase` and `EvalResult` types
- [ ] Build eval runner
- [ ] Add scoring (exact match, semantic, custom)
- [ ] Create initial eval suite for Jr/coder

### Phase 6: Telegram Bot Agent (3-4 days)

**First concrete agent** - validates the infrastructure.

- [ ] Create `Omni.Agent.Telegram` module
- [ ] Telegram Bot API integration (getUpdates polling or webhook; see the polling sketch after this phase)
- [ ] User identification via Telegram user ID
- [ ] Auto-create user record on first message
- [ ] Wire up memory system (recall on message, store learnings)
- [ ] Basic conversation loop with LLM
- [ ] Deploy as background service
- [ ] Add `jr telegram` command for manual start

**Tools for Telegram agent:**

- `remember` - store facts about user
- `recall` - query user's memories
- `web_search` - answer questions (optional, phase 4)
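As referenced in the Phase 6 checklist, a getUpdates long-polling loop could start out like the sketch below. It assumes `http-conduit` and the publicly documented Telegram Bot API fields (`update_id`, `message.chat.id`, `message.text`); `handleMessage` is a stand-in for the memory-aware conversation loop.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forM_)
import Data.Aeson (FromJSON (..), Value, withObject, (.:), (.:?))
import Data.Text (Text)
import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest)

data Update = Update
  { updateId :: Int
  , updateChatId :: Maybe Int
  , updateText :: Maybe Text
  }

instance FromJSON Update where
  parseJSON = withObject "Update" $ \o -> do
    uid <- o .: "update_id"
    msg <- o .:? "message"
    case msg of
      Nothing -> pure (Update uid Nothing Nothing)
      Just m -> flip (withObject "Message") m $ \mo -> do
        chat <- mo .: "chat"
        cid <- flip (withObject "Chat") chat (.: "id")
        txt <- mo .:? "text"
        pure (Update uid (Just cid) txt)

newtype UpdatesResponse = UpdatesResponse {result :: [Update]}

instance FromJSON UpdatesResponse where
  parseJSON = withObject "UpdatesResponse" (\o -> UpdatesResponse <$> o .: "result")

-- | Poll getUpdates forever, acknowledging each batch via the offset parameter.
pollLoop :: String -> (Int -> Text -> IO ()) -> Int -> IO ()
pollLoop token handleMessage offset = do
  req <- parseRequest ("https://api.telegram.org/bot" <> token
                       <> "/getUpdates?timeout=30&offset=" <> show offset)
  resp <- httpJSON req
  let updates = result (getResponseBody resp)
  forM_ updates $ \u ->
    case (updateChatId u, updateText u) of
      (Just cid, Just txt) -> handleMessage cid txt
      _ -> pure ()
  let next = if null updates then offset else maximum (map updateId updates) + 1
  pollLoop token handleMessage next
```

Something like `pollLoop token handleMessage 0` would be what the proposed `jr telegram` command starts.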
### Phase 7: Training Data Collection (1-2 days)

- [ ] Add session export to training format
- [ ] Store successful completions in `_/training/`
- [ ] Create `jr train export` command

### (Future) Additional Agents

- Researcher agent
- Planner agent
- Email interface (links to Telegram user identity)
- Others...

---

## 6. Design Decisions

| Question | Decision |
|----------|----------|
| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
| User identity | **Telegram ID** initially, link to email later when adding email interface |
| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
| Amp integration | TBD - subprocess likely |
| Memory decay | TBD - probably keep forever with relevance scoring |
| LoRA training | TBD - local Ollama or cloud |

---

## 7. File Structure (Proposed)

```
Omni/Agent/
├── Core.hs              # Base agent types, Worker state (existing)
├── Engine.hs            # Agent loop, tool execution (existing)
├── Provider.hs          # LLM provider abstraction (NEW)
├── Provider/
│   ├── OpenRouter.hs    # Extracted from Engine.hs
│   ├── Ollama.hs        # Local model support
│   └── Amp.hs           # Amp CLI subprocess
├── Memory.hs            # Shared memory system (NEW)
├── Memory/
│   └── Embedding.hs     # Vector operations, Ollama embeddings
├── Tools.hs             # Core coding tools (existing)
├── Tools/
│   ├── Web.hs           # web_search, read_web_page (NEW)
│   └── Memory.hs        # remember, recall tools (NEW)
├── Eval.hs              # Evaluation framework (NEW)
├── Training.hs          # Training data collection (NEW)
├── Worker.hs            # Jr worker loop (existing)
├── Git.hs               # Git operations (existing)
├── Log.hs               # Logging utilities (existing)
├── Event.hs             # Event types (existing)
├── DESIGN.md            # Current design doc
└── PLAN.md              # This document
```

---

## 8. Database Schema Additions

```sql
-- Memory system tables (new database: memory.db)

CREATE TABLE users (
  id TEXT PRIMARY KEY,         -- UUID
  telegram_id INTEGER UNIQUE,  -- Telegram user ID (primary identifier)
  email TEXT UNIQUE,           -- Added later for email interface
  name TEXT NOT NULL,          -- Display name
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE memories (
  id TEXT PRIMARY KEY,         -- UUID
  user_id TEXT NOT NULL REFERENCES users(id),
  content TEXT NOT NULL,
  embedding BLOB,              -- float32 vector for sqlite-vss
  source_agent TEXT NOT NULL,  -- "telegram", "coder", etc.
  source_session TEXT,         -- Session UUID
  source_context TEXT,         -- How this was learned
  confidence REAL DEFAULT 0.8,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags TEXT                    -- JSON array
);

-- sqlite-vss virtual table for vector similarity search
-- (768 dimensions to match nomic-embed-text, per section 2.3)
CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));

CREATE INDEX idx_memories_user ON memories(user_id);
CREATE INDEX idx_memories_agent ON memories(source_agent);
```

---

## 9. Key Code References for Implementers

When implementing tasks, refer to these existing patterns:

### Existing Agent Infrastructure

| File | Purpose | Key Functions/Types |
|------|---------|---------------------|
| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |

### Database Patterns (follow these)

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |

### CLI Patterns

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |

### HTTP Patterns

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |

### Build System

- Build: `bild Omni/Agent/NewModule.hs`
- Test: `bild --test Omni/Agent/NewModule.hs`
- Dependencies: Add to module header comments (`: dep package-name`)
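Tying the section 8 schema to those database patterns, a first cut of the row mapping might look like the sketch below. It is illustrative: the `MemoryRow` name, keeping the embedding as a raw BLOB, and storing tags as JSON text are assumptions, and the real module would convert rows into the richer `Memory` type from section 4.3.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)
import Data.Time (UTCTime)
import Database.SQLite.Simple (Connection, FromRow (..), Only (..), field, query)

-- Row type mirroring the memories table column order in section 8.
data MemoryRow = MemoryRow
  { rowId :: Text
  , rowUserId :: Text
  , rowContent :: Text
  , rowEmbedding :: Maybe ByteString  -- raw BLOB; decoded by the Embedding module
  , rowSourceAgent :: Text
  , rowSourceSession :: Maybe Text
  , rowSourceContext :: Maybe Text
  , rowConfidence :: Double
  , rowCreatedAt :: UTCTime
  , rowLastAccessedAt :: UTCTime
  , rowTags :: Maybe Text             -- JSON-encoded array
  }

instance FromRow MemoryRow where
  fromRow =
    MemoryRow
      <$> field <*> field <*> field <*> field <*> field <*> field
      <*> field <*> field <*> field <*> field <*> field

-- | Query shaped after the CRUD patterns in Omni.Task.Core / Omni.Fact.
memoriesForUser :: Connection -> Text -> IO [MemoryRow]
memoriesForUser conn uid =
  query conn "SELECT * FROM memories WHERE user_id = ?" (Only uid)
```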
---

## 10. Next Steps

Execute tasks in order:

1. **t-247** Provider Abstraction (unblocked, start here)
2. **t-248** Memory System (after t-247)
3. **t-249** Tool Registry (after t-247, can be done in parallel with t-248)
4. **t-250** Evals Framework (after t-247)
5. **t-251** Telegram Bot Agent (after t-248 + t-249)

Run `jr task ready` to see what's available to work on.