Add Omni/Agent/PLAN.md - agent infrastructure roadmap

Defines architecture for multi-agent system with:
- Provider abstraction (OpenRouter, Ollama, Amp backends)
- Shared memory system (sqlite-vss, multi-user, cross-agent)
- Tool registry for pluggable tool sets
- Evals framework for regression testing
- Telegram bot as first concrete agent

Tasks: t-247 through t-251
author: Ben Sima <ben@bensima.com> 2025-12-11 19:15:33 -0500
committer: Ben Sima <ben@bensima.com> 2025-12-11 19:15:33 -0500
commit: 225e5b7a24f0b30f6de1bd7418bf834ad345b0f3 (patch)
tree: 50228e177bc5e4e04f486dea60329210e2653f22
parent: b60fc6f95e68c8581e2cec48f8d99e7c467a1db2 (diff)
1 files changed, 589 insertions, 0 deletions
diff --git a/Omni/Agent/PLAN.md b/Omni/Agent/PLAN.md
new file mode 100644
index 0000000..e51d09b
--- /dev/null
+++ b/Omni/Agent/PLAN.md
@@ -0,0 +1,589 @@
+# Omni Agent Infrastructure Plan
+
+**Status**: Draft  
+**Author**: Ben (with AI assistance)  
+**Date**: 2025-12-11
+
+## Vision
+
+A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, telegram bot, etc.) with:
+- Shared tools, memory, and model backends
+- LoRA fine-tuning with model snapshots
+- Evals to prevent regression
+- Configurable LLM providers (local Ollama or OpenRouter)
+
+---
+
+## 0. Scope & Task Tracking
+
+**Building now**: Infrastructure and library primitives  
+**First concrete agent**: Telegram Bot (validates the infrastructure)  
+**Building later**: Researcher, Planner, and other agents
+
+### Active Tasks (in dependency order)
+
+| Task ID | Title | Status | Blocks |
+|---------|-------|--------|--------|
+| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
+| t-248 | Memory System | Open (blocked by t-247) | t-251 |
+| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
+| t-250 | Evals Framework | Open (blocked by t-247) | - |
+| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |
+
+Run `jr task show <id>` for full implementation details on each task.
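+
+The dependency order in the table can be read as a small graph. The following is a minimal sketch, not how `jr task ready` is actually implemented; `blockedBy` and `readyTasks` are hypothetical names introduced only for illustration.
+
+```haskell
+import qualified Data.Map.Strict as Map
+import qualified Data.Set as Set
+
+-- | Task ID -> IDs of the tasks that block it (transcribed from the table above).
+blockedBy :: Map.Map String [String]
+blockedBy = Map.fromList
+  [ ("t-247", [])
+  , ("t-248", ["t-247"])
+  , ("t-249", ["t-247"])
+  , ("t-250", ["t-247"])
+  , ("t-251", ["t-248", "t-249"])
+  ]
+
+-- | Tasks that are not yet done and whose blockers are all done.
+readyTasks :: Set.Set String -> [String]
+readyTasks done =
+  [ t
+  | (t, deps) <- Map.toList blockedBy
+  , t `Set.notMember` done
+  , all (`Set.member` done) deps
+  ]
+```
+
+With nothing done, only `t-247` is ready; once `t-247` is done, `t-248`, `t-249`, and `t-250` become ready together.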
+
+---
+
+## 1. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         Agent Layer                              │
+├──────────┬──────────┬──────────┬──────────┬────────────────────┤
+│ Jr/Coder │Researcher│ Planner  │ Telegram │ Future Agents...   │
+└────┬─────┴────┬─────┴────┬─────┴────┬─────┴────────────────────┘
+     │          │          │          │
+┌────▼──────────▼──────────▼──────────▼──────────────────────────┐
+│                    Omni.Agent.Core                              │
+│  - Agent protocol (system prompt, tool execution loop)          │
+│  - Model backend abstraction (Ollama | OpenRouter | Amp)        │
+│  - Conversation/session management                              │
+└────┬────────────────────────────────────────────────────────────┘
+     │
+┌────▼────────────────────────────────────────────────────────────┐
+│                    Shared Infrastructure                         │
+├─────────────────┬─────────────────┬─────────────────────────────┤
+│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals           │
+│ - read_file     │ - Vector DB      │ - Regression tests         │
+│ - edit_file     │ - Fact retrieval │ - Quality metrics          │
+│ - run_bash      │ - Session history│ - Model comparison         │
+│ - search        │                  │                             │
+│ - web_search    │                  │                             │
+│ - (pluggable)   │                  │                             │
+├─────────────────┴─────────────────┴─────────────────────────────┤
+│                    Omni.Agent.Training                           │
+│  - LoRA fine-tuning orchestration                               │
+│  - Model snapshotting                                           │
+│  - Training data collection                                     │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 2. Immediate Work Items
+
+### 2.1 Add Amp Backend Support (--engine flag)
+
+**Problem**: Custom engine works but Amp is better for complex coding tasks.
+
+**Solution**: Add `--engine` flag to `jr work`:
+
+```bash
+jr work <task-id>                    # Uses native Engine (default)
+jr work <task-id> --engine=amp       # Uses Amp via subprocess
+jr work <task-id> --engine=ollama    # Uses local Ollama
+```
+
+**Implementation**:
+1. Add `EngineBackend` type: `Native | Amp | Ollama Text`
+2. Modify `Omni.Agent.Worker.start` to accept backend selection
+3. For Amp: spawn `amp --prompt-file` subprocess, capture output
+4. For Ollama: call local API instead of OpenRouter
+
+**Files to modify**:
+- `Omni/Jr.hs` - CLI parsing
+- `Omni/Agent/Worker.hs` - Backend dispatch
+- `Omni/Agent/Engine.hs` - Add Ollama provider
+
+### 2.2 Abstract LLM Provider
+
+**Current state**: `Engine.hs` hardcodes OpenRouter.
+
+**Target state**: Pluggable `Provider` interface.
+
+```haskell
+-- Omni/Agent/Provider.hs
+data Provider
+  = OpenRouter { apiKey :: Text, model :: Text }
+  | Ollama { baseUrl :: Text, model :: Text }
+  | AmpCLI { promptFile :: FilePath }
+
+chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
+```
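+
+One way the `Provider` type could drive dispatch is to map each constructor to its HTTP chat endpoint. This is a sketch under assumptions: `chatEndpoint` is a hypothetical helper, not part of the existing `Engine.hs`, and the paths are the publicly documented OpenRouter chat-completions and Ollama `/api/chat` endpoints.
+
+```haskell
+{-# LANGUAGE OverloadedStrings #-}
+import Data.Text (Text)
+import qualified Data.Text as T
+
+data Provider
+  = OpenRouter { apiKey :: Text, model :: Text }
+  | Ollama { baseUrl :: Text, model :: Text }
+  | AmpCLI { promptFile :: FilePath }
+
+-- | HTTP chat endpoint for a provider, or Nothing for subprocess backends.
+-- (Hypothetical helper for illustration.)
+chatEndpoint :: Provider -> Maybe Text
+chatEndpoint (OpenRouter _ _) = Just "https://openrouter.ai/api/v1/chat/completions"
+chatEndpoint (Ollama url _) = Just (T.dropWhileEnd (== '/') url <> "/api/chat")
+chatEndpoint (AmpCLI _) = Nothing
+```
+
+Returning `Nothing` for `AmpCLI` keeps the subprocess path explicit, so a `chat` implementation has to handle it separately rather than attempting an HTTP call.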
+
+### 2.3 Memory / Vector DB Integration
+
+**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.
+
+**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact - that's project-scoped, not user-scoped).
+
+**Key requirements**:
+- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
+- Multi-user: Each family member has private memories (identified by Telegram ID initially)
+- Embeddings via Ollama `/api/embeddings` endpoint with nomic-embed-text model
+
+See task t-248 for full implementation details.
+
+### 2.4 Pluggable Tool System
+
+**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.
+
+**Target**: Registry pattern allowing agents to declare their tool sets.
+
+```haskell
+-- Each agent specifies its tools
+coderTools :: [Tool]
+coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]
+
+researcherTools :: [Tool]  
+researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]
+
+plannerTools :: [Tool]
+plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]
+
+telegramTools :: [Tool]
+telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
+```
+
+---
+
+## 3. Agent Specifications
+
+### 3.1 Jr/Coder (existing)
+
+**Purpose**: Autonomous coding agent for task completion.
+
+**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read
+
+**System prompt**: Task-focused, code conventions, test requirements.
+
+### 3.2 Researcher (new)
+
+**Purpose**: Information gathering, analysis, summarization.
+
+**Tools**: 
+- `web_search` - Search the web
+- `read_web_page` - Fetch and parse web content  
+- `extract_facts` - Store learned facts in knowledge base
+- `read_file` - Read local documents
+- `query_facts` - Retrieve from knowledge base
+
+**System prompt**: Focus on accuracy, citation, verification.
+
+### 3.3 Project Planner (new)
+
+**Purpose**: Break down high-level goals into actionable tasks.
+
+**Tools**:
+- `task_create` - Create new tasks
+- `task_list` - Query existing tasks
+- `task_update` - Modify task status/content
+- `fact_query` - Get project context
+- `dependency_graph` - Visualize task dependencies
+
+**System prompt**: Project management, task decomposition, dependency analysis.
+
+### 3.4 Telegram Bot (FIRST AGENT TO BUILD)
+
+**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.
+
+**Tools**:
+- `remember` - Store facts about the user (from Memory module)
+- `recall` - Query user's memories (from Memory module)
+- `web_search` - Answer questions requiring web lookup (from Registry)
+
+**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.
+
+**User identification**: Telegram user ID → creates/retrieves User record in memory.db
+
+See task t-251 for full implementation details.
+
+---
+
+## 4. Shared Infrastructure
+
+### 4.1 Model Backend Configuration
+
+```haskell
+-- ~/.config/omni/models.yaml or environment variables
+data ModelConfig = ModelConfig
+  { defaultProvider :: Provider
+  , modelOverrides :: Map Text Provider  -- per-agent overrides
+  }
+
+-- Example config:
+-- default_provider: openrouter
+-- openrouter:
+--   api_key: $OPENROUTER_API_KEY
+--   default_model: anthropic/claude-sonnet-4.5
+-- ollama:
+--   base_url: http://localhost:11434
+--   default_model: llama3.1:70b
+-- agents:
+--   telegram: { provider: ollama, model: llama3.1:8b }  # cheaper for chat
+--   coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
+```
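+
+Resolving the provider for a given agent then reduces to a map lookup with a fallback. The sketch below is an assumption-laden illustration: `ProviderRef` is a simplified stand-in for `Provider`, `providerFor` is a hypothetical name, and agent names are plain strings.
+
+```haskell
+import qualified Data.Map.Strict as Map
+
+-- | Simplified stand-in for Provider (illustration only).
+data ProviderRef = ProviderRef
+  { providerName :: String
+  , providerModel :: String
+  } deriving (Eq, Show)
+
+data ModelConfig = ModelConfig
+  { defaultProvider :: ProviderRef
+  , modelOverrides :: Map.Map String ProviderRef  -- per-agent overrides
+  }
+
+-- | Use the agent's override when one exists, otherwise the default provider.
+providerFor :: ModelConfig -> String -> ProviderRef
+providerFor cfg agent =
+  Map.findWithDefault (defaultProvider cfg) agent (modelOverrides cfg)
+```
+
+With the example config above, `providerFor cfg "telegram"` would select the Ollama entry, while an agent with no override would fall back to the default.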
+
+### 4.2 Evals Framework
+
+**Purpose**: Prevent regression when changing prompts, tools, or models.
+
+**Components**:
+1. **Test Cases**: Known task + expected outcome pairs
+2. **Runner**: Execute agent on test cases, capture results
+3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
+4. **Dashboard**: Track scores over time
+
+**Implementation**:
+```haskell
+-- Omni/Agent/Eval.hs
+data EvalCase = EvalCase
+  { evalId :: Text
+  , evalPrompt :: Text
+  , evalExpectedBehavior :: Text  -- or structured criteria
+  , evalTools :: [Tool]
+  }
+
+runEval :: AgentConfig -> EvalCase -> IO EvalResult
+```
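+
+As a rough illustration of the Scorer component, the simplest strategy (exact match after normalization) and an aggregate pass rate might look like this. The `EvalResult` fields and the `normalize`, `scoreExact`, and `passRate` names are assumptions; the plan does not define them.
+
+```haskell
+import Data.Char (toLower)
+
+-- | Hypothetical result shape: which case ran and whether it passed.
+data EvalResult = EvalResult
+  { resultId :: String
+  , resultPassed :: Bool
+  } deriving (Eq, Show)
+
+-- | Lowercase and collapse runs of whitespace before comparing.
+normalize :: String -> String
+normalize = map toLower . unwords . words
+
+-- | Exact-match scorer: passes when the normalized outputs are equal.
+scoreExact :: String -> String -> String -> EvalResult
+scoreExact caseId expected actual =
+  EvalResult caseId (normalize expected == normalize actual)
+
+-- | Fraction of passing results; 0 for an empty run.
+passRate :: [EvalResult] -> Double
+passRate [] = 0
+passRate rs =
+  fromIntegral (length (filter resultPassed rs)) / fromIntegral (length rs)
+```
+
+Semantic-similarity and human-eval scorers would need a different result shape (for example a numeric score), so this only covers the exact-match case.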
+
+### 4.3 Shared Memory System (Omni.Agent.Memory)
+
+**Critical requirement**: Cross-agent memory sharing with multi-user support.
+
+**Example**: User tells Telegram bot "I'm an AI engineer" → Researcher agent later searching for papers should recall this context.
+
+#### Why not Omni.Fact?
+
+Current `Omni.Fact` limitations:
+- Project-scoped, not user-scoped
+- No user/identity concept
+- No embeddings for semantic retrieval
+- Tied to task system
+
+#### Memory Design
+
+```haskell
+-- Omni/Agent/Memory.hs
+
+-- | A memory is a piece of information about a user, learned by any agent
+data Memory = Memory
+  { memoryId :: UUID
+  , memoryUserId :: UserId           -- Who this memory is about
+  , memoryContent :: Text            -- The actual information
+  , memoryEmbedding :: Maybe Vector  -- For semantic search
+  , memorySource :: MemorySource     -- Which agent learned this
+  , memoryConfidence :: Double       -- 0.0-1.0
+  , memoryCreatedAt :: UTCTime
+  , memoryLastAccessedAt :: UTCTime  -- For relevance decay
+  , memoryTags :: [Text]             -- Optional categorization
+  }
+
+data MemorySource = MemorySource
+  { sourceAgent :: Text      -- "telegram", "researcher", "coder", etc.
+  , sourceSession :: UUID    -- Session ID where this was learned
+  , sourceContext :: Text    -- Brief context of how it was learned
+  }
+
+data User = User
+  { userId :: UUID
+  , userTelegramId :: Maybe Int64    -- Primary identifier initially
+  , userEmail :: Maybe Text          -- Added later when email interface exists
+  , userName :: Text                 -- Display name ("Ben", "Alice", etc.)
+  , userCreatedAt :: UTCTime
+  }
+
+-- Users are identified by Telegram ID initially
+-- The agent learns more about users over time and stores in memories
+-- e.g., "Ben is an AI engineer" becomes a memory, not a user field
+
+-- | Core operations
+storeMemory :: UserId -> Text -> MemorySource -> IO Memory
+recallMemories :: UserId -> Text -> Int -> IO [Memory]  -- semantic search
+forgetMemory :: UUID -> IO ()
+
+-- | Embedding integration (via Ollama or other provider)
+embedText :: Text -> IO Vector
+similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
+```
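+
+For intuition, `similaritySearch` can be sketched as a brute-force, in-memory cosine-similarity ranking. This is only an illustration: the plan delegates the real search to sqlite-vss, `Vector` is assumed here to be a plain list of doubles, and `cosine` and `topK` are hypothetical names.
+
+```haskell
+import Data.List (sortOn)
+import Data.Ord (Down (..))
+
+-- | Assumed representation for this sketch only.
+type Vector = [Double]
+
+-- | Cosine similarity; defined as 0 when either vector has zero length.
+cosine :: Vector -> Vector -> Double
+cosine a b
+  | na == 0 || nb == 0 = 0
+  | otherwise = dot / (na * nb)
+  where
+    dot = sum (zipWith (*) a b)
+    na = sqrt (sum (map (\x -> x * x) a))
+    nb = sqrt (sum (map (\x -> x * x) b))
+
+-- | The k items whose vectors are most similar to the query, best first.
+topK :: Int -> Vector -> [(item, Vector)] -> [item]
+topK k query items =
+  map fst (take k (sortOn (Down . cosine query . snd) items))
+```
+
+A linear scan like this is fine for a few thousand memories per user; the vss0 index matters once that no longer holds.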
+
+#### Multi-User Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    Memory Store                          │
+├─────────────────────────────────────────────────────────┤
+│  users table:                                            │
+│    id TEXT PRIMARY KEY                                   │
+│    name TEXT                                             │
+│    created_at TIMESTAMP                                  │
+├─────────────────────────────────────────────────────────┤
+│  memories table:                                         │
+│    id TEXT PRIMARY KEY                                   │
+│    user_id TEXT REFERENCES users(id)                     │
+│    content TEXT                                          │
+│    embedding BLOB  -- serialized float vector            │
+│    source_agent TEXT                                     │
+│    source_session TEXT                                   │
+│    source_context TEXT                                   │
+│    confidence REAL                                       │
+│    created_at TIMESTAMP                                  │
+│    last_accessed_at TIMESTAMP                            │
+│    tags TEXT  -- JSON array                              │
+└─────────────────────────────────────────────────────────┘
+```
+
+#### Memory Retrieval in Agent Loop
+
+When any agent runs, it:
+1. Identifies the current user (from context/session)
+2. Extracts key concepts from the user's request
+3. Calls `recallMemories userId query 10` to get relevant memories
+4. Injects memories into system prompt as context
+5. After completion, extracts new learnings and calls `storeMemory`
+
+```haskell
+-- In agent loop
+runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
+runAgentWithMemory userId config prompt = do
+  -- Recall relevant memories
+  memories <- recallMemories userId prompt 10
+  let memoryContext = formatMemoriesForPrompt memories
+  
+  -- Inject into system prompt
+  let enhancedPrompt = agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext
+  
+  -- Run agent
+  result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt
+  
+  -- Extract and store new memories (could be done by the agent via tool)
+  pure result
+```
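+
+`formatMemoriesForPrompt` is referenced above but not defined. A minimal sketch, assuming each memory is reduced to its content and source agent (`MemoryLine` is a hypothetical stand-in for the full `Memory` record):
+
+```haskell
+{-# LANGUAGE OverloadedStrings #-}
+import Data.Text (Text)
+import qualified Data.Text as T
+
+-- | Hypothetical projection of Memory: just what the prompt needs.
+data MemoryLine = MemoryLine
+  { lineContent :: Text
+  , lineAgent :: Text
+  }
+
+-- | Render memories as a Markdown bullet list for the system prompt.
+formatMemoriesForPrompt :: [MemoryLine] -> Text
+formatMemoriesForPrompt [] = "(no stored memories for this user)"
+formatMemoriesForPrompt ms = T.intercalate "\n" (map render ms)
+  where
+    render m = "- " <> lineContent m <> " (learned by " <> lineAgent m <> ")"
+```
+
+Including the source agent in each line is a design choice for this sketch; it lets the model weigh where a fact came from, at the cost of a few extra tokens per memory.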
+
+#### Memory Extraction Tool
+
+Agents can explicitly store memories:
+
+```haskell
+storeMemoryTool :: Tool
+storeMemoryTool = Tool
+  { toolName = "remember"
+  , toolDescription = "Store a piece of information about the user for future reference"
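+  , toolExecute = \args ->
+      -- args is treated as an aeson Object; (.:) returns a Parser, so run it with
+      -- parseEither (Data.Aeson.Types). currentUserId and currentSource are
+      -- assumed to come from the session context.
+      case parseEither parseArgs args of
+        Left err -> pure (object ["error" .= err])
+        Right (content, _tags) -> do
+          -- storeMemory does not accept tags yet, so they are parsed but unused
+          memory <- storeMemory currentUserId content currentSource
+          pure (toJSON memory)
+  }
+  where
+    parseArgs o = (,) <$> o .: "content" <*> (o .:? "tags" .!= ([] :: [Text]))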
+```
+
+### 4.4 LoRA Fine-tuning Service
+
+**Purpose**: Custom-tune models on successful task completions.
+
+**Workflow**:
+1. Collect successful agent sessions (prompt + tool calls + result)
+2. Format as training data (instruction, input, output)
+3. Run LoRA training with an external trainer, local or cloud (Ollama can load LoRA adapters but does not train them)
+4. Snapshot trained model with version tag
+5. A/B test against base model via evals
+
+**Storage**:
+- Training data: `_/training/<agent>/<date>.jsonl`
+- Models: Ollama model registry with tags
+
+---
+
+## 5. Infrastructure Build Plan
+
+Focus: Library primitives first, agents later.
+
+### Phase 1: Provider Abstraction (1-2 days)
+- [ ] Create `Omni.Agent.Provider` module with unified interface
+- [ ] Extract OpenRouter logic from `Engine.hs`
+- [ ] Add Ollama provider implementation
+- [ ] Add `--engine` flag to `jr work`
+- [ ] Test with local Llama model
+
+### Phase 2: Amp Re-integration (1 day)  
+- [ ] Add Amp subprocess backend to Provider
+- [ ] Handle Amp's streaming output
+- [ ] Parse Amp thread URL for linking
+
+### Phase 3: Memory System (3-4 days)
+- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
+- [ ] Design schema: users, memories tables
+- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
+- [ ] Add embedding support via Ollama `/api/embeddings`
+- [ ] Implement similarity search
+- [ ] Create `remember` tool for agents
+- [ ] Add `runAgentWithMemory` wrapper
+
+### Phase 4: Tool Registry (1-2 days)
+- [ ] Create `Omni.Agent.Registry` for tool management
+- [ ] Define tool categories (coding, web, memory, task)
+- [ ] Allow agents to declare tool requirements
+- [ ] Add web tools (web_search, read_web_page)
+
+### Phase 5: Evals Framework (2-3 days)
+- [ ] Create `Omni.Agent.Eval` module
+- [ ] Define `EvalCase` and `EvalResult` types
+- [ ] Build eval runner
+- [ ] Add scoring (exact match, semantic, custom)
+- [ ] Create initial eval suite for Jr/coder
+
+### Phase 6: Telegram Bot Agent (3-4 days)
+**First concrete agent** - validates the infrastructure.
+
+- [ ] Create `Omni.Agent.Telegram` module
+- [ ] Telegram Bot API integration (getUpdates polling or webhook)
+- [ ] User identification via Telegram user ID
+- [ ] Auto-create user record on first message
+- [ ] Wire up memory system (recall on message, store learnings)
+- [ ] Basic conversation loop with LLM
+- [ ] Deploy as background service
+- [ ] Add `jr telegram` command for manual start
+
+**Tools for Telegram agent:**
+- `remember` - store facts about user
+- `recall` - query user's memories
+- `web_search` - answer questions (optional, depends on Phase 4)
+
+### Phase 7: Training Data Collection (1-2 days)
+- [ ] Add session export to training format
+- [ ] Store successful completions in `_/training/`
+- [ ] Create `jr train export` command
+
+### (Future) Additional Agents
+- Researcher agent
+- Planner agent  
+- Email interface (links to Telegram user identity)
+- Others...
+
+---
+
+## 6. Design Decisions
+
+| Question | Decision |
+|----------|----------|
+| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
+| User identity | **Telegram ID** initially, link to email later when adding email interface |
+| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
+| Amp integration | TBD - subprocess likely |
+| Memory decay | TBD - probably keep forever with relevance scoring |
+| LoRA training | TBD - local Ollama or cloud |
+
+---
+
+## 7. File Structure (Proposed)
+
+```
+Omni/Agent/
+├── Core.hs           # Base agent types, Worker state (existing)
+├── Engine.hs         # Agent loop, tool execution (existing)
+├── Provider.hs       # LLM provider abstraction (NEW)
+├── Provider/
+│   ├── OpenRouter.hs # Extracted from Engine.hs
+│   ├── Ollama.hs     # Local model support
+│   └── Amp.hs        # Amp CLI subprocess
+├── Memory.hs         # Shared memory system (NEW)
+├── Memory/
+│   └── Embedding.hs  # Vector operations, Ollama embeddings
+├── Tools.hs          # Core coding tools (existing)
+├── Tools/
+│   ├── Web.hs        # web_search, read_web_page (NEW)
+│   └── Memory.hs     # remember, recall tools (NEW)
+├── Eval.hs           # Evaluation framework (NEW)
+├── Training.hs       # Training data collection (NEW)
+├── Worker.hs         # Jr worker loop (existing)
+├── Git.hs            # Git operations (existing)
+├── Log.hs            # Logging utilities (existing)
+├── Event.hs          # Event types (existing)
+├── DESIGN.md         # Current design doc
+└── PLAN.md           # This document
+```
+
+---
+
+## 8. Database Schema Additions
+
+```sql
+-- Memory system tables (new database: memory.db)
+
+CREATE TABLE users (
+  id TEXT PRIMARY KEY,              -- UUID
+  telegram_id INTEGER UNIQUE,       -- Telegram user ID (primary identifier)
+  email TEXT UNIQUE,                -- Added later for email interface
+  name TEXT NOT NULL,               -- Display name
+  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE TABLE memories (
+  id TEXT PRIMARY KEY,              -- UUID
+  user_id TEXT NOT NULL REFERENCES users(id),
+  content TEXT NOT NULL,
+  embedding BLOB,                   -- float32 vector for sqlite-vss
+  source_agent TEXT NOT NULL,       -- "telegram", "coder", etc.
+  source_session TEXT,              -- Session UUID
+  source_context TEXT,              -- How this was learned
+  confidence REAL DEFAULT 0.8,
+  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+  tags TEXT                         -- JSON array
+);
+
+-- sqlite-vss virtual table for vector similarity search
+-- (dimension must match the embedding model; nomic-embed-text produces 768-dimensional vectors)
+CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));
+
+CREATE INDEX idx_memories_user ON memories(user_id);
+CREATE INDEX idx_memories_agent ON memories(source_agent);
+```
+
+---
+
+## 9. Key Code References for Implementers
+
+When implementing tasks, refer to these existing patterns:
+
+### Existing Agent Infrastructure
+| File | Purpose | Key Functions/Types |
+|------|---------|---------------------|
+| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
+| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
+| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
+| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |
+
+### Database Patterns (follow these)
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
+| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |
+
+### CLI Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
+| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |
+
+### HTTP Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |
+
+### Build System
+- Build: `bild Omni/Agent/NewModule.hs`
+- Test: `bild --test Omni/Agent/NewModule.hs`
+- Dependencies: Add to module header comments (`: dep package-name`)
+
+---
+
+## 10. Next Steps
+
+Execute tasks in order:
+1. **t-247** Provider Abstraction (unblocked, start here)
+2. **t-248** Memory System (after t-247)
+3. **t-249** Tool Registry (after t-247, can parallel with t-248)
+4. **t-250** Evals Framework (after t-247)
+5. **t-251** Telegram Bot Agent (after t-248 + t-249)
+
+Run `jr task ready` to see what's available to work on.