| author | Ben Sima <ben@bensima.com> | 2025-12-11 19:15:33 -0500 |
|---|---|---|
| committer | Ben Sima <ben@bensima.com> | 2025-12-11 19:15:33 -0500 |
| commit | 225e5b7a24f0b30f6de1bd7418bf834ad345b0f3 | |
| tree | 50228e177bc5e4e04f486dea60329210e2653f22 | |
| parent | b60fc6f95e68c8581e2cec48f8d99e7c467a1db2 | |
Add Omni/Agent/PLAN.md - agent infrastructure roadmap
Defines architecture for multi-agent system with:
- Provider abstraction (OpenRouter, Ollama, Amp backends)
- Shared memory system (sqlite-vss, multi-user, cross-agent)
- Tool registry for pluggable tool sets
- Evals framework for regression testing
- Telegram bot as first concrete agent
Tasks: t-247 through t-251
| -rw-r--r-- | Omni/Agent/PLAN.md | 589 |
1 files changed, 589 insertions, 0 deletions
diff --git a/Omni/Agent/PLAN.md b/Omni/Agent/PLAN.md
new file mode 100644
index 0000000..e51d09b
--- /dev/null
+++ b/Omni/Agent/PLAN.md
@@ -0,0 +1,589 @@
+# Omni Agent Infrastructure Plan
+
+**Status**: Draft
+**Author**: Ben (with AI assistance)
+**Date**: 2025-12-11
+
+## Vision
+
+A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, telegram bot, etc.) with:
+- Shared tools, memory, and model backends
+- LoRA fine-tuning with model snapshots
+- Evals to prevent regression
+- Configurable LLM providers (local Ollama or OpenRouter)
+
+---
+
+## 0. Scope & Task Tracking
+
+**Building now**: Infrastructure and library primitives
+**First concrete agent**: Telegram Bot (validates the infrastructure)
+**Building later**: Researcher, Planner, and other agents
+
+### Active Tasks (in dependency order)
+
+| Task ID | Title | Status | Blocks |
+|---------|-------|--------|--------|
+| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
+| t-248 | Memory System | Open (blocked by t-247) | t-251 |
+| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
+| t-250 | Evals Framework | Open (blocked by t-247) | - |
+| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |
+
+Run `jr task show <id>` for full implementation details on each task.
+
+---
+
+## 1. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                           Agent Layer                           │
+├──────────┬──────────┬──────────┬──────────┬─────────────────────┤
+│ Jr/Coder │Researcher│ Planner  │ Telegram │ Future Agents...    │
+└────┬─────┴────┬─────┴────┬─────┴────┬─────┴─────────────────────┘
+     │          │          │          │
+┌────▼──────────▼──────────▼──────────▼───────────────────────────┐
+│                         Omni.Agent.Core                         │
+│  - Agent protocol (system prompt, tool execution loop)          │
+│  - Model backend abstraction (Ollama | OpenRouter | Amp)        │
+│  - Conversation/session management                              │
+└────┬────────────────────────────────────────────────────────────┘
+     │
+┌────▼────────────────────────────────────────────────────────────┐
+│                      Shared Infrastructure                      │
+├─────────────────┬──────────────────┬────────────────────────────┤
+│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals           │
+│ - read_file     │ - Vector DB      │ - Regression tests         │
+│ - edit_file     │ - Fact retrieval │ - Quality metrics          │
+│ - run_bash      │ - Session history│ - Model comparison         │
+│ - search        │                  │                            │
+│ - web_search    │                  │                            │
+│ - (pluggable)   │                  │                            │
+├─────────────────┴──────────────────┴────────────────────────────┤
+│                       Omni.Agent.Training                       │
+│  - LoRA fine-tuning orchestration                               │
+│  - Model snapshotting                                           │
+│  - Training data collection                                     │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 2. Immediate Work Items
+
+### 2.1 Add Amp Backend Support (`--engine` flag)
+
+**Problem**: The custom engine works, but Amp is better for complex coding tasks.
+
+**Solution**: Add `--engine` flag to `jr work`:
+
+```bash
+jr work <task-id>                  # Uses native Engine (default)
+jr work <task-id> --engine=amp     # Uses Amp via subprocess
+jr work <task-id> --engine=ollama  # Uses local Ollama
+```
+
+**Implementation**:
+1. Add `EngineBackend` type: `Native | Amp | Ollama Text`
+2. Modify `Omni.Agent.Worker.start` to accept backend selection
+3. For Amp: spawn `amp --prompt-file` subprocess, capture output
+4. For Ollama: call local API instead of OpenRouter
+
+**Files to modify**:
+- `Omni/Jr.hs` - CLI parsing
+- `Omni/Agent/Worker.hs` - Backend dispatch
+- `Omni/Agent/Engine.hs` - Add Ollama provider
+
+### 2.2 Abstract LLM Provider
+
+**Current state**: `Engine.hs` hardcodes OpenRouter.
+
+**Target state**: Pluggable `LLMProvider` interface.
+
+```haskell
+-- Omni/Agent/Provider.hs
+data Provider
+  = OpenRouter { apiKey :: Text, model :: Text }
+  | Ollama { baseUrl :: Text, model :: Text }
+  | AmpCLI { promptFile :: FilePath }
+
+chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
+```
+
+### 2.3 Memory / Vector DB Integration
+
+**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.
+
+**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact - that's project-scoped, not user-scoped).
+
+**Key requirements**:
+- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
+- Multi-user: Each family member has private memories (identified by Telegram ID initially)
+- Embeddings via Ollama `/api/embeddings` endpoint with nomic-embed-text model
+
+See task t-248 for full implementation details.
+
+### 2.4 Pluggable Tool System
+
+**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.
+
+**Target**: Registry pattern allowing agents to declare their tool sets.
+
+```haskell
+-- Each agent specifies its tools
+coderTools :: [Tool]
+coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]
+
+researcherTools :: [Tool]
+researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]
+
+plannerTools :: [Tool]
+plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]
+
+telegramTools :: [Tool]
+telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
+```
+
+---
+
+## 3. Agent Specifications
+
+### 3.1 Jr/Coder (existing)
+
+**Purpose**: Autonomous coding agent for task completion.
+
+**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read
+
+**System prompt**: Task-focused, code conventions, test requirements.
+
+### 3.2 Researcher (new)
+
+**Purpose**: Information gathering, analysis, summarization.
+
+**Tools**:
+- `web_search` - Search the web
+- `read_web_page` - Fetch and parse web content
+- `extract_facts` - Store learned facts in knowledge base
+- `read_file` - Read local documents
+- `query_facts` - Retrieve from knowledge base
+
+**System prompt**: Focus on accuracy, citation, verification.
+
+### 3.3 Project Planner (new)
+
+**Purpose**: Break down high-level goals into actionable tasks.
+
+**Tools**:
+- `task_create` - Create new tasks
+- `task_list` - Query existing tasks
+- `task_update` - Modify task status/content
+- `fact_query` - Get project context
+- `dependency_graph` - Visualize task dependencies
+
+**System prompt**: Project management, task decomposition, dependency analysis.
+
+### 3.4 Telegram Bot (FIRST AGENT TO BUILD)
+
+**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.
+
+**Tools**:
+- `remember` - Store facts about the user (from Memory module)
+- `recall` - Query user's memories (from Memory module)
+- `web_search` - Answer questions requiring web lookup (from Registry)
+
+**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.
+
+**User identification**: Telegram user ID → creates/retrieves User record in memory.db
+
+See task t-251 for full implementation details.
+
+---
+
+## 4. Shared Infrastructure
+
+### 4.1 Model Backend Configuration
+
+```haskell
+-- ~/.config/omni/models.yaml or environment variables
+data ModelConfig = ModelConfig
+  { defaultProvider :: Provider
+  , modelOverrides :: Map Text Provider -- per-agent overrides
+  }
+
+-- Example config:
+-- default_provider: openrouter
+-- openrouter:
+--   api_key: $OPENROUTER_API_KEY
+--   default_model: anthropic/claude-sonnet-4.5
+-- ollama:
+--   base_url: http://localhost:11434
+--   default_model: llama3.1:70b
+-- agents:
+--   telegram: { provider: ollama, model: llama3.1:8b } # cheaper for chat
+--   coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
+```
+
+### 4.2 Evals Framework
+
+**Purpose**: Prevent regression when changing prompts, tools, or models.
+
+**Components**:
+1. **Test Cases**: Known task + expected outcome pairs
+2. **Runner**: Execute agent on test cases, capture results
+3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
+4. **Dashboard**: Track scores over time
+
+**Implementation**:
+```haskell
+-- Omni/Agent/Eval.hs
+data EvalCase = EvalCase
+  { evalId :: Text
+  , evalPrompt :: Text
+  , evalExpectedBehavior :: Text -- or structured criteria
+  , evalTools :: [Tool]
+  }
+
+runEval :: AgentConfig -> EvalCase -> IO EvalResult
+```
+
+### 4.3 Shared Memory System (Omni.Agent.Memory)
+
+**Critical requirement**: Cross-agent memory sharing with multi-user support.
+
+**Example**: User tells Telegram bot "I'm an AI engineer" → Researcher agent later searching for papers should recall this context.
+
+#### Why not Omni.Fact?
+
+Current `Omni.Fact` limitations:
+- Project-scoped, not user-scoped
+- No user/identity concept
+- No embeddings for semantic retrieval
+- Tied to task system
+
+#### Memory Design
+
+```haskell
+-- Omni/Agent/Memory.hs
+
+-- | A memory is a piece of information about a user, learned by any agent
+data Memory = Memory
+  { memoryId :: UUID
+  , memoryUserId :: UserId           -- Who this memory is about
+  , memoryContent :: Text            -- The actual information
+  , memoryEmbedding :: Maybe Vector  -- For semantic search
+  , memorySource :: MemorySource     -- Which agent learned this
+  , memoryConfidence :: Double       -- 0.0-1.0
+  , memoryCreatedAt :: UTCTime
+  , memoryLastAccessedAt :: UTCTime  -- For relevance decay
+  , memoryTags :: [Text]             -- Optional categorization
+  }
+
+data MemorySource = MemorySource
+  { sourceAgent :: Text    -- "telegram", "researcher", "coder", etc.
+  , sourceSession :: UUID  -- Session ID where this was learned
+  , sourceContext :: Text  -- Brief context of how it was learned
+  }
+
+data User = User
+  { userId :: UUID
+  , userTelegramId :: Maybe Int64  -- Primary identifier initially
+  , userEmail :: Maybe Text        -- Added later when email interface exists
+  , userName :: Text               -- Display name ("Ben", "Alice", etc.)
+  , userCreatedAt :: UTCTime
+  }
+
+-- Users are identified by Telegram ID initially
+-- The agent learns more about users over time and stores in memories
+-- e.g., "Ben is an AI engineer" becomes a memory, not a user field
+
+-- | Core operations
+storeMemory :: UserId -> Text -> MemorySource -> IO Memory
+recallMemories :: UserId -> Text -> Int -> IO [Memory] -- semantic search
+forgetMemory :: UUID -> IO ()
+
+-- | Embedding integration (via Ollama or other provider)
+embedText :: Text -> IO Vector
+similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
+```
+
+#### Multi-User Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                      Memory Store                       │
+├─────────────────────────────────────────────────────────┤
+│ users table:                                            │
+│   id                TEXT PRIMARY KEY                    │
+│   name              TEXT                                │
+│   created_at        TIMESTAMP                           │
+├─────────────────────────────────────────────────────────┤
+│ memories table:                                         │
+│   id                TEXT PRIMARY KEY                    │
+│   user_id           TEXT REFERENCES users(id)           │
+│   content           TEXT                                │
+│   embedding         BLOB  -- serialized float vector    │
+│   source_agent      TEXT                                │
+│   source_session    TEXT                                │
+│   source_context    TEXT                                │
+│   confidence        REAL                                │
+│   created_at        TIMESTAMP                           │
+│   last_accessed_at  TIMESTAMP                           │
+│   tags              TEXT  -- JSON array                 │
+└─────────────────────────────────────────────────────────┘
+```
+
+#### Memory Retrieval in Agent Loop
+
+When any agent runs, it:
+1. Identifies the current user (from context/session)
+2. Extracts key concepts from the user's request
+3. Calls `recallMemories userId query 10` to get relevant memories
+4. Injects memories into system prompt as context
+5. After completion, extracts new learnings and calls `storeMemory`
+
+```haskell
+-- In agent loop
+runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
+runAgentWithMemory userId config prompt = do
+  -- Recall relevant memories
+  memories <- recallMemories userId prompt 10
+  let memoryContext = formatMemoriesForPrompt memories
+
+  -- Inject into system prompt
+  let enhancedPrompt = agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext
+
+  -- Run agent
+  result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt
+
+  -- Extract and store new memories (could be done by the agent via tool)
+  pure result
+```
+
+#### Memory Extraction Tool
+
+Agents can explicitly store memories:
+
+```haskell
+storeMemoryTool :: Tool
+storeMemoryTool = Tool
+  { toolName = "remember"
+  , toolDescription = "Store a piece of information about the user for future reference"
+  , toolExecute = \args -> do
+      let content = args .: "content"
+          tags = args .:? "tags" .!= []
+      memory <- storeMemory currentUserId content currentSource
+      pure (toJSON memory)
+  }
+```
+
+### 4.4 LoRA Fine-tuning Service
+
+**Purpose**: Custom-tune models on successful task completions.
+
+**Workflow**:
+1. Collect successful agent sessions (prompt + tool calls + result)
+2. Format as training data (instruction, input, output)
+3. Run LoRA training via Ollama or external service
+4. Snapshot trained model with version tag
+5. A/B test against base model via evals
+
+**Storage**:
+- Training data: `_/training/<agent>/<date>.jsonl`
+- Models: Ollama model registry with tags
+
+---
+
+## 5. Infrastructure Build Plan
+
+Focus: Library primitives first, agents later.
+
+### Phase 1: Provider Abstraction (1-2 days)
+- [ ] Create `Omni.Agent.Provider` module with unified interface
+- [ ] Extract OpenRouter logic from `Engine.hs`
+- [ ] Add Ollama provider implementation
+- [ ] Add `--engine` flag to `jr work`
+- [ ] Test with local Llama model
+
+### Phase 2: Amp Re-integration (1 day)
+- [ ] Add Amp subprocess backend to Provider
+- [ ] Handle Amp's streaming output
+- [ ] Parse Amp thread URL for linking
+
+### Phase 3: Memory System (3-4 days)
+- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
+- [ ] Design schema: users, memories tables
+- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
+- [ ] Add embedding support via Ollama `/api/embeddings`
+- [ ] Implement similarity search
+- [ ] Create `remember` tool for agents
+- [ ] Add `runAgentWithMemory` wrapper
+
+### Phase 4: Tool Registry (1-2 days)
+- [ ] Create `Omni.Agent.Registry` for tool management
+- [ ] Define tool categories (coding, web, memory, task)
+- [ ] Allow agents to declare tool requirements
+- [ ] Add web tools (web_search, read_web_page)
+
+### Phase 5: Evals Framework (2-3 days)
+- [ ] Create `Omni.Agent.Eval` module
+- [ ] Define `EvalCase` and `EvalResult` types
+- [ ] Build eval runner
+- [ ] Add scoring (exact match, semantic, custom)
+- [ ] Create initial eval suite for Jr/coder
+
+### Phase 6: Telegram Bot Agent (3-4 days)
+**First concrete agent** - validates the infrastructure.
+
+- [ ] Create `Omni.Agent.Telegram` module
+- [ ] Telegram Bot API integration (getUpdates polling or webhook)
+- [ ] User identification via Telegram user ID
+- [ ] Auto-create user record on first message
+- [ ] Wire up memory system (recall on message, store learnings)
+- [ ] Basic conversation loop with LLM
+- [ ] Deploy as background service
+- [ ] Add `jr telegram` command for manual start
+
+**Tools for Telegram agent:**
+- `remember` - store facts about user
+- `recall` - query user's memories
+- `web_search` - answer questions (optional, phase 4)
+
+### Phase 7: Training Data Collection (1-2 days)
+- [ ] Add session export to training format
+- [ ] Store successful completions in `_/training/`
+- [ ] Create `jr train export` command
+
+### (Future) Additional Agents
+- Researcher agent
+- Planner agent
+- Email interface (links to Telegram user identity)
+- Others...
+
+---
+
+## 6. Design Decisions
+
+| Question | Decision |
+|----------|----------|
+| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
+| User identity | **Telegram ID** initially, link to email later when adding email interface |
+| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
+| Amp integration | TBD - subprocess likely |
+| Memory decay | TBD - probably keep forever with relevance scoring |
+| LoRA training | TBD - local Ollama or cloud |
+
+---
+
+## 7. File Structure (Proposed)
+
+```
+Omni/Agent/
+├── Core.hs              # Base agent types, Worker state (existing)
+├── Engine.hs            # Agent loop, tool execution (existing)
+├── Provider.hs          # LLM provider abstraction (NEW)
+├── Provider/
+│   ├── OpenRouter.hs    # Extracted from Engine.hs
+│   ├── Ollama.hs        # Local model support
+│   └── Amp.hs           # Amp CLI subprocess
+├── Memory.hs            # Shared memory system (NEW)
+├── Memory/
+│   └── Embedding.hs     # Vector operations, Ollama embeddings
+├── Tools.hs             # Core coding tools (existing)
+├── Tools/
+│   ├── Web.hs           # web_search, read_web_page (NEW)
+│   └── Memory.hs        # remember, recall tools (NEW)
+├── Eval.hs              # Evaluation framework (NEW)
+├── Training.hs          # Training data collection (NEW)
+├── Worker.hs            # Jr worker loop (existing)
+├── Git.hs               # Git operations (existing)
+├── Log.hs               # Logging utilities (existing)
+├── Event.hs             # Event types (existing)
+├── DESIGN.md            # Current design doc
+└── PLAN.md              # This document
+```
+
+---
+
+## 8. Database Schema Additions
+
+```sql
+-- Memory system tables (new database: memory.db)
+
+CREATE TABLE users (
+  id TEXT PRIMARY KEY,          -- UUID
+  telegram_id INTEGER UNIQUE,   -- Telegram user ID (primary identifier)
+  email TEXT UNIQUE,            -- Added later for email interface
+  name TEXT NOT NULL,           -- Display name
+  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE TABLE memories (
+  id TEXT PRIMARY KEY,          -- UUID
+  user_id TEXT NOT NULL REFERENCES users(id),
+  content TEXT NOT NULL,
+  embedding BLOB,               -- float32 vector for sqlite-vss
+  source_agent TEXT NOT NULL,   -- "telegram", "coder", etc.
+  source_session TEXT,          -- Session UUID
+  source_context TEXT,          -- How this was learned
+  confidence REAL DEFAULT 0.8,
+  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+  tags TEXT                     -- JSON array
+);
+
+-- sqlite-vss virtual table for vector similarity search
+-- (768 matches nomic-embed-text output; change it if the embedding model changes)
+CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));
+
+CREATE INDEX idx_memories_user ON memories(user_id);
+CREATE INDEX idx_memories_agent ON memories(source_agent);
+```
+
+---
+
+## 9. Key Code References for Implementers
+
+When implementing tasks, refer to these existing patterns:
+
+### Existing Agent Infrastructure
+| File | Purpose | Key Functions/Types |
+|------|---------|---------------------|
+| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
+| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
+| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
+| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |
+
+### Database Patterns (follow these)
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
+| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |
+
+### CLI Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
+| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |
+
+### HTTP Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |
+
+### Build System
+- Build: `bild Omni/Agent/NewModule.hs`
+- Test: `bild --test Omni/Agent/NewModule.hs`
+- Dependencies: Add to module header comments (`: dep package-name`)
+
+---
+
+## 10. Next Steps
+
+Execute tasks in order:
+1. **t-247** Provider Abstraction (unblocked, start here)
+2. **t-248** Memory System (after t-247)
+3. **t-249** Tool Registry (after t-247, can run in parallel with t-248)
+4. **t-250** Evals Framework (after t-247)
+5. **t-251** Telegram Bot Agent (after t-248 + t-249)
+
+Run `jr task ready` to see what's available to work on.
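
Editor's sketch (not part of the commit): the ordering in "Next Steps" follows from the "Blocks" column of the Active Tasks table. The snippet below shows one way to derive which tasks are unblocked given a set of completed tasks. It is a minimal illustration only: `TaskId`, `blockedBy`, and `readyTasks` are hypothetical names, and nothing here is meant to describe how `jr task ready` is actually implemented.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical alias; the real task system may use a richer type.
type TaskId = String

-- | Blockers for each task, transcribed from the "Active Tasks" table.
blockedBy :: Map.Map TaskId [TaskId]
blockedBy =
  Map.fromList
    [ ("t-247", [])
    , ("t-248", ["t-247"])
    , ("t-249", ["t-247"])
    , ("t-250", ["t-247"])
    , ("t-251", ["t-248", "t-249"])
    ]

-- | Tasks that are not yet done and whose blockers are all done,
-- returned in ascending task-ID order.
readyTasks :: Set.Set TaskId -> [TaskId]
readyTasks done =
  [ task
  | (task, deps) <- Map.toAscList blockedBy
  , task `Set.notMember` done
  , all (`Set.member` done) deps
  ]
```

With nothing completed, only t-247 qualifies. Once t-247 is done, t-248, t-249, and t-250 become ready together, which is why the plan notes that t-248 and t-249 can proceed in parallel.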
