Diffstat (limited to 'Omni')
-rw-r--r--  Omni/Agent/PLAN.md  589
1 file changed, 589 insertions, 0 deletions
diff --git a/Omni/Agent/PLAN.md b/Omni/Agent/PLAN.md
new file mode 100644
index 0000000..e51d09b
--- /dev/null
+++ b/Omni/Agent/PLAN.md
@@ -0,0 +1,589 @@
+# Omni Agent Infrastructure Plan
+
+**Status**: Draft
+**Author**: Ben (with AI assistance)
+**Date**: 2025-12-11
+
+## Vision
+
+A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, Telegram bot, etc.) with:
+- Shared tools, memory, and model backends
+- LoRA fine-tuning with model snapshots
+- Evals to prevent regression
+- Configurable LLM providers (local Ollama or OpenRouter)
+
+---
+
+## 0. Scope & Task Tracking
+
+**Building now**: Infrastructure and library primitives
+**First concrete agent**: Telegram Bot (validates the infrastructure)
+**Building later**: Researcher, Planner, and other agents
+
+### Active Tasks (in dependency order)
+
+| Task ID | Title | Status | Blocks |
+|---------|-------|--------|--------|
+| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
+| t-248 | Memory System | Open (blocked by t-247) | t-251 |
+| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
+| t-250 | Evals Framework | Open (blocked by t-247) | - |
+| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |
+
+Run `jr task show <id>` for full implementation details on each task.
+
+---
+
+## 1. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ Agent Layer │
+├──────────┬──────────┬──────────┬──────────┬────────────────────┤
+│ Jr/Coder │Researcher│ Planner │ Telegram │ Future Agents... │
+└────┬─────┴────┬─────┴────┬─────┴────┬─────┴────────────────────┘
+ │ │ │ │
+┌────▼──────────▼──────────▼──────────▼──────────────────────────┐
+│ Omni.Agent.Core │
+│ - Agent protocol (system prompt, tool execution loop) │
+│ - Model backend abstraction (Ollama | OpenRouter | Amp) │
+│ - Conversation/session management │
+└────┬────────────────────────────────────────────────────────────┘
+ │
+┌────▼────────────────────────────────────────────────────────────┐
+│ Shared Infrastructure │
+├─────────────────┬─────────────────┬─────────────────────────────┤
+│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals │
+│ - read_file │ - Vector DB │ - Regression tests │
+│ - edit_file │ - Fact retrieval │ - Quality metrics │
+│ - run_bash │ - Session history│ - Model comparison │
+│ - search │ │ │
+│ - web_search │ │ │
+│ - (pluggable) │ │ │
+├─────────────────┴─────────────────┴─────────────────────────────┤
+│ Omni.Agent.Training │
+│ - LoRA fine-tuning orchestration │
+│ - Model snapshotting │
+│ - Training data collection │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 2. Immediate Work Items
+
+### 2.1 Add Amp Backend Support (`--engine=amp`)
+
+**Problem**: The custom engine works, but Amp performs better on complex coding tasks.
+
+**Solution**: Add `--engine` flag to `jr work`:
+
+```bash
+jr work <task-id> # Uses native Engine (default)
+jr work <task-id> --engine=amp # Uses Amp via subprocess
+jr work <task-id> --engine=ollama # Uses local Ollama
+```
+
+**Implementation**:
+1. Add `EngineBackend` type: `Native | Amp | Ollama Text`
+2. Modify `Omni.Agent.Worker.start` to accept backend selection
+3. For Amp: spawn `amp --prompt-file` subprocess, capture output
+4. For Ollama: call local API instead of OpenRouter
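+
+A minimal sketch of the backend type and dispatch, assuming hypothetical runner helpers (`runNative`, `runAmp`, `runOllama`); the real entry point is `Omni.Agent.Worker.start`:
+
+```haskell
+data EngineBackend
+  = Native        -- current Engine.hs loop via OpenRouter
+  | Amp           -- `amp --prompt-file` subprocess
+  | Ollama Text   -- local Ollama, parameterized by model name
+
+-- Backend dispatch: each constructor hands off to its own runner.
+runWithBackend :: EngineBackend -> Text -> IO Text
+runWithBackend backend prompt =
+  case backend of
+    Native -> runNative prompt
+    Amp -> runAmp prompt
+    Ollama model -> runOllama model prompt
+```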
+
+**Files to modify**:
+- `Omni/Jr.hs` - CLI parsing
+- `Omni/Agent/Worker.hs` - Backend dispatch
+- `Omni/Agent/Engine.hs` - Add Ollama provider
+
+### 2.2 Abstract LLM Provider
+
+**Current state**: `Engine.hs` hardcodes OpenRouter.
+
+**Target state**: Pluggable `LLMProvider` interface.
+
+```haskell
+-- Omni/Agent/Provider.hs
+data Provider
+ = OpenRouter { apiKey :: Text, model :: Text }
+ | Ollama { baseUrl :: Text, model :: Text }
+ | AmpCLI { promptFile :: FilePath }
+
+chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
+```
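+
+Call sites then dispatch on the constructor; a sketch with hypothetical per-provider helpers (`openRouterChat` extracted from `Engine.hs`, `ollamaChat` hitting Ollama's `/api/chat`, `ampChat` wrapping the subprocess):
+
+```haskell
+chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
+chat provider msgs tools =
+  case provider of
+    OpenRouter key model -> openRouterChat key model msgs tools
+    Ollama base model -> ollamaChat base model msgs tools
+    AmpCLI promptFile -> ampChat promptFile msgs  -- Amp manages its own tools
+```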
+
+### 2.3 Memory / Vector DB Integration
+
+**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.
+
+**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact, which is project-scoped rather than user-scoped).
+
+**Key requirements**:
+- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
+- Multi-user: Each family member has private memories (identified by Telegram ID initially)
+- Embeddings via Ollama `/api/embeddings` endpoint with nomic-embed-text model
+
+See task t-248 for full implementation details.
+
+### 2.4 Pluggable Tool System
+
+**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.
+
+**Target**: Registry pattern allowing agents to declare their tool sets.
+
+```haskell
+-- Each agent specifies its tools
+coderTools :: [Tool]
+coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]
+
+researcherTools :: [Tool]
+researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]
+
+plannerTools :: [Tool]
+plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]
+
+telegramTools :: [Tool]
+telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
+```
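+
+One possible shape for the registry itself (a sketch; `Tool` and `toolName` are the existing types from `Omni.Agent.Engine`, the rest is an assumption pending t-249):
+
+```haskell
+import qualified Data.Map.Strict as Map
+import Data.Maybe (mapMaybe)
+
+newtype ToolRegistry = ToolRegistry (Map.Map Text Tool)
+
+register :: Tool -> ToolRegistry -> ToolRegistry
+register tool (ToolRegistry m) = ToolRegistry (Map.insert (toolName tool) tool m)
+
+-- Agents declare tool requirements by name; unknown names are dropped here,
+-- though the real implementation would likely report them.
+toolsFor :: [Text] -> ToolRegistry -> [Tool]
+toolsFor names (ToolRegistry m) = mapMaybe (`Map.lookup` m) names
+```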
+
+---
+
+## 3. Agent Specifications
+
+### 3.1 Jr/Coder (existing)
+
+**Purpose**: Autonomous coding agent for task completion.
+
+**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read
+
+**System prompt**: Task-focused, code conventions, test requirements.
+
+### 3.2 Researcher (new)
+
+**Purpose**: Information gathering, analysis, summarization.
+
+**Tools**:
+- `web_search` - Search the web
+- `read_web_page` - Fetch and parse web content
+- `extract_facts` - Store learned facts in knowledge base
+- `read_file` - Read local documents
+- `query_facts` - Retrieve from knowledge base
+
+**System prompt**: Focus on accuracy, citation, verification.
+
+### 3.3 Project Planner (new)
+
+**Purpose**: Break down high-level goals into actionable tasks.
+
+**Tools**:
+- `task_create` - Create new tasks
+- `task_list` - Query existing tasks
+- `task_update` - Modify task status/content
+- `fact_query` - Get project context
+- `dependency_graph` - Visualize task dependencies
+
+**System prompt**: Project management, task decomposition, dependency analysis.
+
+### 3.4 Telegram Bot (FIRST AGENT TO BUILD)
+
+**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.
+
+**Tools**:
+- `remember` - Store facts about the user (from Memory module)
+- `recall` - Query user's memories (from Memory module)
+- `web_search` - Answer questions requiring web lookup (from Registry)
+
+**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.
+
+**User identification**: Telegram user ID → creates/retrieves User record in memory.db
+
+See task t-251 for full implementation details.
+
+---
+
+## 4. Shared Infrastructure
+
+### 4.1 Model Backend Configuration
+
+```haskell
+-- ~/.config/omni/models.yaml or environment variables
+data ModelConfig = ModelConfig
+ { defaultProvider :: Provider
+ , modelOverrides :: Map Text Provider -- per-agent overrides
+ }
+
+-- Example config:
+-- default_provider: openrouter
+-- openrouter:
+-- api_key: $OPENROUTER_API_KEY
+-- default_model: anthropic/claude-sonnet-4.5
+-- ollama:
+-- base_url: http://localhost:11434
+-- default_model: llama3.1:70b
+-- agents:
+-- telegram: { provider: ollama, model: llama3.1:8b } # cheaper for chat
+-- coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
+```
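+
+Per-agent resolution is then a single lookup against `modelOverrides`, falling back to the default (sketch based on the `ModelConfig` type above):
+
+```haskell
+-- assumes Data.Map.Strict imported qualified as Map
+providerFor :: ModelConfig -> Text -> Provider
+providerFor cfg agentName =
+  Map.findWithDefault (defaultProvider cfg) agentName (modelOverrides cfg)
+```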
+
+### 4.2 Evals Framework
+
+**Purpose**: Prevent regression when changing prompts, tools, or models.
+
+**Components**:
+1. **Test Cases**: Known task + expected outcome pairs
+2. **Runner**: Execute agent on test cases, capture results
+3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
+4. **Dashboard**: Track scores over time
+
+**Implementation**:
+```haskell
+-- Omni/Agent/Eval.hs
+data EvalCase = EvalCase
+ { evalId :: Text
+ , evalPrompt :: Text
+ , evalExpectedBehavior :: Text -- or structured criteria
+ , evalTools :: [Tool]
+ }
+
+runEval :: AgentConfig -> EvalCase -> IO EvalResult
+```
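+
+`EvalResult` is not specified yet; a minimal shape plus the simplest scorer as a starting point (field names here are assumptions):
+
+```haskell
+data EvalResult = EvalResult
+  { evalCaseId :: Text
+  , evalOutput :: Text     -- what the agent actually produced
+  , evalScore :: Double    -- 0.0-1.0
+  , evalPassed :: Bool
+  }
+
+-- Exact-match scoring; semantic similarity and human eval would layer on top.
+scoreExactMatch :: Text -> Text -> Double
+scoreExactMatch expected actual = if expected == actual then 1.0 else 0.0
+```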
+
+### 4.3 Shared Memory System (Omni.Agent.Memory)
+
+**Critical requirement**: Cross-agent memory sharing with multi-user support.
+
+**Example**: User tells Telegram bot "I'm an AI engineer" → Research agent later searching for papers should recall this context.
+
+#### Why not Omni.Fact?
+
+Current `Omni.Fact` limitations:
+- Project-scoped, not user-scoped
+- No user/identity concept
+- No embeddings for semantic retrieval
+- Tied to task system
+
+#### Memory Design
+
+```haskell
+-- Omni/Agent/Memory.hs
+
+-- | A memory is a piece of information about a user, learned by any agent
+data Memory = Memory
+ { memoryId :: UUID
+ , memoryUserId :: UserId -- Who this memory is about
+ , memoryContent :: Text -- The actual information
+ , memoryEmbedding :: Maybe Vector -- For semantic search
+ , memorySource :: MemorySource -- Which agent learned this
+ , memoryConfidence :: Double -- 0.0-1.0
+ , memoryCreatedAt :: UTCTime
+ , memoryLastAccessedAt :: UTCTime -- For relevance decay
+ , memoryTags :: [Text] -- Optional categorization
+ }
+
+data MemorySource = MemorySource
+ { sourceAgent :: Text -- "telegram", "researcher", "coder", etc.
+ , sourceSession :: UUID -- Session ID where this was learned
+ , sourceContext :: Text -- Brief context of how it was learned
+ }
+
+data User = User
+ { userId :: UUID
+ , userTelegramId :: Maybe Int64 -- Primary identifier initially
+ , userEmail :: Maybe Text -- Added later when email interface exists
+ , userName :: Text -- Display name ("Ben", "Alice", etc.)
+ , userCreatedAt :: UTCTime
+ }
+
+-- Users are identified by Telegram ID initially
+-- The agent learns more about users over time and stores in memories
+-- e.g., "Ben is an AI engineer" becomes a memory, not a user field
+
+-- | Core operations
+storeMemory :: UserId -> Text -> MemorySource -> IO Memory
+recallMemories :: UserId -> Text -> Int -> IO [Memory] -- semantic search
+forgetMemory :: UUID -> IO ()
+
+-- | Embedding integration (via Ollama or other provider)
+embedText :: Text -> IO Vector
+similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
+```
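+
+A sketch of the two lower-level primitives, assuming `Vector` is a plain `[Float]`, using Ollama's `/api/embeddings` endpoint with `nomic-embed-text`, and mirroring the http-conduit + aeson usage already in `Engine.hs`:
+
+```haskell
+import Data.Aeson (Value, object, withObject, (.:), (.=))
+import Data.Aeson.Types (parseEither)
+import Data.List (sortOn)
+import Data.Ord (Down (..))
+import Network.HTTP.Simple
+
+embedText :: Text -> IO [Float]
+embedText input = do
+  req <- parseRequest "POST http://localhost:11434/api/embeddings"
+  let body = object ["model" .= ("nomic-embed-text" :: Text), "prompt" .= input]
+  resp <- httpJSON (setRequestBodyJSON body req) :: IO (Response Value)
+  case parseEither (withObject "resp" (.: "embedding")) (getResponseBody resp) of
+    Left err -> ioError (userError err)
+    Right vec -> pure vec
+
+cosine :: [Float] -> [Float] -> Float
+cosine a b = dot / (norm a * norm b)
+  where
+    dot = sum (zipWith (*) a b)
+    norm v = sqrt (sum (map (^ (2 :: Int)) v))
+
+-- Top-k by cosine similarity; sqlite-vss does this in SQL, this is the in-memory fallback.
+similaritySearch :: [Float] -> [Memory] -> Int -> [Memory]
+similaritySearch query memories k =
+  take k (sortOn (Down . score) memories)
+  where
+    score m = maybe 0 (cosine query) (memoryEmbedding m)
+```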
+
+#### Multi-User Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ Memory Store │
+├─────────────────────────────────────────────────────────┤
+│ users table: │
+│ id TEXT PRIMARY KEY │
+│ name TEXT │
+│ created_at TIMESTAMP │
+├─────────────────────────────────────────────────────────┤
+│ memories table: │
+│ id TEXT PRIMARY KEY │
+│ user_id TEXT REFERENCES users(id) │
+│ content TEXT │
+│ embedding BLOB -- serialized float vector │
+│ source_agent TEXT │
+│ source_session TEXT │
+│ source_context TEXT │
+│ confidence REAL │
+│ created_at TIMESTAMP │
+│ last_accessed_at TIMESTAMP │
+│ tags TEXT -- JSON array │
+└─────────────────────────────────────────────────────────┘
+```
+
+#### Memory Retrieval in Agent Loop
+
+When any agent runs, it:
+1. Identifies the current user (from context/session)
+2. Extracts key concepts from the user's request
+3. Calls `recallMemories userId query 10` to get relevant memories
+4. Injects memories into system prompt as context
+5. After completion, extracts new learnings and calls `storeMemory`
+
+```haskell
+-- In agent loop
+runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
+runAgentWithMemory userId config prompt = do
+ -- Recall relevant memories
+ memories <- recallMemories userId prompt 10
+ let memoryContext = formatMemoriesForPrompt memories
+
+ -- Inject into system prompt
+ let enhancedPrompt = agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext
+
+ -- Run agent
+ result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt
+
+ -- Extract and store new memories (could be done by the agent via tool)
+ pure result
+```
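+
+`formatMemoriesForPrompt` above could start as a simple bulleted list (sketch; assumes `Data.Text` imported qualified as `Text`):
+
+```haskell
+formatMemoriesForPrompt :: [Memory] -> Text
+formatMemoriesForPrompt =
+  Text.unlines . map (\m -> "- " <> memoryContent m)
+```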
+
+#### Memory Extraction Tool
+
+Agents can explicitly store memories:
+
+```haskell
+storeMemoryTool :: Tool
+storeMemoryTool = Tool
+ { toolName = "remember"
+ , toolDescription = "Store a piece of information about the user for future reference"
+ , toolExecute = \args -> do
+ let content = args .: "content"
+ tags = args .:? "tags" .!= []
+ memory <- storeMemory currentUserId content currentSource
+ pure (toJSON memory)
+ }
+```
+
+### 4.4 LoRA Fine-tuning Service
+
+**Purpose**: Custom-tune models on successful task completions.
+
+**Workflow**:
+1. Collect successful agent sessions (prompt + tool calls + result)
+2. Format as training data (instruction, input, output)
+3. Run LoRA training via Ollama or external service
+4. Snapshot trained model with version tag
+5. A/B test against base model via evals
+
+**Storage**:
+- Training data: `_/training/<agent>/<date>.jsonl`
+- Models: Ollama model registry with tags
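+
+A possible record shape for those JSONL lines (the field names are an assumption, not a fixed format; one JSON object per line via aeson):
+
+```haskell
+data TrainingExample = TrainingExample
+  { trainInstruction :: Text  -- system prompt / task description
+  , trainInput :: Text        -- user prompt plus injected context
+  , trainOutput :: Text       -- the successful completion, including tool calls
+  }
+```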
+
+---
+
+## 5. Infrastructure Build Plan
+
+Focus: Library primitives first, agents later.
+
+### Phase 1: Provider Abstraction (1-2 days)
+- [ ] Create `Omni.Agent.Provider` module with unified interface
+- [ ] Extract OpenRouter logic from `Engine.hs`
+- [ ] Add Ollama provider implementation
+- [ ] Add `--engine` flag to `jr work`
+- [ ] Test with local Llama model
+
+### Phase 2: Amp Re-integration (1 day)
+- [ ] Add Amp subprocess backend to Provider
+- [ ] Handle Amp's streaming output
+- [ ] Parse Amp thread URL for linking
+
+### Phase 3: Memory System (3-4 days)
+- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
+- [ ] Design schema: users, memories tables
+- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
+- [ ] Add embedding support via Ollama `/api/embeddings`
+- [ ] Implement similarity search
+- [ ] Create `remember` tool for agents
+- [ ] Add `runAgentWithMemory` wrapper
+
+### Phase 4: Tool Registry (1-2 days)
+- [ ] Create `Omni.Agent.Registry` for tool management
+- [ ] Define tool categories (coding, web, memory, task)
+- [ ] Allow agents to declare tool requirements
+- [ ] Add web tools (web_search, read_web_page)
+
+### Phase 5: Evals Framework (2-3 days)
+- [ ] Create `Omni.Agent.Eval` module
+- [ ] Define `EvalCase` and `EvalResult` types
+- [ ] Build eval runner
+- [ ] Add scoring (exact match, semantic, custom)
+- [ ] Create initial eval suite for Jr/coder
+
+### Phase 6: Telegram Bot Agent (3-4 days)
+**First concrete agent** - validates the infrastructure.
+
+- [ ] Create `Omni.Agent.Telegram` module
+- [ ] Telegram Bot API integration (getUpdates polling or webhook)
+- [ ] User identification via Telegram user ID
+- [ ] Auto-create user record on first message
+- [ ] Wire up memory system (recall on message, store learnings)
+- [ ] Basic conversation loop with LLM
+- [ ] Deploy as background service
+- [ ] Add `jr telegram` command for manual start
+
+**Tools for Telegram agent:**
+- `remember` - store facts about user
+- `recall` - query user's memories
+- `web_search` - answer questions (optional, phase 4)
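+
+A long-polling sketch: `getUpdates` and `sendMessage` are real Bot API methods, but the `Update` type and helpers below are placeholders for our code:
+
+```haskell
+data Update = Update
+  { updateId :: Int
+  , fromTelegramId :: Int64  -- used to look up / create the User record
+  , chatId :: Int64
+  , messageText :: Text
+  }
+
+pollLoop :: Text -> Int -> IO ()
+pollLoop token offset = do
+  -- GET https://api.telegram.org/bot<token>/getUpdates?timeout=30&offset=<offset>
+  updates <- getUpdates token offset
+  -- For each update: identify the user, recall memories, run the LLM loop, reply via sendMessage
+  mapM_ handleUpdate updates
+  let next = if null updates then offset else maximum (map updateId updates) + 1
+  pollLoop token next
+```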
+
+### Phase 7: Training Data Collection (1-2 days)
+- [ ] Add session export to training format
+- [ ] Store successful completions in `_/training/`
+- [ ] Create `jr train export` command
+
+### (Future) Additional Agents
+- Researcher agent
+- Planner agent
+- Email interface (links to Telegram user identity)
+- Others...
+
+---
+
+## 6. Design Decisions
+
+| Question | Decision |
+|----------|----------|
+| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
+| User identity | **Telegram ID** initially, link to email later when adding email interface |
+| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
+| Amp integration | TBD - subprocess likely |
+| Memory decay | TBD - probably keep forever with relevance scoring |
+| LoRA training | TBD - local Ollama or cloud |
+
+---
+
+## 7. File Structure (Proposed)
+
+```
+Omni/Agent/
+├── Core.hs # Base agent types, Worker state (existing)
+├── Engine.hs # Agent loop, tool execution (existing)
+├── Provider.hs # LLM provider abstraction (NEW)
+├── Provider/
+│ ├── OpenRouter.hs # Extracted from Engine.hs
+│ ├── Ollama.hs # Local model support
+│ └── Amp.hs # Amp CLI subprocess
+├── Memory.hs # Shared memory system (NEW)
+├── Memory/
+│ └── Embedding.hs # Vector operations, Ollama embeddings
+├── Tools.hs # Core coding tools (existing)
+├── Tools/
+│ ├── Web.hs # web_search, read_web_page (NEW)
+│ └── Memory.hs # remember, recall tools (NEW)
+├── Eval.hs # Evaluation framework (NEW)
+├── Training.hs # Training data collection (NEW)
+├── Worker.hs # Jr worker loop (existing)
+├── Git.hs # Git operations (existing)
+├── Log.hs # Logging utilities (existing)
+├── Event.hs # Event types (existing)
+├── DESIGN.md # Current design doc
+└── PLAN.md # This document
+```
+
+---
+
+## 8. Database Schema Additions
+
+```sql
+-- Memory system tables (new database: memory.db)
+
+CREATE TABLE users (
+ id TEXT PRIMARY KEY, -- UUID
+ telegram_id INTEGER UNIQUE, -- Telegram user ID (primary identifier)
+ email TEXT UNIQUE, -- Added later for email interface
+ name TEXT NOT NULL, -- Display name
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE TABLE memories (
+ id TEXT PRIMARY KEY, -- UUID
+ user_id TEXT NOT NULL REFERENCES users(id),
+ content TEXT NOT NULL,
+ embedding BLOB, -- float32 vector for sqlite-vss
+ source_agent TEXT NOT NULL, -- "telegram", "coder", etc.
+ source_session TEXT, -- Session UUID
+ source_context TEXT, -- How this was learned
+ confidence REAL DEFAULT 0.8,
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+ last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+ tags TEXT -- JSON array
+);
+
+-- sqlite-vss virtual table for vector similarity search
+-- (dimension must match the embedding model; nomic-embed-text produces 768-dim vectors)
+CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));
+
+CREATE INDEX idx_memories_user ON memories(user_id);
+CREATE INDEX idx_memories_agent ON memories(source_agent);
+```
+
+---
+
+## 9. Key Code References for Implementers
+
+When implementing tasks, refer to these existing patterns:
+
+### Existing Agent Infrastructure
+| File | Purpose | Key Functions/Types |
+|------|---------|---------------------|
+| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
+| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
+| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
+| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |
+
+### Database Patterns (follow these)
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
+| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |
+
+### CLI Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
+| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |
+
+### HTTP Patterns
+| File | Purpose | Key Patterns |
+|------|---------|--------------|
+| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |
+
+### Build System
+- Build: `bild Omni/Agent/NewModule.hs`
+- Test: `bild --test Omni/Agent/NewModule.hs`
+- Dependencies: Add to module header comments (`: dep package-name`)
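+
+For example, a new module pulling in aeson and http-conduit might start like this (illustrative):
+
+```haskell
+-- : dep aeson
+-- : dep http-conduit
+module Omni.Agent.Memory
+  ( storeMemory
+  , recallMemories
+  ) where
+```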
+
+---
+
+## 10. Next Steps
+
+Execute tasks in order:
+1. **t-247** Provider Abstraction (unblocked, start here)
+2. **t-248** Memory System (after t-247)
+3. **t-249** Tool Registry (after t-247, can parallel with t-248)
+4. **t-250** Evals Framework (after t-247)
+5. **t-251** Telegram Bot Agent (after t-248 + t-249)
+
+Run `jr task ready` to see what's available to work on.