# Omni Agent Infrastructure Plan

**Status**: Draft
**Author**: Ben (with AI assistance)
**Date**: 2025-12-11

## Vision

A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, telegram bot, etc.) with:

- Shared tools, memory, and model backends
- LoRA fine-tuning with model snapshots
- Evals to prevent regression
- Configurable LLM providers (local Ollama or OpenRouter)

---

## 0. Scope & Task Tracking

**Building now**: Infrastructure and library primitives
**First concrete agent**: Telegram Bot (validates the infrastructure)
**Building later**: Researcher, Planner, and other agents

### Active Tasks (in dependency order)

| Task ID | Title | Status | Blocks |
|---------|-------|--------|--------|
| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
| t-248 | Memory System | Open (blocked by t-247) | t-251 |
| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
| t-250 | Evals Framework | Open (blocked by t-247) | - |
| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |

Run `jr task show <task-id>` for full implementation details on each task.

---

## 1. Architecture Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                           Agent Layer                             │
├──────────┬──────────┬──────────┬──────────┬──────────────────────┤
│ Jr/Coder │Researcher│ Planner  │ Telegram │  Future Agents...    │
└────┬─────┴────┬─────┴────┬─────┴────┬─────┴──────────────────────┘
     │          │          │          │
┌────▼──────────▼──────────▼──────────▼────────────────────────────┐
│                         Omni.Agent.Core                           │
│  - Agent protocol (system prompt, tool execution loop)            │
│  - Model backend abstraction (Ollama | OpenRouter | Amp)          │
│  - Conversation/session management                                 │
└────┬───────────────────────────────────────────────────────────────┘
     │
┌────▼───────────────────────────────────────────────────────────────┐
│                       Shared Infrastructure                         │
├─────────────────┬──────────────────┬─────────────────────────────┤
│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals            │
│ - read_file     │ - Vector DB      │ - Regression tests          │
│ - edit_file     │ - Fact retrieval │ - Quality metrics           │
│ - run_bash      │ - Session history│ - Model comparison          │
│ - search        │                  │                             │
│ - web_search    │                  │                             │
│ - (pluggable)   │                  │                             │
├─────────────────┴──────────────────┴─────────────────────────────┤
│                        Omni.Agent.Training                        │
│ - LoRA fine-tuning orchestration                                  │
│ - Model snapshotting                                              │
│ - Training data collection                                        │
└───────────────────────────────────────────────────────────────────┘
```

---

## 2. Immediate Work Items

### 2.1 Add Amp Backend Support (`--engine` flag)

**Problem**: The custom engine works, but Amp is better suited to complex coding tasks.

**Solution**: Add an `--engine` flag to `jr work`:

```bash
jr work                  # Uses native Engine (default)
jr work --engine=amp     # Uses Amp via subprocess
jr work --engine=ollama  # Uses local Ollama
```

**Implementation**:

1. Add `EngineBackend` type: `Native | Amp | Ollama Text` (sketched below)
2. Modify `Omni.Agent.Worker.start` to accept backend selection
3. For Amp: spawn `amp --prompt-file` subprocess, capture output
4. For Ollama: call local API instead of OpenRouter

**Files to modify**:

- `Omni/Jr.hs` - CLI parsing
- `Omni/Agent/Worker.hs` - Backend dispatch
- `Omni/Agent/Engine.hs` - Add Ollama provider
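To make steps 1-2 concrete, here is a minimal sketch of the backend type and dispatch. It is illustrative only: `runWithBackend` and the three `run*` helpers are placeholders (stubbed so the sketch stands alone), not existing functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Hypothetical sketch of the --engine dispatch. EngineBackend mirrors the type
-- proposed in step 1; the run* helpers stand in for the existing native engine,
-- an `amp` subprocess, and the local Ollama API.
data EngineBackend
  = Native       -- existing Omni.Agent.Engine loop (default)
  | Amp          -- shell out to the `amp` CLI with a prompt file
  | Ollama Text  -- local Ollama, parameterized by model name

runWithBackend :: EngineBackend -> Text -> IO Text
runWithBackend backend prompt =
  case backend of
    Native -> runNative prompt             -- placeholder for the native engine loop
    Amp -> runAmpSubprocess prompt         -- placeholder: spawn amp --prompt-file
    Ollama model -> runOllama model prompt -- placeholder: call local Ollama API
  where
    -- Stubs only, so the sketch compiles standalone; the real code dispatches
    -- into Worker/Engine and the Provider module described in section 2.2.
    runNative = pure
    runAmpSubprocess = pure
    runOllama _ = pure
```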
### 2.2 Abstract LLM Provider

**Current state**: `Engine.hs` hardcodes OpenRouter.

**Target state**: Pluggable `LLMProvider` interface.

```haskell
-- Omni/Agent/Provider.hs
data Provider
  = OpenRouter { apiKey :: Text, model :: Text }
  | Ollama { baseUrl :: Text, model :: Text }
  | AmpCLI { promptFile :: FilePath }

chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
```

### 2.3 Memory / Vector DB Integration

**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.

**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact - that's project-scoped, not user-scoped).

**Key requirements**:

- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
- Multi-user: Each family member has private memories (identified by Telegram ID initially)
- Embeddings via Ollama `/api/embeddings` endpoint with nomic-embed-text model

See task t-248 for full implementation details.

### 2.4 Pluggable Tool System

**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.

**Target**: Registry pattern allowing agents to declare their tool sets.

```haskell
-- Each agent specifies its tools
coderTools :: [Tool]
coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]

researcherTools :: [Tool]
researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]

plannerTools :: [Tool]
plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]

telegramTools :: [Tool]
telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
```

---

## 3. Agent Specifications

### 3.1 Jr/Coder (existing)

**Purpose**: Autonomous coding agent for task completion.

**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read

**System prompt**: Task-focused, code conventions, test requirements.

### 3.2 Researcher (new)

**Purpose**: Information gathering, analysis, summarization.

**Tools**:

- `web_search` - Search the web
- `read_web_page` - Fetch and parse web content
- `extract_facts` - Store learned facts in knowledge base
- `read_file` - Read local documents
- `query_facts` - Retrieve from knowledge base

**System prompt**: Focus on accuracy, citation, verification.

### 3.3 Project Planner (new)

**Purpose**: Break down high-level goals into actionable tasks.

**Tools**:

- `task_create` - Create new tasks
- `task_list` - Query existing tasks
- `task_update` - Modify task status/content
- `fact_query` - Get project context
- `dependency_graph` - Visualize task dependencies

**System prompt**: Project management, task decomposition, dependency analysis.

### 3.4 Telegram Bot (FIRST AGENT TO BUILD)

**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.

**Tools**:

- `remember` - Store facts about the user (from Memory module)
- `recall` - Query user's memories (from Memory module)
- `web_search` - Answer questions requiring web lookup (from Registry)

**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.

**User identification**: Telegram user ID → creates/retrieves User record in memory.db

See task t-251 for full implementation details.
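As a sketch of that user-identification step (illustrative only: `getOrCreateUser` does not exist yet, and the real module should reuse the `withDb`/migration patterns from `Omni/Task/Core.hs`; the code below talks to sqlite-simple directly):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Int (Int64)
import Data.Text (Text)
import qualified Data.UUID as UUID
import qualified Data.UUID.V4 as UUID
import Database.SQLite.Simple (Connection, Only (..), execute, query)

-- | Look up a user by Telegram ID, creating the record on first message.
-- Hypothetical helper; returns the internal user id.
getOrCreateUser :: Connection -> Int64 -> Text -> IO Text
getOrCreateUser conn telegramId displayName = do
  rows <- query conn "SELECT id FROM users WHERE telegram_id = ?" (Only telegramId)
  case rows of
    (Only existingId : _) -> pure existingId
    [] -> do
      newId <- UUID.toText <$> UUID.nextRandom
      execute conn
        "INSERT INTO users (id, telegram_id, name) VALUES (?, ?, ?)"
        (newId, telegramId, displayName)
      pure newId
```

Only the Telegram ID and display name live on the user row; everything else the bot learns becomes a memory, per the memory design below.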
---

## 4. Shared Infrastructure

### 4.1 Model Backend Configuration

```haskell
-- ~/.config/omni/models.yaml or environment variables
data ModelConfig = ModelConfig
  { defaultProvider :: Provider
  , modelOverrides :: Map Text Provider  -- per-agent overrides
  }

-- Example config:
-- default_provider: openrouter
-- openrouter:
--   api_key: $OPENROUTER_API_KEY
--   default_model: anthropic/claude-sonnet-4.5
-- ollama:
--   base_url: http://localhost:11434
--   default_model: llama3.1:70b
-- agents:
--   telegram: { provider: ollama, model: llama3.1:8b }  # cheaper for chat
--   coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
```

### 4.2 Evals Framework

**Purpose**: Prevent regression when changing prompts, tools, or models.

**Components**:

1. **Test Cases**: Known task + expected outcome pairs
2. **Runner**: Execute agent on test cases, capture results
3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
4. **Dashboard**: Track scores over time

**Implementation**:

```haskell
-- Omni/Agent/Eval.hs
data EvalCase = EvalCase
  { evalId :: Text
  , evalPrompt :: Text
  , evalExpectedBehavior :: Text  -- or structured criteria
  , evalTools :: [Tool]
  }

runEval :: AgentConfig -> EvalCase -> IO EvalResult
```
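`EvalResult` is not pinned down above; one possible shape, together with the simplest scorer (exact substring match), might look like the sketch below. Field names are assumptions; semantic-similarity and human scoring would be additional scorers producing the same 0.0-1.0 range.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Illustrative EvalResult shape for the runner above.
data EvalResult = EvalResult
  { resultEvalId :: Text
  , resultOutput :: Text   -- what the agent actually produced
  , resultScore :: Double  -- 0.0 (fail) .. 1.0 (pass)
  , resultNotes :: Text
  }

-- | Cheapest possible scorer: did the expected behavior text appear verbatim?
scoreExactMatch :: Text -> Text -> Double
scoreExactMatch expected actual =
  if expected `T.isInfixOf` actual then 1.0 else 0.0
```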
### 4.3 Shared Memory System (Omni.Agent.Memory)

**Critical requirement**: Cross-agent memory sharing with multi-user support.

**Example**: User tells Telegram bot "I'm an AI engineer" → Research agent later searching for papers should recall this context.

#### Why not Omni.Fact?

Current `Omni.Fact` limitations:

- Project-scoped, not user-scoped
- No user/identity concept
- No embeddings for semantic retrieval
- Tied to task system

#### Memory Design

```haskell
-- Omni/Agent/Memory.hs

-- | A memory is a piece of information about a user, learned by any agent
data Memory = Memory
  { memoryId :: UUID
  , memoryUserId :: UserId           -- Who this memory is about
  , memoryContent :: Text            -- The actual information
  , memoryEmbedding :: Maybe Vector  -- For semantic search
  , memorySource :: MemorySource     -- Which agent learned this
  , memoryConfidence :: Double       -- 0.0-1.0
  , memoryCreatedAt :: UTCTime
  , memoryLastAccessedAt :: UTCTime  -- For relevance decay
  , memoryTags :: [Text]             -- Optional categorization
  }

data MemorySource = MemorySource
  { sourceAgent :: Text    -- "telegram", "researcher", "coder", etc.
  , sourceSession :: UUID  -- Session ID where this was learned
  , sourceContext :: Text  -- Brief context of how it was learned
  }

data User = User
  { userId :: UUID
  , userTelegramId :: Maybe Int64  -- Primary identifier initially
  , userEmail :: Maybe Text        -- Added later when email interface exists
  , userName :: Text               -- Display name ("Ben", "Alice", etc.)
  , userCreatedAt :: UTCTime
  }

-- Users are identified by Telegram ID initially.
-- The agent learns more about users over time and stores it in memories,
-- e.g., "Ben is an AI engineer" becomes a memory, not a user field.

-- | Core operations
storeMemory :: UserId -> Text -> MemorySource -> IO Memory
recallMemories :: UserId -> Text -> Int -> IO [Memory]  -- semantic search
forgetMemory :: UUID -> IO ()

-- | Embedding integration (via Ollama or other provider)
embedText :: Text -> IO Vector
similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
```

#### Multi-User Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      Memory Store                       │
├─────────────────────────────────────────────────────────┤
│  users table:                                           │
│    id TEXT PRIMARY KEY                                  │
│    name TEXT                                            │
│    created_at TIMESTAMP                                 │
├─────────────────────────────────────────────────────────┤
│  memories table:                                        │
│    id TEXT PRIMARY KEY                                  │
│    user_id TEXT REFERENCES users(id)                    │
│    content TEXT                                         │
│    embedding BLOB       -- serialized float vector      │
│    source_agent TEXT                                    │
│    source_session TEXT                                  │
│    source_context TEXT                                  │
│    confidence REAL                                      │
│    created_at TIMESTAMP                                 │
│    last_accessed_at TIMESTAMP                           │
│    tags TEXT            -- JSON array                   │
└─────────────────────────────────────────────────────────┘
```

#### Memory Retrieval in Agent Loop

When any agent runs, it:

1. Identifies the current user (from context/session)
2. Extracts key concepts from the user's request
3. Calls `recallMemories userId query 10` to get relevant memories
4. Injects memories into system prompt as context
5. After completion, extracts new learnings and calls `storeMemory`

```haskell
-- In agent loop
runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
runAgentWithMemory userId config prompt = do
  -- Recall relevant memories
  memories <- recallMemories userId prompt 10
  let memoryContext = formatMemoriesForPrompt memories

  -- Inject into system prompt
  let enhancedPrompt =
        agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext

  -- Run agent
  result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt

  -- Extract and store new memories (could be done by the agent via tool)
  pure result
```

#### Memory Extraction Tool

Agents can explicitly store memories:

```haskell
storeMemoryTool :: Tool
storeMemoryTool = Tool
  { toolName = "remember"
  , toolDescription = "Store a piece of information about the user for future reference"
  , toolExecute = \args -> do
      -- currentUserId and currentSource come from the surrounding agent session
      let content = args .: "content"
          tags = args .:? "tags" .!= []
      memory <- storeMemory currentUserId content currentSource
      pure (toJSON memory)
  }
```
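The `embedText` and `similaritySearch` signatures above could be filled in roughly as follows. This is a sketch only: it assumes `http-conduit`'s `Network.HTTP.Simple` (the same HTTP stack `Engine.hs` already uses) and Ollama's documented `/api/embeddings` request/response shape; the real module would also handle errors and serialize vectors into the `embedding BLOB` column.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), object, withObject, (.:), (.=))
import Data.Text (Text)
import qualified Data.Vector.Unboxed as V
import Network.HTTP.Simple
  (getResponseBody, httpJSON, parseRequest, setRequestBodyJSON, setRequestMethod)

newtype EmbeddingResponse = EmbeddingResponse {embedding :: [Double]}

instance FromJSON EmbeddingResponse where
  parseJSON = withObject "EmbeddingResponse" (\o -> EmbeddingResponse <$> o .: "embedding")

-- | Embed a piece of text with the local nomic-embed-text model.
embedText :: Text -> IO (V.Vector Double)
embedText txt = do
  req0 <- parseRequest "http://localhost:11434/api/embeddings"
  let body = object ["model" .= ("nomic-embed-text" :: Text), "prompt" .= txt]
      req = setRequestMethod "POST" (setRequestBodyJSON body req0)
  resp <- httpJSON req
  pure (V.fromList (embedding (getResponseBody resp)))

-- | Cosine similarity, the usual scoring function for semantic recall.
cosine :: V.Vector Double -> V.Vector Double -> Double
cosine a b = V.sum (V.zipWith (*) a b) / (norm a * norm b)
  where
    norm v = sqrt (V.sum (V.map (^ (2 :: Int)) v))
```

Brute-force cosine over one user's memories is fine at family scale; the sqlite-vss decision in section 6 takes over once the table grows.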
### 4.4 LoRA Fine-tuning Service

**Purpose**: Custom-tune models on successful task completions.

**Workflow**:

1. Collect successful agent sessions (prompt + tool calls + result)
2. Format as training data (instruction, input, output)
3. Run LoRA training via Ollama or external service
4. Snapshot trained model with version tag
5. A/B test against base model via evals

**Storage**:

- Training data: `_/training//.jsonl`
- Models: Ollama model registry with tags

---

## 5. Infrastructure Build Plan

Focus: Library primitives first, agents later.

### Phase 1: Provider Abstraction (1-2 days)

- [ ] Create `Omni.Agent.Provider` module with unified interface
- [ ] Extract OpenRouter logic from `Engine.hs`
- [ ] Add Ollama provider implementation
- [ ] Add `--engine` flag to `jr work`
- [ ] Test with local Llama model

### Phase 2: Amp Re-integration (1 day)

- [ ] Add Amp subprocess backend to Provider
- [ ] Handle Amp's streaming output
- [ ] Parse Amp thread URL for linking

### Phase 3: Memory System (3-4 days)

- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
- [ ] Design schema: users, memories tables
- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
- [ ] Add embedding support via Ollama `/api/embeddings`
- [ ] Implement similarity search
- [ ] Create `remember` tool for agents
- [ ] Add `runAgentWithMemory` wrapper

### Phase 4: Tool Registry (1-2 days)

- [ ] Create `Omni.Agent.Registry` for tool management
- [ ] Define tool categories (coding, web, memory, task)
- [ ] Allow agents to declare tool requirements
- [ ] Add web tools (web_search, read_web_page)

### Phase 5: Evals Framework (2-3 days)

- [ ] Create `Omni.Agent.Eval` module
- [ ] Define `EvalCase` and `EvalResult` types
- [ ] Build eval runner
- [ ] Add scoring (exact match, semantic, custom)
- [ ] Create initial eval suite for Jr/coder

### Phase 6: Telegram Bot Agent (3-4 days)

**First concrete agent** - validates the infrastructure.

- [ ] Create `Omni.Agent.Telegram` module
- [ ] Telegram Bot API integration (getUpdates polling or webhook; see the polling sketch after this phase)
- [ ] User identification via Telegram user ID
- [ ] Auto-create user record on first message
- [ ] Wire up memory system (recall on message, store learnings)
- [ ] Basic conversation loop with LLM
- [ ] Deploy as background service
- [ ] Add `jr telegram` command for manual start

**Tools for Telegram agent:**

- `remember` - store facts about user
- `recall` - query user's memories
- `web_search` - answer questions (optional, phase 4)
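As referenced in the Phase 6 checklist, a getUpdates long-polling loop could start out like the sketch below. It assumes `http-conduit` and the publicly documented Telegram Bot API fields (`update_id`, `message.chat.id`, `message.text`); `handleMessage` is a stand-in for the memory-aware conversation loop.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forM_)
import Data.Aeson (FromJSON (..), Value, withObject, (.:), (.:?))
import Data.Text (Text)
import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest)

data Update = Update
  { updateId :: Int
  , updateChatId :: Maybe Int
  , updateText :: Maybe Text
  }

instance FromJSON Update where
  parseJSON = withObject "Update" $ \o -> do
    uid <- o .: "update_id"
    msg <- o .:? "message"
    case msg of
      Nothing -> pure (Update uid Nothing Nothing)
      Just m -> flip (withObject "Message") m $ \mo -> do
        chat <- mo .: "chat"
        cid <- flip (withObject "Chat") chat (.: "id")
        txt <- mo .:? "text"
        pure (Update uid (Just cid) txt)

newtype UpdatesResponse = UpdatesResponse {result :: [Update]}

instance FromJSON UpdatesResponse where
  parseJSON = withObject "UpdatesResponse" (\o -> UpdatesResponse <$> o .: "result")

-- | Poll getUpdates forever, acknowledging each batch via the offset parameter.
pollLoop :: String -> (Int -> Text -> IO ()) -> Int -> IO ()
pollLoop token handleMessage offset = do
  req <- parseRequest ("https://api.telegram.org/bot" <> token
                       <> "/getUpdates?timeout=30&offset=" <> show offset)
  resp <- httpJSON req
  let updates = result (getResponseBody resp)
  forM_ updates $ \u ->
    case (updateChatId u, updateText u) of
      (Just cid, Just txt) -> handleMessage cid txt
      _ -> pure ()
  let next = if null updates then offset else maximum (map updateId updates) + 1
  pollLoop token handleMessage next
```

Something like `pollLoop token handleMessage 0` would be what the proposed `jr telegram` command starts.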
### Phase 7: Training Data Collection (1-2 days)

- [ ] Add session export to training format
- [ ] Store successful completions in `_/training/`
- [ ] Create `jr train export` command

### (Future) Additional Agents

- Researcher agent
- Planner agent
- Email interface (links to Telegram user identity)
- Others...

---

## 6. Design Decisions

| Question | Decision |
|----------|----------|
| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
| User identity | **Telegram ID** initially, link to email later when adding email interface |
| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
| Amp integration | TBD - subprocess likely |
| Memory decay | TBD - probably keep forever with relevance scoring |
| LoRA training | TBD - local Ollama or cloud |

---

## 7. File Structure (Proposed)

```
Omni/Agent/
├── Core.hs              # Base agent types, Worker state (existing)
├── Engine.hs            # Agent loop, tool execution (existing)
├── Provider.hs          # LLM provider abstraction (NEW)
├── Provider/
│   ├── OpenRouter.hs    # Extracted from Engine.hs
│   ├── Ollama.hs        # Local model support
│   └── Amp.hs           # Amp CLI subprocess
├── Memory.hs            # Shared memory system (NEW)
├── Memory/
│   └── Embedding.hs     # Vector operations, Ollama embeddings
├── Tools.hs             # Core coding tools (existing)
├── Tools/
│   ├── Web.hs           # web_search, read_web_page (NEW)
│   └── Memory.hs        # remember, recall tools (NEW)
├── Eval.hs              # Evaluation framework (NEW)
├── Training.hs          # Training data collection (NEW)
├── Worker.hs            # Jr worker loop (existing)
├── Git.hs               # Git operations (existing)
├── Log.hs               # Logging utilities (existing)
├── Event.hs             # Event types (existing)
├── DESIGN.md            # Current design doc
└── PLAN.md              # This document
```

---

## 8. Database Schema Additions

```sql
-- Memory system tables (new database: memory.db)

CREATE TABLE users (
  id TEXT PRIMARY KEY,         -- UUID
  telegram_id INTEGER UNIQUE,  -- Telegram user ID (primary identifier)
  email TEXT UNIQUE,           -- Added later for email interface
  name TEXT NOT NULL,          -- Display name
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE memories (
  id TEXT PRIMARY KEY,         -- UUID
  user_id TEXT NOT NULL REFERENCES users(id),
  content TEXT NOT NULL,
  embedding BLOB,              -- float32 vector for sqlite-vss
  source_agent TEXT NOT NULL,  -- "telegram", "coder", etc.
  source_session TEXT,         -- Session UUID
  source_context TEXT,         -- How this was learned
  confidence REAL DEFAULT 0.8,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags TEXT                    -- JSON array
);

-- sqlite-vss virtual table for vector similarity search
-- (768 dimensions to match nomic-embed-text, per section 2.3)
CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));

CREATE INDEX idx_memories_user ON memories(user_id);
CREATE INDEX idx_memories_agent ON memories(source_agent);
```

---

## 9. Key Code References for Implementers

When implementing tasks, refer to these existing patterns:

### Existing Agent Infrastructure

| File | Purpose | Key Functions/Types |
|------|---------|---------------------|
| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |

### Database Patterns (follow these)

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |

### CLI Patterns

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |

### HTTP Patterns

| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |

### Build System

- Build: `bild Omni/Agent/NewModule.hs`
- Test: `bild --test Omni/Agent/NewModule.hs`
- Dependencies: Add to module header comments (`: dep package-name`)
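Tying the section 8 schema to those database patterns, a first cut of the row mapping might look like the sketch below. It is illustrative: the `MemoryRow` name, keeping the embedding as a raw BLOB, and storing tags as JSON text are assumptions, and the real module would convert rows into the richer `Memory` type from section 4.3.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)
import Data.Time (UTCTime)
import Database.SQLite.Simple (Connection, FromRow (..), Only (..), field, query)

-- Row type mirroring the memories table column order in section 8.
data MemoryRow = MemoryRow
  { rowId :: Text
  , rowUserId :: Text
  , rowContent :: Text
  , rowEmbedding :: Maybe ByteString  -- raw BLOB; decoded by the Embedding module
  , rowSourceAgent :: Text
  , rowSourceSession :: Maybe Text
  , rowSourceContext :: Maybe Text
  , rowConfidence :: Double
  , rowCreatedAt :: UTCTime
  , rowLastAccessedAt :: UTCTime
  , rowTags :: Maybe Text             -- JSON-encoded array
  }

instance FromRow MemoryRow where
  fromRow =
    MemoryRow
      <$> field <*> field <*> field <*> field <*> field <*> field
      <*> field <*> field <*> field <*> field <*> field

-- | Query shaped after the CRUD patterns in Omni.Task.Core / Omni.Fact.
memoriesForUser :: Connection -> Text -> IO [MemoryRow]
memoriesForUser conn uid =
  query conn "SELECT * FROM memories WHERE user_id = ?" (Only uid)
```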
---

## 10. Next Steps

Execute tasks in order:

1. **t-247** Provider Abstraction (unblocked, start here)
2. **t-248** Memory System (after t-247)
3. **t-249** Tool Registry (after t-247, can be done in parallel with t-248)
4. **t-250** Evals Framework (after t-247)
5. **t-251** Telegram Bot Agent (after t-248 + t-249)

Run `jr task ready` to see what's available to work on.