# Omni Agent Infrastructure Plan

**Status**: Draft  
**Author**: Ben (with AI assistance)  
**Date**: 2025-12-11

## Vision

A unified agent infrastructure supporting multiple specialized agents (coder, researcher, planner, telegram bot, etc.) with:
- Shared tools, memory, and model backends
- LoRA fine-tuning with model snapshots
- Evals to prevent regression
- Configurable LLM providers (local Ollama or OpenRouter)

---

## 0. Scope & Task Tracking

**Building now**: Infrastructure and library primitives  
**First concrete agent**: Telegram Bot (validates the infrastructure)  
**Building later**: Researcher, Planner, and other agents

### Active Tasks (in dependency order)

| Task ID | Title | Status | Blocks |
|---------|-------|--------|--------|
| t-247 | Provider Abstraction | Open | t-248, t-249, t-250 |
| t-248 | Memory System | Open (blocked by t-247) | t-251 |
| t-249 | Tool Registry | Open (blocked by t-247) | t-251 |
| t-250 | Evals Framework | Open (blocked by t-247) | - |
| t-251 | Telegram Bot Agent | Open (blocked by t-248, t-249) | - |

Run `jr task show <id>` for full implementation details on each task.

---

## 1. Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                         Agent Layer                              │
├──────────┬──────────┬──────────┬──────────┬────────────────────┤
│ Jr/Coder │Researcher│ Planner  │ Telegram │ Future Agents...   │
└────┬─────┴────┬─────┴────┬─────┴────┬─────┴────────────────────┘
     │          │          │          │
┌────▼──────────▼──────────▼──────────▼──────────────────────────┐
│                    Omni.Agent.Core                              │
│  - Agent protocol (system prompt, tool execution loop)          │
│  - Model backend abstraction (Ollama | OpenRouter | Amp)        │
│  - Conversation/session management                              │
└────┬────────────────────────────────────────────────────────────┘
     │
┌────▼────────────────────────────────────────────────────────────┐
│                    Shared Infrastructure                         │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Omni.Agent.Tools│ Omni.Agent.Memory│ Omni.Agent.Evals           │
│ - read_file     │ - Vector DB      │ - Regression tests         │
│ - edit_file     │ - Fact retrieval │ - Quality metrics          │
│ - run_bash      │ - Session history│ - Model comparison         │
│ - search        │                  │                             │
│ - web_search    │                  │                             │
│ - (pluggable)   │                  │                             │
├─────────────────┴─────────────────┴─────────────────────────────┤
│                    Omni.Agent.Training                           │
│  - LoRA fine-tuning orchestration                               │
│  - Model snapshotting                                           │
│  - Training data collection                                     │
└─────────────────────────────────────────────────────────────────┘
```

---

## 2. Immediate Work Items

### 2.1 Add Amp Backend Support (--amp flag)

**Problem**: The custom engine works, but Amp performs better on complex coding tasks.

**Solution**: Add `--engine` flag to `jr work`:

```bash
jr work <task-id>                    # Uses native Engine (default)
jr work <task-id> --engine=amp       # Uses Amp via subprocess
jr work <task-id> --engine=ollama    # Uses local Ollama
```

**Implementation**:
1. Add `EngineBackend` type: `Native | Amp | Ollama Text` (see the sketch below)
2. Modify `Omni.Agent.Worker.start` to accept backend selection
3. For Amp: spawn `amp --prompt-file` subprocess, capture output
4. For Ollama: call local API instead of OpenRouter

**Files to modify**:
- `Omni/Jr.hs` - CLI parsing
- `Omni/Agent/Worker.hs` - Backend dispatch
- `Omni/Agent/Engine.hs` - Add Ollama provider
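
A minimal sketch of the backend type and dispatch from step 1 above; `TaskId`, `runNative`, `runAmp`, and `runOllama` are hypothetical placeholders for the existing Worker/Engine entry points each backend would reuse.

```haskell
-- Sketch only: backend selection for `jr work --engine=...`.
data EngineBackend
  = Native        -- existing Engine.hs loop (default)
  | Amp           -- spawn `amp --prompt-file` as a subprocess
  | Ollama Text   -- local Ollama, parameterized by model name

-- The run* helpers are hypothetical stand-ins for the existing code paths.
runWithBackend :: EngineBackend -> TaskId -> IO ()
runWithBackend backend taskId =
  case backend of
    Native       -> runNative taskId
    Amp          -> runAmp taskId
    Ollama model -> runOllama model taskId
```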

### 2.2 Abstract LLM Provider

**Current state**: `Engine.hs` hardcodes OpenRouter.

**Target state**: Pluggable `LLMProvider` interface.

```haskell
-- Omni/Agent/Provider.hs
data Provider
  = OpenRouter { apiKey :: Text, model :: Text }
  | Ollama { baseUrl :: Text, model :: Text }
  | AmpCLI { promptFile :: FilePath }

chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
```
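
A possible dispatch skeleton for `chat`, a sketch only; the per-provider helpers (`openRouterChat`, `ollamaChat`, `ampChat`) are hypothetical names for code that would live under `Provider/`.

```haskell
-- Sketch only: chat dispatches on the provider constructor.
chat :: Provider -> [Message] -> [Tool] -> IO (Either Text Message)
chat provider msgs tools =
  case provider of
    OpenRouter key model -> openRouterChat key model msgs tools
    Ollama base model    -> ollamaChat base model msgs tools
    AmpCLI promptFile    -> ampChat promptFile msgs tools
```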

### 2.3 Memory / Vector DB Integration

**Purpose**: Long-term memory across agent sessions, shared across all agents, private per user.

**Decision**: Use sqlite-vss for vector similarity search (not Omni.Fact - that's project-scoped, not user-scoped).

**Key requirements**:
- Cross-agent sharing: Telegram agent learns "Ben is an AI engineer" → Researcher agent recalls this
- Multi-user: Each family member has private memories (identified by Telegram ID initially)
- Embeddings via the Ollama `/api/embeddings` endpoint with the nomic-embed-text model (see the sketch below)

See task t-248 for full implementation details.
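
A minimal sketch of the embedding call referenced above, assuming `http-conduit` and `aeson` as used elsewhere in the codebase; the `Vector` alias is an assumption, and the request/response shape follows Ollama's documented `/api/embeddings` format.

```haskell
import Data.Aeson (FromJSON (..), object, withObject, (.:), (.=))
import Data.Text (Text)
import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest, setRequestBodyJSON)

-- Sketch only: Vector is a plain float list for now.
type Vector = [Float]

newtype EmbeddingResponse = EmbeddingResponse {embedding :: Vector}

instance FromJSON EmbeddingResponse where
  parseJSON = withObject "EmbeddingResponse" (\o -> EmbeddingResponse <$> o .: "embedding")

-- | Embed a piece of text via the local Ollama server.
embedText :: Text -> IO Vector
embedText txt = do
  let body = object ["model" .= ("nomic-embed-text" :: Text), "prompt" .= txt]
  req <- parseRequest "POST http://localhost:11434/api/embeddings"
  resp <- httpJSON (setRequestBodyJSON body req)
  pure (embedding (getResponseBody resp))
```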

### 2.4 Pluggable Tool System

**Current**: `Omni.Agent.Tools` has 6 hardcoded tools.

**Target**: Registry pattern allowing agents to declare their tool sets.

```haskell
-- Each agent specifies its tools
coderTools :: [Tool]
coderTools = [readFileTool, writeFileTool, editFileTool, runBashTool, searchCodebaseTool]

researcherTools :: [Tool]  
researcherTools = [webSearchTool, readWebPageTool, extractFactsTool, readFileTool]

plannerTools :: [Tool]
plannerTools = [taskCreateTool, taskListTool, taskUpdateTool, factQueryTool]

telegramTools :: [Tool]
telegramTools = [sendMessageTool, getUpdatesTool, factQueryTool]
```
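
One possible shape for the registry itself, a sketch assuming tools are keyed by their `toolName`; agents would resolve their declared tool names against it at startup.

```haskell
import qualified Data.Map.Strict as Map

-- Sketch only: tools keyed by name.
type ToolRegistry = Map.Map Text Tool

mkRegistry :: [Tool] -> ToolRegistry
mkRegistry tools = Map.fromList [(toolName t, t) | t <- tools]

-- | Resolve an agent's declared tool names, reporting any that are missing.
resolveTools :: ToolRegistry -> [Text] -> Either [Text] [Tool]
resolveTools registry names =
  case [n | n <- names, not (Map.member n registry)] of
    []      -> Right [t | n <- names, Just t <- [Map.lookup n registry]]
    missing -> Left missing
```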

---

## 3. Agent Specifications

### 3.1 Jr/Coder (existing)

**Purpose**: Autonomous coding agent for task completion.

**Tools**: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read

**System prompt**: Task-focused, code conventions, test requirements.

### 3.2 Researcher (new)

**Purpose**: Information gathering, analysis, summarization.

**Tools**: 
- `web_search` - Search the web
- `read_web_page` - Fetch and parse web content  
- `extract_facts` - Store learned facts in knowledge base
- `read_file` - Read local documents
- `query_facts` - Retrieve from knowledge base

**System prompt**: Focus on accuracy, citation, verification.

### 3.3 Project Planner (new)

**Purpose**: Break down high-level goals into actionable tasks.

**Tools**:
- `task_create` - Create new tasks
- `task_list` - Query existing tasks
- `task_update` - Modify task status/content
- `fact_query` - Get project context
- `dependency_graph` - Visualize task dependencies

**System prompt**: Project management, task decomposition, dependency analysis.

### 3.4 Telegram Bot (FIRST AGENT TO BUILD)

**Purpose**: Family assistant accessible via Telegram. First concrete agent to validate infrastructure.

**Tools**:
- `remember` - Store facts about the user (from Memory module)
- `recall` - Query user's memories (from Memory module)
- `web_search` - Answer questions requiring web lookup (from Registry)

**System prompt**: Friendly, helpful, family-appropriate, concise for chat interface.

**User identification**: Telegram user ID → creates/retrieves User record in memory.db

See task t-251 for full implementation details.
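
A minimal long-polling sketch against the Telegram Bot API `getUpdates` method, assuming `http-conduit` and `aeson`; the `handleMessage` callback and the field selection are simplified placeholders, and webhook mode plus error handling are omitted.

```haskell
import Control.Monad (forM_, forever)
import Data.Aeson (FromJSON (..), withObject, (.:), (.:?))
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Int (Int64)
import Data.Text (Text)
import qualified Data.Text as Text
import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest)

newtype Updates = Updates [Update]

instance FromJSON Updates where
  parseJSON = withObject "Updates" (\o -> Updates <$> o .: "result")

data Update = Update {updId :: Int, updMessage :: Maybe Msg}

instance FromJSON Update where
  parseJSON = withObject "Update" $ \o ->
    Update <$> o .: "update_id" <*> o .:? "message"

data Msg = Msg {msgFrom :: Int64, msgChat :: Int64, msgText :: Maybe Text}

instance FromJSON Msg where
  parseJSON = withObject "Message" $ \o ->
    Msg <$> ((o .: "from") >>= (.: "id"))
        <*> ((o .: "chat") >>= (.: "id"))
        <*> o .:? "text"

-- | Long-poll getUpdates, advancing the offset past each processed update.
-- handleMessage receives (telegramUserId, chatId, text).
pollLoop :: Text -> (Int64 -> Int64 -> Text -> IO ()) -> IO ()
pollLoop token handleMessage = do
  offsetRef <- newIORef (0 :: Int)
  forever $ do
    offset <- readIORef offsetRef
    -- timeout=25 stays under http-conduit's default 30s response timeout
    req <- parseRequest ("GET https://api.telegram.org/bot" <> Text.unpack token
                          <> "/getUpdates?timeout=25&offset=" <> show offset)
    resp <- httpJSON req
    let Updates ups = getResponseBody resp
    forM_ ups $ \u -> do
      writeIORef offsetRef (updId u + 1)
      case updMessage u of
        Just m | Just txt <- msgText m -> handleMessage (msgFrom m) (msgChat m) txt
        _ -> pure ()
```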

---

## 4. Shared Infrastructure

### 4.1 Model Backend Configuration

```haskell
-- ~/.config/omni/models.yaml or environment variables
data ModelConfig = ModelConfig
  { defaultProvider :: Provider
  , modelOverrides :: Map Text Provider  -- per-agent overrides
  }

-- Example config:
-- default_provider: openrouter
-- openrouter:
--   api_key: $OPENROUTER_API_KEY
--   default_model: anthropic/claude-sonnet-4.5
-- ollama:
--   base_url: http://localhost:11434
--   default_model: llama3.1:70b
-- agents:
--   telegram: { provider: ollama, model: llama3.1:8b }  # cheaper for chat
--   coder: { provider: openrouter, model: anthropic/claude-sonnet-4.5 }
```
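
A small lookup helper for the per-agent overrides, a sketch assuming the `ModelConfig` shape above.

```haskell
import qualified Data.Map as Map

-- | Resolve the provider for an agent by name, falling back to the default.
providerFor :: Text -> ModelConfig -> Provider
providerFor agentName cfg =
  Map.findWithDefault (defaultProvider cfg) agentName (modelOverrides cfg)
```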

### 4.2 Evals Framework

**Purpose**: Prevent regression when changing prompts, tools, or models.

**Components**:
1. **Test Cases**: Known task + expected outcome pairs
2. **Runner**: Execute agent on test cases, capture results
3. **Scorer**: Compare results (exact match, semantic similarity, human eval)
4. **Dashboard**: Track scores over time

**Implementation**:
```haskell
-- Omni/Agent/Eval.hs
data EvalCase = EvalCase
  { evalId :: Text
  , evalPrompt :: Text
  , evalExpectedBehavior :: Text  -- or structured criteria
  , evalTools :: [Tool]
  }

runEval :: AgentConfig -> EvalCase -> IO EvalResult
```
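
`EvalResult` is referenced above but not yet pinned down; one possible shape with a trivial exact-match scorer, a sketch assuming `evalExpectedBehavior` is compared as free text for now (semantic scoring could reuse the memory system's embeddings).

```haskell
import qualified Data.Text as Text

-- Sketch only: one possible result shape.
data EvalResult = EvalResult
  { evalCaseId :: Text
  , evalOutput :: Text    -- what the agent actually produced
  , evalScore :: Double   -- 0.0-1.0
  , evalPassed :: Bool
  }

-- | Exact-match scoring; the simplest possible scorer.
scoreExact :: EvalCase -> Text -> EvalResult
scoreExact ec output =
  let ok = Text.strip output == Text.strip (evalExpectedBehavior ec)
   in EvalResult
        { evalCaseId = evalId ec
        , evalOutput = output
        , evalScore = if ok then 1.0 else 0.0
        , evalPassed = ok
        }
```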

### 4.3 Shared Memory System (Omni.Agent.Memory)

**Critical requirement**: Cross-agent memory sharing with multi-user support.

**Example**: User tells Telegram bot "I'm an AI engineer" → Research agent later searching for papers should recall this context.

#### Why not Omni.Fact?

Current `Omni.Fact` limitations:
- Project-scoped, not user-scoped
- No user/identity concept
- No embeddings for semantic retrieval
- Tied to task system

#### Memory Design

```haskell
-- Omni/Agent/Memory.hs

-- | A memory is a piece of information about a user, learned by any agent
data Memory = Memory
  { memoryId :: UUID
  , memoryUserId :: UserId           -- Who this memory is about
  , memoryContent :: Text            -- The actual information
  , memoryEmbedding :: Maybe Vector  -- For semantic search
  , memorySource :: MemorySource     -- Which agent learned this
  , memoryConfidence :: Double       -- 0.0-1.0
  , memoryCreatedAt :: UTCTime
  , memoryLastAccessedAt :: UTCTime  -- For relevance decay
  , memoryTags :: [Text]             -- Optional categorization
  }

data MemorySource = MemorySource
  { sourceAgent :: Text      -- "telegram", "researcher", "coder", etc.
  , sourceSession :: UUID    -- Session ID where this was learned
  , sourceContext :: Text    -- Brief context of how it was learned
  }

data User = User
  { userId :: UUID
  , userTelegramId :: Maybe Int64    -- Primary identifier initially
  , userEmail :: Maybe Text          -- Added later when email interface exists
  , userName :: Text                 -- Display name ("Ben", "Alice", etc.)
  , userCreatedAt :: UTCTime
  }

-- Users are identified by Telegram ID initially
-- The agent learns more about users over time and stores in memories
-- e.g., "Ben is an AI engineer" becomes a memory, not a user field

-- | Core operations
storeMemory :: UserId -> Text -> MemorySource -> IO Memory
recallMemories :: UserId -> Text -> Int -> IO [Memory]  -- semantic search
forgetMemory :: UUID -> IO ()

-- | Embedding integration (via Ollama or other provider)
embedText :: Text -> IO Vector
similaritySearch :: Vector -> [Memory] -> Int -> [Memory]
```

#### Multi-User Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Memory Store                          │
├─────────────────────────────────────────────────────────┤
│  users table:                                            │
│    id TEXT PRIMARY KEY                                   │
│    name TEXT                                             │
│    created_at TIMESTAMP                                  │
├─────────────────────────────────────────────────────────┤
│  memories table:                                         │
│    id TEXT PRIMARY KEY                                   │
│    user_id TEXT REFERENCES users(id)                     │
│    content TEXT                                          │
│    embedding BLOB  -- serialized float vector            │
│    source_agent TEXT                                     │
│    source_session TEXT                                   │
│    source_context TEXT                                   │
│    confidence REAL                                       │
│    created_at TIMESTAMP                                  │
│    last_accessed_at TIMESTAMP                            │
│    tags TEXT  -- JSON array                              │
└─────────────────────────────────────────────────────────┘
```

#### Memory Retrieval in Agent Loop

When any agent runs, it:
1. Identifies the current user (from context/session)
2. Extracts key concepts from the user's request
3. Calls `recallMemories userId query 10` to get relevant memories
4. Injects memories into system prompt as context
5. After completion, extracts new learnings and calls `storeMemory`

```haskell
-- In agent loop
runAgentWithMemory :: UserId -> AgentConfig -> Text -> IO AgentResult
runAgentWithMemory userId config prompt = do
  -- Recall relevant memories
  memories <- recallMemories userId prompt 10
  let memoryContext = formatMemoriesForPrompt memories
  
  -- Inject into system prompt
  let enhancedPrompt = agentSystemPrompt config <> "\n\n## User Context\n" <> memoryContext
  
  -- Run agent
  result <- runAgent config { agentSystemPrompt = enhancedPrompt } prompt
  
  -- Extract and store new memories (could be done by the agent via tool)
  pure result
```
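
`formatMemoriesForPrompt` above is left undefined; a minimal rendering, assuming a plain bullet list is enough context for the model.

```haskell
import qualified Data.Text as Text

-- Sketch only: render recalled memories as a bullet list for the system prompt.
formatMemoriesForPrompt :: [Memory] -> Text
formatMemoriesForPrompt [] = "(no stored memories for this user)"
formatMemoriesForPrompt memories =
  Text.intercalate "\n"
    [ "- " <> memoryContent m <> " (source: " <> sourceAgent (memorySource m) <> ")"
    | m <- memories
    ]
```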

#### Memory Extraction Tool

Agents can explicitly store memories:

```haskell
-- The user and source come from the agent's session context, so the tool
-- is constructed per session rather than closing over globals.
-- (parseEither comes from Data.Aeson.Types.)
storeMemoryTool :: UserId -> MemorySource -> Tool
storeMemoryTool userId source = Tool
  { toolName = "remember"
  , toolDescription = "Store a piece of information about the user for future reference"
  , toolExecute = \args -> do
      content <- either fail pure (parseEither (.: "content") args)
      memory <- storeMemory userId content source
      pure (toJSON memory)
  }
```

### 4.4 LoRA Fine-tuning Service

**Purpose**: Custom-tune models on successful task completions.

**Workflow**:
1. Collect successful agent sessions (prompt + tool calls + result)
2. Format as training data (instruction, input, output; see the sketch below)
3. Run LoRA training via Ollama or external service
4. Snapshot trained model with version tag
5. A/B test against base model via evals

**Storage**:
- Training data: `_/training/<agent>/<date>.jsonl`
- Models: Ollama model registry with tags
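
For step 2, a possible shape for one JSONL record, a sketch only; the exact schema should match whatever the LoRA trainer ends up expecting.

```haskell
import Data.Aeson (ToJSON (..), encode, object, (.=))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Sketch only: one JSONL line per successful session.
data TrainingExample = TrainingExample
  { trInstruction :: Text   -- the agent's system prompt / task framing
  , trInput :: Text         -- user prompt plus tool-call transcript
  , trOutput :: Text        -- the final accepted result
  }

instance ToJSON TrainingExample where
  toJSON ex =
    object
      [ "instruction" .= trInstruction ex
      , "input" .= trInput ex
      , "output" .= trOutput ex
      ]

-- | Append one example to a .jsonl file under _/training/<agent>/.
appendExample :: FilePath -> TrainingExample -> IO ()
appendExample path ex = BL.appendFile path (encode ex <> BL.pack "\n")
```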

---

## 5. Infrastructure Build Plan

Focus: Library primitives first, agents later.

### Phase 1: Provider Abstraction (1-2 days)
- [ ] Create `Omni.Agent.Provider` module with unified interface
- [ ] Extract OpenRouter logic from `Engine.hs`
- [ ] Add Ollama provider implementation
- [ ] Add `--engine` flag to `jr work`
- [ ] Test with local Llama model

### Phase 2: Amp Re-integration (1 day)  
- [ ] Add Amp subprocess backend to Provider
- [ ] Handle Amp's streaming output
- [ ] Parse Amp thread URL for linking

### Phase 3: Memory System (3-4 days)
- [ ] Create `Omni.Agent.Memory` module (separate from Fact)
- [ ] Design schema: users, memories tables
- [ ] Implement `storeMemory`, `recallMemories`, `forgetMemory`
- [ ] Add embedding support via Ollama `/api/embeddings`
- [ ] Implement similarity search
- [ ] Create `remember` tool for agents
- [ ] Add `runAgentWithMemory` wrapper

### Phase 4: Tool Registry (1-2 days)
- [ ] Create `Omni.Agent.Registry` for tool management
- [ ] Define tool categories (coding, web, memory, task)
- [ ] Allow agents to declare tool requirements
- [ ] Add web tools (web_search, read_web_page)

### Phase 5: Evals Framework (2-3 days)
- [ ] Create `Omni.Agent.Eval` module
- [ ] Define `EvalCase` and `EvalResult` types
- [ ] Build eval runner
- [ ] Add scoring (exact match, semantic, custom)
- [ ] Create initial eval suite for Jr/coder

### Phase 6: Telegram Bot Agent (3-4 days)
**First concrete agent** - validates the infrastructure.

- [ ] Create `Omni.Agent.Telegram` module
- [ ] Telegram Bot API integration (getUpdates polling or webhook)
- [ ] User identification via Telegram user ID
- [ ] Auto-create user record on first message
- [ ] Wire up memory system (recall on message, store learnings)
- [ ] Basic conversation loop with LLM
- [ ] Deploy as background service
- [ ] Add `jr telegram` command for manual start

**Tools for Telegram agent:**
- `remember` - store facts about user
- `recall` - query user's memories
- `web_search` - answer questions (optional; requires the Phase 4 web tools)

### Phase 7: Training Data Collection (1-2 days)
- [ ] Add session export to training format
- [ ] Store successful completions in `_/training/`
- [ ] Create `jr train export` command

### (Future) Additional Agents
- Researcher agent
- Planner agent  
- Email interface (links to Telegram user identity)
- Others...

---

## 6. Design Decisions

| Question | Decision |
|----------|----------|
| Vector DB | **sqlite-vss** - SQLite extension for vector similarity |
| User identity | **Telegram ID** initially, link to email later when adding email interface |
| Memory privacy | **Cross-agent shared, per-user private** - all agents see all memories for a user, but users can't see each other's memories |
| Amp integration | TBD - subprocess likely |
| Memory decay | TBD - probably keep forever with relevance scoring |
| LoRA training | TBD - local Ollama or cloud |

---

## 7. File Structure (Proposed)

```
Omni/Agent/
├── Core.hs           # Base agent types, Worker state (existing)
├── Engine.hs         # Agent loop, tool execution (existing)
├── Provider.hs       # LLM provider abstraction (NEW)
├── Provider/
│   ├── OpenRouter.hs # Extracted from Engine.hs
│   ├── Ollama.hs     # Local model support
│   └── Amp.hs        # Amp CLI subprocess
├── Memory.hs         # Shared memory system (NEW)
├── Memory/
│   └── Embedding.hs  # Vector operations, Ollama embeddings
├── Tools.hs          # Core coding tools (existing)
├── Tools/
│   ├── Web.hs        # web_search, read_web_page (NEW)
│   └── Memory.hs     # remember, recall tools (NEW)
├── Eval.hs           # Evaluation framework (NEW)
├── Training.hs       # Training data collection (NEW)
├── Worker.hs         # Jr worker loop (existing)
├── Git.hs            # Git operations (existing)
├── Log.hs            # Logging utilities (existing)
├── Event.hs          # Event types (existing)
├── DESIGN.md         # Current design doc
└── PLAN.md           # This document
```

---

## 8. Database Schema Additions

```sql
-- Memory system tables (new database: memory.db)

CREATE TABLE users (
  id TEXT PRIMARY KEY,              -- UUID
  telegram_id INTEGER UNIQUE,       -- Telegram user ID (primary identifier)
  email TEXT UNIQUE,                -- Added later for email interface
  name TEXT NOT NULL,               -- Display name
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE memories (
  id TEXT PRIMARY KEY,              -- UUID
  user_id TEXT NOT NULL REFERENCES users(id),
  content TEXT NOT NULL,
  embedding BLOB,                   -- float32 vector for sqlite-vss
  source_agent TEXT NOT NULL,       -- "telegram", "coder", etc.
  source_session TEXT,              -- Session UUID
  source_context TEXT,              -- How this was learned
  confidence REAL DEFAULT 0.8,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags TEXT                         -- JSON array
);

-- sqlite-vss virtual table for vector similarity search
-- (dimension must match the embedding model; nomic-embed-text produces 768-dim vectors)
CREATE VIRTUAL TABLE memories_vss USING vss0(embedding(768));

CREATE INDEX idx_memories_user ON memories(user_id);
CREATE INDEX idx_memories_agent ON memories(source_agent);
```

---

## 9. Key Code References for Implementers

When implementing tasks, refer to these existing patterns:

### Existing Agent Infrastructure
| File | Purpose | Key Functions/Types |
|------|---------|---------------------|
| `Omni/Agent/Engine.hs` | Agent loop, LLM calls | `runAgent`, `chat`, `Tool`, `LLM`, `AgentConfig` |
| `Omni/Agent/Tools.hs` | Tool implementations | `readFileTool`, `editFileTool`, `runBashTool`, `allTools` |
| `Omni/Agent/Worker.hs` | Jr worker loop | `start`, `runWithEngine`, `buildFullPrompt` |
| `Omni/Agent/Core.hs` | Worker state types | `Worker`, `WorkerStatus` |

### Database Patterns (follow these)
| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Task/Core.hs` | SQLite usage | `withDb`, schema migrations, ToRow/FromRow instances |
| `Omni/Fact.hs` | CRUD operations | `createFact`, `getFact`, `getAllFacts` |

### CLI Patterns
| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Jr.hs` | Main CLI entry | Docopt usage, command dispatch in `move` function |
| `Omni/Cli.hs` | CLI helpers | `Cli.Plan`, `Cli.has`, `Cli.getArg` |

### HTTP Patterns
| File | Purpose | Key Patterns |
|------|---------|--------------|
| `Omni/Agent/Engine.hs` lines 560-594 | HTTP POST to LLM API | `http-conduit` usage, JSON encoding |

### Build System
- Build: `bild Omni/Agent/NewModule.hs`
- Test: `bild --test Omni/Agent/NewModule.hs`
- Dependencies: Add to module header comments (`: dep package-name`)

---

## 10. Next Steps

Execute tasks in order:
1. **t-247** Provider Abstraction (unblocked, start here)
2. **t-248** Memory System (after t-247)
3. **t-249** Tool Registry (after t-247, can parallel with t-248)
4. **t-250** Evals Framework (after t-247)
5. **t-251** Telegram Bot Agent (after t-248 + t-249)

Run `jr task ready` to see what's available to work on.