Omni/Agent/Subagent/HARDENING.md



# Subagent Hardening Design

**Status:** Draft  
**Goal:** Robust background execution, async updates, audit logging, user confirmation.

Based on Anthropic's [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).

## 1. Background Execution with Async Updates

### 1.1 SubagentHandle

Replace the synchronous `runSubagent` with an asynchronous spawn that returns a handle immediately:

```haskell
-- | Handle to a running subagent for status queries and control
data SubagentHandle = SubagentHandle
  { handleId :: SubagentId              -- Unique ID (UUID)
  , handleAsync :: Async SubagentResult -- async thread handle  
  , handleStartTime :: UTCTime
  , handleConfig :: SubagentConfig
  , handleStatus :: TVar SubagentRunStatus
  , handleEvents :: TQueue SubagentEvent  -- Event stream
  }

-- | Runtime status of a subagent (queryable)
data SubagentRunStatus = SubagentRunStatus
  { runIteration :: Int
  , runTokensUsed :: Int
  , runCostCents :: Double
  , runElapsedSeconds :: Int
  , runCurrentActivity :: Text  -- e.g. "Reading https://..."
  , runLastToolCall :: Maybe (Text, UTCTime)  -- (tool_name, timestamp)
  }

-- | Subagent lifecycle events for logging/streaming
data SubagentEvent
  = SubagentStarted SubagentId SubagentConfig UTCTime
  | SubagentActivity SubagentId Text UTCTime
  | SubagentToolCall SubagentId Text Aeson.Value UTCTime
  | SubagentToolResult SubagentId Text Bool Text UTCTime
  | SubagentThinking SubagentId Text UTCTime  -- Extended thinking
  | SubagentCost SubagentId Int Double UTCTime  -- tokens, cents
  | SubagentCompleted SubagentId SubagentResult UTCTime
  | SubagentError SubagentId Text UTCTime
  deriving (Show, Eq, Generic)
```

### 1.2 New API

```haskell
-- | Spawn subagent in background, return handle immediately
spawnSubagentAsync :: SubagentApiKeys -> SubagentConfig -> IO SubagentHandle

-- | Query current status (non-blocking)
querySubagentStatus :: SubagentHandle -> IO SubagentRunStatus

-- | Check if complete (non-blocking)
isSubagentDone :: SubagentHandle -> IO Bool

-- | Wait for completion (blocking)
waitSubagent :: SubagentHandle -> IO SubagentResult

-- | Cancel a running subagent
cancelSubagent :: SubagentHandle -> IO ()

-- | Read all events so far (for logging/UI)
drainSubagentEvents :: SubagentHandle -> IO [SubagentEvent]
```
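A minimal sketch of how these signatures could be backed by `async` and STM, assuming the `async`, `stm`, and `text` packages. The names `RunHandle`, `RunStatus`, `RunEvent`, and `spawnSketch` are simplified, hypothetical stand-ins for the types above, and the worker is an arbitrary `IO` action rather than a real subagent loop.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async (Async, async, cancel, poll, wait)
import Control.Concurrent.STM
import Data.Maybe (isJust)
import Data.Text (Text)

-- Simplified stand-ins for SubagentRunStatus / SubagentEvent (assumptions).
data RunStatus = RunStatus { iteration :: Int, activity :: Text }
  deriving (Show, Eq)

data RunEvent = Started | Activity Text | Completed Text
  deriving (Show, Eq)

data RunHandle = RunHandle
  { hAsync  :: Async Text      -- worker thread; its result is a summary
  , hStatus :: TVar RunStatus  -- latest status snapshot
  , hEvents :: TQueue RunEvent -- events not yet drained
  }

-- | Start the worker and return at once. The worker gets a callback that
-- updates the status and enqueues an event in one STM transaction.
spawnSketch :: ((Text -> IO ()) -> IO Text) -> IO RunHandle
spawnSketch worker = do
  status <- newTVarIO (RunStatus 0 "starting")
  events <- newTQueueIO
  atomically (writeTQueue events Started)
  let report msg = atomically $ do
        modifyTVar' status (\s -> s { iteration = iteration s + 1, activity = msg })
        writeTQueue events (Activity msg)
  a <- async $ do
    summary <- worker report
    atomically (writeTQueue events (Completed summary))
    pure summary
  pure (RunHandle a status events)

-- | Non-blocking status read.
queryStatus :: RunHandle -> IO RunStatus
queryStatus = readTVarIO . hStatus

-- | Non-blocking completion check (True for success, failure, or cancel).
isDone :: RunHandle -> IO Bool
isDone h = isJust <$> poll (hAsync h)

-- | Block until the worker finishes; rethrows if the worker threw.
waitResult :: RunHandle -> IO Text
waitResult = wait . hAsync

cancelRun :: RunHandle -> IO ()
cancelRun = cancel . hAsync

-- | Remove and return every queued event.
drainEvents :: RunHandle -> IO [RunEvent]
drainEvents = atomically . flushTQueue . hEvents

main :: IO ()
main = do
  h <- spawnSketch $ \report -> do
    report "Searching"
    report "Reading results"
    pure "2 steps finished"
  summary <- waitResult h
  status  <- queryStatus h
  events  <- drainEvents h
  print summary
  print status
  print events
```

Keeping the status in a `TVar` and the events in a `TQueue` lets the caller poll without ever touching the worker thread. Note that draining is destructive in this sketch, so a second consumer would need its own queue or a broadcast channel.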

### 1.3 Ava Integration

Ava's orchestrator loop can now:
1. Spawn subagents in background
2. Continue conversation with user
3. Periodically poll for updates: `"🔍 WebCrawler running (45s, 12k tokens)..."`
4. Receive completion and synthesize result

```haskell
-- In Ava's message handler:
handle <- spawnSubagentAsync keys config

-- Non-blocking check in conversation loop:
status <- querySubagentStatus handle
when (runElapsedSeconds status > 30) $
  sendMessage chat $ "⏳ Subagent still working: " <> runCurrentActivity status

-- When user asks for status:
status <- querySubagentStatus handle
sendMessage chat $ formatSubagentStatus status

-- On completion (check first so the conversation loop never blocks):
done <- isSubagentDone handle
when done $ do
  result <- waitSubagent handle
  sendMessage chat $ "✅ " <> subagentSummary result
```
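The status formatter used above is not specified in this design. One possible shape, as a sketch: the helper names `formatStatusLine`, `formatTokens`, and `formatElapsed` are hypothetical, the record is a trimmed copy of `SubagentRunStatus`, and the role name is passed in separately because the status record does not carry it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Trimmed copy of SubagentRunStatus with only the fields shown to the user.
data SubagentRunStatus = SubagentRunStatus
  { runTokensUsed      :: Int
  , runElapsedSeconds  :: Int
  , runCurrentActivity :: Text
  }

-- | Abbreviate token counts: below 1000 verbatim, otherwise whole thousands.
formatTokens :: Int -> Text
formatTokens n
  | n < 1000  = T.pack (show n)
  | otherwise = T.pack (show (n `div` 1000)) <> "k"

-- | Render elapsed time as seconds, or minutes plus seconds past one minute.
formatElapsed :: Int -> Text
formatElapsed s
  | s < 60    = T.pack (show s) <> "s"
  | otherwise = T.pack (show (s `div` 60)) <> "m" <> T.pack (show (s `mod` 60)) <> "s"

-- | One-line progress message for the chat.
formatStatusLine :: Text -> SubagentRunStatus -> Text
formatStatusLine role st =
  role <> " running (" <> formatElapsed (runElapsedSeconds st)
    <> ", " <> formatTokens (runTokensUsed st) <> " tokens): "
    <> runCurrentActivity st

main :: IO ()
main = TIO.putStrLn $
  formatStatusLine "WebCrawler" (SubagentRunStatus 12400 45 "Reading search results")
```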

## 2. User Confirmation Before Spawning

### 2.1 Confirmation Flow

Before spawning any subagent or long-running process, Ava must describe the plan and wait for explicit approval from the user:

```
User: Research competitors for podcast transcription

Ava: I'll spawn a WebCrawler subagent to research this. Estimated:
     • Time: ~5-10 minutes
     • Cost: up to $0.50
     • Tools: web_search, read_webpages
     
     Proceed? [Yes/No]

User: Yes

Ava: 🚀 Spawning WebCrawler subagent...
     🔍 [WebCrawler] Starting research...
```

### 2.2 Implementation

```haskell
data SpawnRequest = SpawnRequest
  { spawnConfig :: SubagentConfig
  , spawnEstimatedTime :: (Int, Int)  -- (min, max) minutes
  , spawnEstimatedCost :: Double      -- max cents
  , spawnRationale :: Text            -- why we need this
  }

-- | Generate confirmation message for user
formatSpawnConfirmation :: SpawnRequest -> Text

-- | Parse user confirmation response
data ConfirmationResponse = Confirmed | Rejected | Modified SubagentConfig

parseConfirmation :: Text -> ConfirmationResponse
```
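A sketch of what the two functions might look like. Assumptions: `spawnRole` stands in for the full `SubagentConfig`, the `Modified` case is left out, and the accepted answers (`yes`, `y`, `ok`, `proceed`) are illustrative. The parser fails closed, so anything that is not a clear affirmative counts as `Rejected`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Printf (printf)

data SpawnRequest = SpawnRequest
  { spawnRole          :: Text        -- stand-in for SubagentConfig (assumption)
  , spawnEstimatedTime :: (Int, Int)  -- (min, max) minutes
  , spawnEstimatedCost :: Double      -- max cents
  , spawnRationale     :: Text
  }

-- The design's `Modified SubagentConfig` case is omitted in this sketch.
data ConfirmationResponse = Confirmed | Rejected
  deriving (Show, Eq)

tshow :: Show a => a -> Text
tshow = T.pack . show

-- | Build the prompt shown to the user; cost is converted from cents to dollars.
formatSpawnConfirmation :: SpawnRequest -> Text
formatSpawnConfirmation req = T.unlines
  [ "I'll spawn a " <> spawnRole req <> " subagent: " <> spawnRationale req
  , "  Time: ~" <> tshow lo <> "-" <> tshow hi <> " minutes"
  , "  Cost: up to $" <> T.pack (printf "%.2f" (spawnEstimatedCost req / 100))
  , "Proceed? [Yes/No]"
  ]
  where
    (lo, hi) = spawnEstimatedTime req

-- | Accept only an explicit affirmative; everything else is a rejection.
parseConfirmation :: Text -> ConfirmationResponse
parseConfirmation raw
  | answer `elem` ["yes", "y", "ok", "proceed"] = Confirmed
  | otherwise                                   = Rejected
  where
    answer = T.toLower (T.strip raw)

main :: IO ()
main = do
  TIO.putStr $ formatSpawnConfirmation
    (SpawnRequest "WebCrawler" (5, 10) 50 "research competitor pricing")
  print (parseConfirmation "  Yes ")
  print (parseConfirmation "maybe later")
```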

### 2.3 Tool Modification

The `spawn_subagent` tool becomes a two-phase operation:

1. **Phase 1 (propose):** Returns confirmation request, doesn't spawn
2. **Phase 2 (confirm):** User confirms, actually spawns

Alternative: add `confirm_spawn` as a separate tool that takes a pending spawn ID.
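Either variant needs somewhere to hold proposals between the two phases. A possible sketch of such a store, assuming the `stm` and `containers` packages; `PendingStore`, `PendingId`, `proposeSpawn`, and `confirmSpawn` are hypothetical names, and the store is generic in the config type to stay self-contained.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

newtype PendingId = PendingId Int
  deriving (Show, Eq, Ord)

-- | Proposals awaiting user confirmation, keyed by a counter-based ID.
data PendingStore cfg = PendingStore
  { nextId  :: TVar Int
  , pending :: TVar (Map.Map PendingId cfg)
  }

newPendingStore :: IO (PendingStore cfg)
newPendingStore = PendingStore <$> newTVarIO 0 <*> newTVarIO Map.empty

-- | Phase 1: record the proposal and hand back its ID. Nothing is spawned.
proposeSpawn :: PendingStore cfg -> cfg -> IO PendingId
proposeSpawn store cfg = atomically $ do
  n <- readTVar (nextId store)
  writeTVar (nextId store) (n + 1)
  let pid = PendingId n
  modifyTVar' (pending store) (Map.insert pid cfg)
  pure pid

-- | Phase 2: take the proposal out of the store in one transaction, so a
-- given ID can be confirmed at most once. The caller spawns only on Just.
confirmSpawn :: PendingStore cfg -> PendingId -> IO (Maybe cfg)
confirmSpawn store pid = atomically $ do
  m <- readTVar (pending store)
  case Map.lookup pid m of
    Nothing  -> pure Nothing
    Just cfg -> do
      writeTVar (pending store) (Map.delete pid m)
      pure (Just cfg)

main :: IO ()
main = do
  store  <- newPendingStore
  pid    <- proposeSpawn store "WebCrawler"
  first  <- confirmSpawn store pid
  second <- confirmSpawn store pid  -- already consumed
  print (first, second)
```

Removing the entry inside the same transaction that reads it guards against a duplicated or replayed confirmation starting two subagents. Expiry of stale proposals is not handled here.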

## 3. Audit Logging System

### 3.1 Log Storage

All agent activity is persisted to append-only JSONL files under `$AVA_DATA_ROOT/logs/`:

```
$AVA_DATA_ROOT/logs/           # e.g. /home/ava/logs/ or _/var/ava/logs/
├── ava/
│   ├── 2024-01-15.jsonl       # Daily Ava conversation logs
│   └── 2024-01-16.jsonl
└── subagents/
    ├── S-7f3a2b.jsonl         # Per-subagent trace (named by SubagentId)
    └── S-9e4c1d.jsonl
```
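The layout above could be computed and written with something like the following sketch, which assumes the `time`, `filepath`, and `directory` packages. `avaDailyLogPath`, `subagentTracePath`, and `appendJsonLine` are hypothetical helpers; the data root is passed in explicitly, and lines are assumed to be JSON that has already been encoded.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.Time (Day, defaultTimeLocale, formatTime, fromGregorian)
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.FilePath (takeDirectory, (</>))

-- | <root>/logs/ava/YYYY-MM-DD.jsonl
avaDailyLogPath :: FilePath -> Day -> FilePath
avaDailyLogPath root day =
  root </> "logs" </> "ava"
       </> (formatTime defaultTimeLocale "%Y-%m-%d" day ++ ".jsonl")

-- | <root>/logs/subagents/<name>.jsonl, where <name> is e.g. "S-7f3a2b"
subagentTracePath :: FilePath -> String -> FilePath
subagentTracePath root name =
  root </> "logs" </> "subagents" </> (name ++ ".jsonl")

-- | Append one JSON line. The MVar serialises writers within this process
-- so that concurrent entries are not interleaved mid-line.
appendJsonLine :: MVar () -> FilePath -> String -> IO ()
appendJsonLine lock path line = withMVar lock $ \_ -> do
  createDirectoryIfMissing True (takeDirectory path)
  appendFile path (line ++ "\n")

main :: IO ()
main = do
  tmp  <- getTemporaryDirectory
  lock <- newMVar ()
  let path = avaDailyLogPath (tmp </> "ava-sketch") (fromGregorian 2024 1 15)
  appendJsonLine lock path "{\"eventType\":\"SessionStart\"}"
  putStrLn path
```

The lock only covers a single process. If several processes write to the same daily file, they would need file locking or separate files.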

### 3.2 SubagentId Linking

Each subagent gets a unique `SubagentId` (short UUID prefix) that links:
- The `SubagentResult` returned to Ava
- The JSONL log file (`S-{id}.jsonl`)
- References in Ava's daily log

```haskell
-- | Unique identifier for a subagent run
newtype SubagentId = SubagentId { unSubagentId :: Text }
  deriving (Show, Eq, Generic, Aeson.ToJSON, Aeson.FromJSON)

-- | Generate a new subagent ID (first 6 chars of UUID)
newSubagentId :: IO SubagentId
newSubagentId = SubagentId . Text.take 6 . UUID.toText <$> UUID.nextRandom

-- | Path to subagent's log file
subagentLogPath :: SubagentId -> FilePath
subagentLogPath (SubagentId sid) =
  avaDataRoot </> "logs" </> "subagents" </> ("S-" <> Text.unpack sid <> ".jsonl")
```

The `SubagentResult` includes the ID for cross-referencing:

```haskell
data SubagentResult = SubagentResult
  { subagentId :: SubagentId        -- NEW: links to S-{id}.jsonl
  , subagentOutput :: Aeson.Value
  , subagentSummary :: Text
  , ...
  }
```

### 3.3 Log Entry Schema

```haskell
data AuditLogEntry = AuditLogEntry
  { logTimestamp :: UTCTime
  , logSessionId :: SessionId      -- Conversation session
  , logAgentId :: AgentId          -- Ava or subagent ID
  , logUserId :: Maybe UserId      -- Human user (Telegram, etc.)
  , logEventType :: AuditEventType
  , logContent :: Aeson.Value
  , logMetadata :: LogMetadata
  }

data AuditEventType
  = UserMessage           -- Incoming user message
  | AssistantMessage      -- Ava response
  | ToolCall              -- Tool invocation
  | ToolResult            -- Tool response
  | SubagentSpawn         -- Subagent created
  | SubagentComplete      -- Subagent finished
  | ExtendedThinking      -- Thinking block content
  | CostUpdate            -- Token/cost tracking
  | ErrorOccurred         -- Any error
  | SessionStart          -- New conversation
  | SessionEnd            -- Conversation ended
  deriving (Show, Eq, Generic)

data LogMetadata = LogMetadata
  { metaInputTokens :: Maybe Int
  , metaOutputTokens :: Maybe Int
  , metaCostCents :: Maybe Double
  , metaModelId :: Maybe Text
  , metaParentAgentId :: Maybe AgentId  -- For subagents
  , metaDuration :: Maybe Int           -- Milliseconds
  }
```

### 3.4 Logging Interface

```haskell
-- | Append entry to audit log
writeAuditLog :: AuditLogEntry -> IO ()

-- | Query logs by various criteria
data LogQuery = LogQuery
  { queryAgentId :: Maybe AgentId
  , queryUserId :: Maybe UserId
  , queryTimeRange :: Maybe (UTCTime, UTCTime)
  , queryEventTypes :: Maybe [AuditEventType]
  , querySessionId :: Maybe SessionId
  , queryLimit :: Int
  }

queryAuditLogs :: LogQuery -> IO [AuditLogEntry]

-- | Get recent logs for debugging
getRecentLogs :: AgentId -> Int -> IO [AuditLogEntry]

-- | Search logs by content
searchLogs :: Text -> IO [AuditLogEntry]
```
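The filtering half of `queryAuditLogs` can be expressed as a pure predicate over entries that have already been decoded. A sketch under simplifying assumptions: `Entry` keeps only three of the `AuditLogEntry` fields, agent IDs are plain strings, the event type list is shortened, and `matches`, `runQuery`, and `at` are hypothetical names. A `Nothing` criterion places no constraint.

```haskell
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

data AuditEventType = UserMessage | ToolCall | ErrorOccurred
  deriving (Show, Eq)

-- Reduced stand-in for AuditLogEntry (assumption).
data Entry = Entry
  { logTimestamp :: UTCTime
  , logAgentId   :: String
  , logEventType :: AuditEventType
  } deriving (Show, Eq)

data LogQuery = LogQuery
  { queryAgentId    :: Maybe String
  , queryTimeRange  :: Maybe (UTCTime, UTCTime)  -- inclusive bounds
  , queryEventTypes :: Maybe [AuditEventType]
  , queryLimit      :: Int
  }

-- | True when the entry satisfies every criterion that is present.
matches :: LogQuery -> Entry -> Bool
matches q e = and
  [ maybe True (== logAgentId e) (queryAgentId q)
  , maybe True inRange (queryTimeRange q)
  , maybe True (logEventType e `elem`) (queryEventTypes q)
  ]
  where
    inRange (start, end) = logTimestamp e >= start && logTimestamp e <= end

-- | Keep the first `queryLimit` matches, in the order the entries were given.
runQuery :: LogQuery -> [Entry] -> [Entry]
runQuery q = take (queryLimit q) . filter (matches q)

-- | Helper for sample data: a time on 2024-01-15, `secs` seconds after midnight.
at :: Integer -> UTCTime
at secs = UTCTime (fromGregorian 2024 1 15) (secondsToDiffTime secs)

main :: IO ()
main = do
  let entries =
        [ Entry (at 10) "ava"      UserMessage
        , Entry (at 20) "S-7f3a2b" ToolCall
        , Entry (at 30) "S-7f3a2b" ErrorOccurred
        ]
      q = LogQuery (Just "S-7f3a2b") Nothing (Just [ErrorOccurred]) 10
  mapM_ print (runQuery q entries)
```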

### 3.5 Tools for Querying Logs

**For Ben (CLI):**

```bash
# View recent Ava logs
ava logs --last 100

# View specific subagent trace by ID
ava logs S-7f3a2b

# Search for errors
ava logs --type error --since "1 hour ago"

# Follow live logs
ava logs -f

# Quick lookup with standard tools
tail -f "$AVA_DATA_ROOT/logs/ava/$(date +%Y-%m-%d).jsonl"
jq 'select(.eventType == "ErrorOccurred")' "$AVA_DATA_ROOT"/logs/ava/*.jsonl
jq . "$AVA_DATA_ROOT/logs/subagents/S-7f3a2b.jsonl"
```

**For Ava (Agent Tool):**

```haskell
-- | Tool for Ava to query her own logs
readAvaLogsTool :: Engine.Tool
readAvaLogsTool = Engine.Tool
  { toolName = "read_ava_logs"
  , toolDescription = 
      "Read Ava's audit logs or subagent traces. "
      <> "Use to diagnose issues, review past conversations, or inspect subagent runs."
  , toolJsonSchema = ...
  , toolExecute = executeReadLogs
  }

-- Parameters:
-- { "subagent_id": "S-7f3a2b" }           -- Read specific subagent trace
-- { "last_n": 50 }                         -- Last N entries from today's log
-- { "search": "error", "since": "1h" }     -- Search with time filter
```

This allows Ava to self-diagnose: "Let me check my logs for that subagent run..."

### 3.6 Automatic Logging Hook

Integrate into Engine callbacks so logging is automatic:

```haskell
auditingEngineConfig :: SessionId -> AgentId -> UserId -> EngineConfig
auditingEngineConfig session agent user = EngineConfig
  { engineOnActivity = \txt -> writeAuditLog $ mkActivityEntry session agent txt
  , engineOnToolCall = \name args -> writeAuditLog $ mkToolCallEntry session agent name args
  , engineOnToolResult = \name success output -> writeAuditLog $ mkToolResultEntry session agent name success output
  , engineOnCost = \tokens cents -> writeAuditLog $ mkCostEntry session agent tokens cents
  , engineOnError = \err -> writeAuditLog $ mkErrorEntry session agent err
  , ...
  }
```

## 4. Subagent Thinking Logs

Capture extended thinking for debugging:

```haskell
-- In Engine, when extended thinking is enabled:
onThinkingBlock :: Text -> IO ()
onThinkingBlock content = do
  ts <- getCurrentTime
  writeAuditLog $ AuditLogEntry
    { logTimestamp = ts
    , logEventType = ExtendedThinking
    , logContent = object ["thinking" .= content]
    , ...
    }
```

## 5. Implementation Plan

### Phase 1: Audit Logging (Foundation)
1. Create `Omni/Agent/AuditLog.hs` with types and writers
2. Integrate into Engine callbacks
3. Add CLI commands: `jr agent logs`
4. Migrate existing status logging to audit system

### Phase 2: Async Subagent Execution
1. Create `SubagentHandle` and `SubagentRunStatus`
2. Implement `spawnSubagentAsync`, `querySubagentStatus`
3. Add event queue for real-time updates
4. Update Ava integration for background polling

### Phase 3: User Confirmation
1. Add confirmation prompt generation
2. Implement two-phase spawn flow
3. Update Telegram handler for confirmation UX
4. Add timeout for pending confirmations

### Phase 4: CLI & Diagnostics
1. Full `jr agent logs` implementation with queries
2. Live log streaming (`-f` flag)
3. Subagent dashboard in status output
4. Health checks and metrics

## 6. Example Session with All Features

```
[14:05:22] User (ben): Research podcast transcription pricing

[14:05:23] Ava → User: I'll spawn a WebCrawler subagent to research competitor pricing.
         Estimated: 5-10 min, up to $0.50
         Proceed? [Yes/No]

[14:05:28] User (ben): yes

[14:05:29] Ava → User: 🚀 Spawning WebCrawler subagent (S-7f3a2b)...
[14:05:29] [AUDIT] SubagentSpawn S-7f3a2b role=WebCrawler user=ben session=sess-123

[14:05:30] [AUDIT/S-7f3a2b] ToolCall web_search {"query": "podcast transcription pricing 2024"}
[14:05:32] [AUDIT/S-7f3a2b] ToolResult web_search success=true

[14:06:00] Ava → User: ⏳ Research in progress (30s, reading otter.ai/pricing...)

[14:07:45] [AUDIT/S-7f3a2b] SubagentComplete status=success cost=$0.24 tokens=45000

[14:07:46] Ava → User: ✅ Research complete! Found 5 competitors...
         [structured findings with citations]

# Later debugging:
$ jr agent logs S-7f3a2b
[14:05:30] ToolCall web_search {"query": "podcast transcription pricing 2024"}
[14:05:32] ToolResult web_search (success, 5 results)
[14:05:35] Thinking: "Looking at search results, otter.ai and descript appear most relevant..."
[14:05:40] ToolCall read_webpages {"urls": ["https://otter.ai/pricing"]}
...
```

## 7. References

- Anthropic: [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
- Current: `Omni/Agent/Subagent.hs`, `Omni/Agent/Event.hs`
- Async: `Control.Concurrent.Async`, `Control.Concurrent.STM`