From e2ea8308d74582d5651ed933dea9428ce8982d25 Mon Sep 17 00:00:00 2001
From: Ben Sima <ben@bensima.com>
Date: Wed, 17 Dec 2025 22:05:40 -0500
Subject: feat(ava): subagent hardening with audit logging

Based on Anthropic's effective harnesses research.

New modules:
- Omni/Agent/AuditLog.hs: JSONL audit logging with SubagentId linking
- Omni/Agent/Tools/AvaLogs.hs: Tool for Ava to query her own logs
- Omni/Agent/Subagent/HARDENING.md: Design documentation

Key features:
- SubagentHandle with TVar status for async execution and polling
- spawnSubagentAsync, querySubagentStatus, waitSubagent, cancelSubagent
- User confirmation: spawn_subagent requires confirmed=true after approval
- Audit logs stored in $AVA_DATA_ROOT/logs/{ava,subagents}/
- CLI: ava logs [--last=N] [<subagent_id>]
- read_ava_logs tool for Ava self-diagnosis

Tasks: t-267, t-268, t-269, t-270, t-271
---
 Omni/Agent/Subagent/HARDENING.md | 397 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 397 insertions(+)
 create mode 100644 Omni/Agent/Subagent/HARDENING.md

(limited to 'Omni/Agent/Subagent')

diff --git a/Omni/Agent/Subagent/HARDENING.md b/Omni/Agent/Subagent/HARDENING.md
new file mode 100644
index 0000000..2368fd2
--- /dev/null
+++ b/Omni/Agent/Subagent/HARDENING.md
@@ -0,0 +1,397 @@
+# Subagent Hardening Design
+
+**Status:** Draft  
+**Goal:** Robust background execution, async updates, audit logging, user confirmation.
+
+Based on Anthropic's [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
+
+## 1. Background Execution with Async Updates
+
+### 1.1 SubagentHandle
+
+Replace the synchronous `runSubagent` with an asynchronous spawn that returns a handle:
+
+```haskell
+-- | Handle to a running subagent for status queries and control
+data SubagentHandle = SubagentHandle
+  { handleId :: SubagentId              -- Unique ID (short UUID prefix, see 3.2)
+  , handleAsync :: Async SubagentResult -- Async thread handle
+  , handleStartTime :: UTCTime
+  , handleConfig :: SubagentConfig
+  , handleStatus :: TVar SubagentRunStatus
+  , handleEvents :: TQueue SubagentEvent  -- Event stream
+  }
+
+-- | Runtime status of a subagent (queryable)
+data SubagentRunStatus = SubagentRunStatus
+  { runIteration :: Int
+  , runTokensUsed :: Int
+  , runCostCents :: Double
+  , runElapsedSeconds :: Int
+  , runCurrentActivity :: Text  -- e.g. "Reading https://..."
+  , runLastToolCall :: Maybe (Text, UTCTime)  -- (tool_name, timestamp)
+  }
+
+-- | Subagent lifecycle events for logging/streaming
+data SubagentEvent
+  = SubagentStarted SubagentId SubagentConfig UTCTime
+  | SubagentActivity SubagentId Text UTCTime
+  | SubagentToolCall SubagentId Text Aeson.Value UTCTime
+  | SubagentToolResult SubagentId Text Bool Text UTCTime
+  | SubagentThinking SubagentId Text UTCTime  -- Extended thinking
+  | SubagentCost SubagentId Int Double UTCTime  -- tokens, cents
+  | SubagentCompleted SubagentId SubagentResult UTCTime
+  | SubagentError SubagentId Text UTCTime
+  deriving (Show, Eq, Generic)
+```
+
+### 1.2 New API
+
+```haskell
+-- | Spawn subagent in background, return handle immediately
+spawnSubagentAsync :: SubagentApiKeys -> SubagentConfig -> IO SubagentHandle
+
+-- | Query current status (non-blocking)
+querySubagentStatus :: SubagentHandle -> IO SubagentRunStatus
+
+-- | Check if complete (non-blocking)
+isSubagentDone :: SubagentHandle -> IO Bool
+
+-- | Wait for completion (blocking)
+waitSubagent :: SubagentHandle -> IO SubagentResult
+
+-- | Cancel a running subagent
+cancelSubagent :: SubagentHandle -> IO ()
+
+-- | Drain all events queued so far (for logging/UI)
+drainSubagentEvents :: SubagentHandle -> IO [SubagentEvent]
+```
+
+### 1.3 Ava Integration
+
+Ava's orchestrator loop can now:
+1. Spawn subagents in background
+2. Continue conversation with user
+3. Periodically poll for updates: `"🔍 WebCrawler running (45s, 12k tokens)..."`
+4. Receive completion and synthesize result
+
+```haskell
+-- In Ava's message handler:
+handle <- spawnSubagentAsync keys config
+
+-- Non-blocking check in conversation loop:
+status <- querySubagentStatus handle
+when (runElapsedSeconds status > 30) $
+  sendMessage chat $ "⏳ Subagent still working: " <> runCurrentActivity status
+
+-- When user asks for status:
+status <- querySubagentStatus handle
+sendMessage chat $ formatSubagentStatus status
+
+-- On completion (waitSubagent blocks, so gate it on isSubagentDone in the loop):
+result <- waitSubagent handle
+sendMessage chat $ "✅ " <> subagentSummary result
+```
+
+## 2. User Confirmation Before Spawning
+
+### 2.1 Confirmation Flow
+
+Before spawning any subagent or long-running process, Ava must present an estimate and get explicit approval:
+
+```
+User: Research competitors for podcast transcription
+
+Ava: I'll spawn a WebCrawler subagent to research this. Estimated:
+     • Time: ~5-10 minutes
+     • Cost: up to $0.50
+     • Tools: web_search, read_webpages
+     
+     Proceed? [Yes/No]
+
+User: Yes
+
+Ava: 🚀 Spawning WebCrawler subagent...
+     🔍 [WebCrawler] Starting research...
+```
+
+### 2.2 Implementation
+
+```haskell
+data SpawnRequest = SpawnRequest
+  { spawnConfig :: SubagentConfig
+  , spawnEstimatedTime :: (Int, Int)  -- (min, max) minutes
+  , spawnEstimatedCost :: Double      -- max cents
+  , spawnRationale :: Text            -- why we need this
+  }
+
+-- | Generate confirmation message for user
+formatSpawnConfirmation :: SpawnRequest -> Text
+
+-- | Parse user confirmation response
+data ConfirmationResponse = Confirmed | Rejected | Modified SubagentConfig
+
+parseConfirmation :: Text -> ConfirmationResponse
+```
+
+### 2.3 Tool Modification
+
+The `spawn_subagent` tool becomes a two-phase operation:
+
+1. **Phase 1 (propose):** Returns a confirmation request; does not spawn
+2. **Phase 2 (confirm):** After the user approves, the tool is called again with `confirmed=true` and actually spawns
+
+Alternative: Add `confirm_spawn` as a separate tool that takes a pending spawn ID.
+
+## 3. Audit Logging System
+
+### 3.1 Log Storage
+
+All agent activity is persisted to append-only JSONL files under `$AVA_DATA_ROOT/logs/`:
+
+```
+$AVA_DATA_ROOT/logs/           # e.g. /home/ava/logs/ or _/var/ava/logs/
+├── ava/
+│   ├── 2024-01-15.jsonl       # Daily Ava conversation logs
+│   └── 2024-01-16.jsonl
+└── subagents/
+    ├── S-7f3a2b.jsonl         # Per-subagent trace (named by SubagentId)
+    └── S-9e4c1d.jsonl
+```
+
+### 3.2 SubagentId Linking
+
+Each subagent gets a unique `SubagentId` (short UUID prefix) that links:
+- The `SubagentResult` returned to Ava
+- The JSONL log file (`S-{id}.jsonl`)
+- References in Ava's daily log
+
+```haskell
+-- | Unique identifier for a subagent run
+newtype SubagentId = SubagentId { unSubagentId :: Text }
+  deriving (Show, Eq, Generic, Aeson.ToJSON, Aeson.FromJSON)
+
+-- | Generate a new subagent ID (first 6 chars of UUID)
+newSubagentId :: IO SubagentId
+newSubagentId = SubagentId . Text.take 6 . UUID.toText <$> UUID.nextRandom
+
+-- | Path to subagent's log file: logs/subagents/S-{id}.jsonl
+subagentLogPath :: SubagentId -> FilePath
+subagentLogPath (SubagentId sid) =
+  avaDataRoot </> "logs" </> "subagents" </> ("S-" <> Text.unpack sid <> ".jsonl")
+```
+
+The `SubagentResult` includes the ID for cross-referencing:
+
+```haskell
+data SubagentResult = SubagentResult
+  { subagentId :: SubagentId        -- NEW: links to S-{id}.jsonl
+  , subagentOutput :: Aeson.Value
+  , subagentSummary :: Text
+  , ...
+  }
+```
+
+### 3.3 Log Entry Schema
+
+```haskell
+data AuditLogEntry = AuditLogEntry
+  { logTimestamp :: UTCTime
+  , logSessionId :: SessionId      -- Conversation session
+  , logAgentId :: AgentId          -- Ava or subagent ID
+  , logUserId :: Maybe UserId      -- Human user (Telegram, etc.)
+  , logEventType :: AuditEventType
+  , logContent :: Aeson.Value
+  , logMetadata :: LogMetadata
+  }
+
+data AuditEventType
+  = UserMessage           -- Incoming user message
+  | AssistantMessage      -- Ava response
+  | ToolCall              -- Tool invocation
+  | ToolResult            -- Tool response
+  | SubagentSpawn         -- Subagent created
+  | SubagentComplete      -- Subagent finished
+  | ExtendedThinking      -- Thinking block content
+  | CostUpdate            -- Token/cost tracking
+  | ErrorOccurred         -- Any error
+  | SessionStart          -- New conversation
+  | SessionEnd            -- Conversation ended
+  deriving (Show, Eq, Generic)
+
+data LogMetadata = LogMetadata
+  { metaInputTokens :: Maybe Int
+  , metaOutputTokens :: Maybe Int
+  , metaCostCents :: Maybe Double
+  , metaModelId :: Maybe Text
+  , metaParentAgentId :: Maybe AgentId  -- For subagents
+  , metaDuration :: Maybe Int           -- Milliseconds
+  }
+```
+
+### 3.4 Logging Interface
+
+```haskell
+-- | Append entry to audit log
+writeAuditLog :: AuditLogEntry -> IO ()
+
+-- | Query logs by various criteria
+data LogQuery = LogQuery
+  { queryAgentId :: Maybe AgentId
+  , queryUserId :: Maybe UserId
+  , queryTimeRange :: Maybe (UTCTime, UTCTime)
+  , queryEventTypes :: Maybe [AuditEventType]
+  , querySessionId :: Maybe SessionId
+  , queryLimit :: Int
+  }
+
+queryAuditLogs :: LogQuery -> IO [AuditLogEntry]
+
+-- | Get recent logs for debugging
+getRecentLogs :: AgentId -> Int -> IO [AuditLogEntry]
+
+-- | Search logs by content
+searchLogs :: Text -> IO [AuditLogEntry]
+```
+
+### 3.5 Tools for Querying Logs
+
+**For Ben (CLI):**
+
+```bash
+# View recent Ava logs
+ava logs --last 100
+
+# View specific subagent trace by ID
+ava logs S-7f3a2b
+
+# Search for errors
+ava logs --type error --since "1 hour ago"
+
+# Follow live logs
+ava logs -f
+
+# Quick lookup with standard tools
+tail -f "$AVA_DATA_ROOT/logs/ava/$(date +%Y-%m-%d).jsonl"
+jq 'select(.eventType == "ErrorOccurred")' "$AVA_DATA_ROOT"/logs/ava/*.jsonl
+jq . "$AVA_DATA_ROOT/logs/subagents/S-7f3a2b.jsonl"
+```
+
+**For Ava (Agent Tool):**
+
+```haskell
+-- | Tool for Ava to query her own logs
+readAvaLogsTool :: Engine.Tool
+readAvaLogsTool = Engine.Tool
+  { toolName = "read_ava_logs"
+  , toolDescription = 
+      "Read Ava's audit logs or subagent traces. "
+      <> "Use to diagnose issues, review past conversations, or inspect subagent runs."
+  , toolJsonSchema = ...
+  , toolExecute = executeReadLogs
+  }
+
+-- Parameters:
+-- { "subagent_id": "S-7f3a2b" }           -- Read specific subagent trace
+-- { "last_n": 50 }                         -- Last N entries from today's log
+-- { "search": "error", "since": "1h" }     -- Search with time filter
+```
+
+This allows Ava to self-diagnose: "Let me check my logs for that subagent run..."
+
+### 3.6 Automatic Logging Hook
+
+Integrate into Engine callbacks so logging is automatic:
+
+```haskell
+auditingEngineConfig :: SessionId -> AgentId -> UserId -> EngineConfig
+auditingEngineConfig session agent user = EngineConfig
+  { engineOnActivity = \txt -> writeAuditLog $ mkActivityEntry session agent txt
+  , engineOnToolCall = \name args -> writeAuditLog $ mkToolCallEntry session agent name args
+  , engineOnToolResult = \name success output -> writeAuditLog $ mkToolResultEntry session agent name success output
+  , engineOnCost = \tokens cents -> writeAuditLog $ mkCostEntry session agent tokens cents
+  , engineOnError = \err -> writeAuditLog $ mkErrorEntry session agent err
+  , ...
+  }
+```
+
+## 4. Subagent Thinking Logs
+
+Capture extended thinking for debugging:
+
+```haskell
+-- In Engine, when extended thinking is enabled:
+onThinkingBlock :: Text -> IO ()
+onThinkingBlock content = do
+  ts <- getCurrentTime
+  writeAuditLog $ AuditLogEntry
+    { logEventType = ExtendedThinking
+    , logContent = object ["thinking" .= content]
+    , ...
+    }
+```
+
+## 5. Implementation Plan
+
+### Phase 1: Audit Logging (Foundation)
+1. Create `Omni/Agent/AuditLog.hs` with types and writers
+2. Integrate into Engine callbacks
+3. Add CLI commands: `ava logs`
+4. Migrate existing status logging to audit system
+
+### Phase 2: Async Subagent Execution
+1. Create `SubagentHandle` and `SubagentRunStatus`
+2. Implement `spawnSubagentAsync`, `querySubagentStatus`
+3. Add event queue for real-time updates
+4. Update Ava integration for background polling
+
+### Phase 3: User Confirmation
+1. Add confirmation prompt generation
+2. Implement two-phase spawn flow
+3. Update Telegram handler for confirmation UX
+4. Add timeout for pending confirmations
+
+### Phase 4: CLI & Diagnostics
+1. Full `ava logs` implementation with queries
+2. Live log streaming (`-f` flag)
+3. Subagent dashboard in status output
+4. Health checks and metrics
+
+## 6. Example Session with All Features
+
+```
+[14:05:22] User (ben): Research podcast transcription pricing
+
+[14:05:23] Ava → User: I'll spawn a WebCrawler subagent to research competitor pricing.
+         Estimated: 5-10 min, up to $0.50
+         Proceed? [Yes/No]
+
+[14:05:28] User (ben): yes
+
+[14:05:29] Ava → User: 🚀 Spawning WebCrawler subagent (S-7f3a2b)...
+[14:05:29] [AUDIT] SubagentSpawn S-7f3a2b role=WebCrawler user=ben session=sess-123
+
+[14:05:30] [AUDIT/S-7f3a2b] ToolCall web_search {"query": "podcast transcription pricing 2024"}
+[14:05:32] [AUDIT/S-7f3a2b] ToolResult web_search success=true
+
+[14:06:00] Ava → User: ⏳ Research in progress (30s, reading otter.ai/pricing...)
+
+[14:07:45] [AUDIT/S-7f3a2b] SubagentComplete status=success cost=$0.24 tokens=45000
+
+[14:07:46] Ava → User: ✅ Research complete! Found 5 competitors...
+         [structured findings with citations]
+
+# Later debugging:
+$ ava logs S-7f3a2b
+[14:05:30] ToolCall web_search {"query": "podcast transcription pricing 2024"}
+[14:05:32] ToolResult web_search (success, 5 results)
+[14:05:35] Thinking: "Looking at search results, otter.ai and descript appear most relevant..."
+[14:05:40] ToolCall read_webpages {"urls": ["https://otter.ai/pricing"]}
+...
+```
+
+## 7. References
+
+- Anthropic: [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
+- Current: `Omni/Agent/Subagent.hs`, `Omni/Agent/Event.hs`
+- Async: `Control.Concurrent.Async`, `Control.Concurrent.STM`
-- 
cgit v1.2.3