From e2ea8308d74582d5651ed933dea9428ce8982d25 Mon Sep 17 00:00:00 2001
From: Ben Sima
Date: Wed, 17 Dec 2025 22:05:40 -0500
Subject: feat(ava): subagent hardening with audit logging

Based on Anthropic's effective harnesses research.

New modules:
- Omni/Agent/AuditLog.hs: JSONL audit logging with SubagentId linking
- Omni/Agent/Tools/AvaLogs.hs: Tool for Ava to query her own logs
- Omni/Agent/Subagent/HARDENING.md: Design documentation

Key features:
- SubagentHandle with TVar status for async execution and polling
- spawnSubagentAsync, querySubagentStatus, waitSubagent, cancelSubagent
- User confirmation: spawn_subagent requires confirmed=true after approval
- Audit logs stored in $AVA_DATA_ROOT/logs/{ava,subagents}/
- CLI: ava logs [--last=N] []
- read_ava_logs tool for Ava self-diagnosis

Tasks: t-267, t-268, t-269, t-270, t-271
---
 Omni/Agent/Subagent/HARDENING.md | 397 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 397 insertions(+)
 create mode 100644 Omni/Agent/Subagent/HARDENING.md

(limited to 'Omni/Agent/Subagent')

diff --git a/Omni/Agent/Subagent/HARDENING.md b/Omni/Agent/Subagent/HARDENING.md
new file mode 100644
index 0000000..2368fd2
--- /dev/null
+++ b/Omni/Agent/Subagent/HARDENING.md
@@ -0,0 +1,397 @@
+# Subagent Hardening Design
+
+**Status:** Draft
+**Goal:** Robust background execution, async updates, audit logging, user confirmation.
+
+Based on Anthropic's [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
+
+## 1. Background Execution with Async Updates
+
+### 1.1 SubagentHandle
+
+Replace synchronous `runSubagent` with async spawn returning a handle:
+
+```haskell
+-- | Handle to a running subagent for status queries and control
+data SubagentHandle = SubagentHandle
+  { handleId :: SubagentId                -- Unique ID (UUID)
+  , handleAsync :: Async SubagentResult   -- async thread handle
+  , handleStartTime :: UTCTime
+  , handleConfig :: SubagentConfig
+  , handleStatus :: TVar SubagentRunStatus
+  , handleEvents :: TQueue SubagentEvent  -- Event stream
+  }
+
+-- | Runtime status of a subagent (queryable)
+data SubagentRunStatus = SubagentRunStatus
+  { runIteration :: Int
+  , runTokensUsed :: Int
+  , runCostCents :: Double
+  , runElapsedSeconds :: Int
+  , runCurrentActivity :: Text               -- e.g. "Reading https://..."
+  , runLastToolCall :: Maybe (Text, UTCTime) -- (tool_name, timestamp)
+  }
+
+-- | Subagent lifecycle events for logging/streaming
+data SubagentEvent
+  = SubagentStarted SubagentId SubagentConfig UTCTime
+  | SubagentActivity SubagentId Text UTCTime
+  | SubagentToolCall SubagentId Text Aeson.Value UTCTime
+  | SubagentToolResult SubagentId Text Bool Text UTCTime
+  | SubagentThinking SubagentId Text UTCTime -- Extended thinking
+  | SubagentCost SubagentId Int Double UTCTime -- tokens, cents
+  | SubagentCompleted SubagentId SubagentResult UTCTime
+  | SubagentError SubagentId Text UTCTime
+  deriving (Show, Eq, Generic)
+```
+
+### 1.2 New API
+
+```haskell
+-- | Spawn subagent in background, return handle immediately
+spawnSubagentAsync :: SubagentApiKeys -> SubagentConfig -> IO SubagentHandle
+
+-- | Query current status (non-blocking)
+querySubagentStatus :: SubagentHandle -> IO SubagentRunStatus
+
+-- | Check if complete (non-blocking)
+isSubagentDone :: SubagentHandle -> IO Bool
+
+-- | Wait for completion (blocking)
+waitSubagent :: SubagentHandle -> IO SubagentResult
+
+-- | Cancel a running subagent
+cancelSubagent :: SubagentHandle -> IO ()
+
+-- | Read all events so far (for logging/UI)
+drainSubagentEvents :: SubagentHandle -> IO [SubagentEvent]
+```
+
+### 1.3 Ava Integration
+
+Ava's orchestrator loop can now:
+1. Spawn subagents in background
+2. Continue conversation with user
+3. Periodically poll for updates: `"🔍 WebCrawler running (45s, 12k tokens)..."`
+4. Receive completion and synthesize result
+
+```haskell
+-- In Ava's message handler:
+handle <- spawnSubagentAsync keys config
+
+-- Non-blocking check in conversation loop:
+status <- querySubagentStatus handle
+when (runElapsedSeconds status > 30) $
+  sendMessage chat $ "⏳ Subagent still working: " <> runCurrentActivity status
+
+-- When user asks for status:
+status <- querySubagentStatus handle
+sendMessage chat $ formatSubagentStatus status
+
+-- On completion:
+result <- waitSubagent handle
+sendMessage chat $ "✅ " <> subagentSummary result
+```
+
+## 2. User Confirmation Before Spawning
+
+### 2.1 Confirmation Flow
+
+Before spawning any subagent or long-running process, Ava must ask the user for confirmation:
+
+```
+User: Research competitors for podcast transcription
+
+Ava: I'll spawn a WebCrawler subagent to research this. Estimated:
+     • Time: ~5-10 minutes
+     • Cost: up to $0.50
+     • Tools: web_search, read_webpages
+
+     Proceed? [Yes/No]
+
+User: Yes
+
+Ava: 🚀 Spawning WebCrawler subagent...
+     🔍 [WebCrawler] Starting research...
+```
+
+### 2.2 Implementation
+
+```haskell
+data SpawnRequest = SpawnRequest
+  { spawnConfig :: SubagentConfig
+  , spawnEstimatedTime :: (Int, Int) -- (min, max) minutes
+  , spawnEstimatedCost :: Double     -- max cents
+  , spawnRationale :: Text           -- why we need this
+  }
+
+-- | Generate confirmation message for user
+formatSpawnConfirmation :: SpawnRequest -> Text
+
+-- | Parse user confirmation response
+data ConfirmationResponse = Confirmed | Rejected | Modified SubagentConfig
+
+parseConfirmation :: Text -> ConfirmationResponse
+```
+
+### 2.3 Tool Modification
+
+The `spawn_subagent` tool becomes a two-phase operation:
+
+1. **Phase 1 (propose):** Returns confirmation request, doesn't spawn
+2. **Phase 2 (confirm):** User confirms, actually spawns
+
+Alternative: Add `confirm_spawn` as a separate tool that takes a pending spawn ID.
+
+## 3. Audit Logging System
+
+### 3.1 Log Storage
+
+All agent activity is persisted to append-only JSONL files under `$AVA_DATA_ROOT/logs/`:
+
+```
+$AVA_DATA_ROOT/logs/          # e.g. /home/ava/logs/ or _/var/ava/logs/
+├── ava/
+│   ├── 2024-01-15.jsonl      # Daily Ava conversation logs
+│   └── 2024-01-16.jsonl
+└── subagents/
+    ├── S-7f3a2b.jsonl        # Per-subagent trace (named by SubagentId)
+    └── S-9e4c1d.jsonl
+```
+
+### 3.2 SubagentId Linking
+
+Each subagent gets a unique `SubagentId` (short UUID prefix) that links:
+- The `SubagentResult` returned to Ava
+- The JSONL log file (`S-{id}.jsonl`)
+- References in Ava's daily log
+
+```haskell
+-- | Unique identifier for a subagent run
+newtype SubagentId = SubagentId { unSubagentId :: Text }
+  deriving (Show, Eq, Generic, Aeson.ToJSON, Aeson.FromJSON)
+
+-- | Generate a new subagent ID (first 6 chars of UUID)
+newSubagentId :: IO SubagentId
+newSubagentId = SubagentId . Text.take 6 . UUID.toText <$> UUID.nextRandom
+
+-- | Path to subagent's log file
+subagentLogPath :: SubagentId -> FilePath
+subagentLogPath (SubagentId sid) =
+  avaDataRoot </> "logs" </> "subagents" </> Text.unpack sid <> ".jsonl"
+```
+
+The `SubagentResult` includes the ID for cross-referencing:
+
+```haskell
+data SubagentResult = SubagentResult
+  { subagentId :: SubagentId -- NEW: links to S-{id}.jsonl
+  , subagentOutput :: Aeson.Value
+  , subagentSummary :: Text
+  , ...
+  }
+```
+
+### 3.3 Log Entry Schema
+
+```haskell
+data AuditLogEntry = AuditLogEntry
+  { logTimestamp :: UTCTime
+  , logSessionId :: SessionId   -- Conversation session
+  , logAgentId :: AgentId       -- Ava or subagent ID
+  , logUserId :: Maybe UserId   -- Human user (Telegram, etc.)
+  , logEventType :: AuditEventType
+  , logContent :: Aeson.Value
+  , logMetadata :: LogMetadata
+  }
+
+data AuditEventType
+  = UserMessage      -- Incoming user message
+  | AssistantMessage -- Ava response
+  | ToolCall         -- Tool invocation
+  | ToolResult       -- Tool response
+  | SubagentSpawn    -- Subagent created
+  | SubagentComplete -- Subagent finished
+  | ExtendedThinking -- Thinking block content
+  | CostUpdate       -- Token/cost tracking
+  | ErrorOccurred    -- Any error
+  | SessionStart     -- New conversation
+  | SessionEnd       -- Conversation ended
+  deriving (Show, Eq, Generic)
+
+data LogMetadata = LogMetadata
+  { metaInputTokens :: Maybe Int
+  , metaOutputTokens :: Maybe Int
+  , metaCostCents :: Maybe Double
+  , metaModelId :: Maybe Text
+  , metaParentAgentId :: Maybe AgentId -- For subagents
+  , metaDuration :: Maybe Int          -- Milliseconds
+  }
+```
+
+### 3.4 Logging Interface
+
+```haskell
+-- | Append entry to audit log
+writeAuditLog :: AuditLogEntry -> IO ()
+
+-- | Query logs by various criteria
+data LogQuery = LogQuery
+  { queryAgentId :: Maybe AgentId
+  , queryUserId :: Maybe UserId
+  , queryTimeRange :: Maybe (UTCTime, UTCTime)
+  , queryEventTypes :: Maybe [AuditEventType]
+  , querySessionId :: Maybe SessionId
+  , queryLimit :: Int
+  }
+
+queryAuditLogs :: LogQuery -> IO [AuditLogEntry]
+
+-- | Get recent logs for debugging
+getRecentLogs :: AgentId -> Int -> IO [AuditLogEntry]
+
+-- | Search logs by content
+searchLogs :: Text -> IO [AuditLogEntry]
+```
+
+### 3.5 Tools for Querying Logs
+
+**For Ben (CLI):**
+
+```bash
+# View recent Ava logs
+ava logs --last 100
+
+# View specific subagent trace by ID
+ava logs S-7f3a2b
+
+# Search for errors
+ava logs --type error --since "1 hour ago"
+
+# Follow live logs
+ava logs -f
+
+# Quick lookup with standard tools
+tail -f $AVA_DATA_ROOT/logs/ava/$(date +%Y-%m-%d).jsonl
+jq 'select(.eventType == "Error")' $AVA_DATA_ROOT/logs/ava/*.jsonl
+cat $AVA_DATA_ROOT/logs/subagents/S-7f3a2b.jsonl | jq .
+```
+
+**For Ava (Agent Tool):**
+
+```haskell
+-- | Tool for Ava to query her own logs
+readAvaLogsTool :: Engine.Tool
+readAvaLogsTool = Engine.Tool
+  { toolName = "read_ava_logs"
+  , toolDescription =
+      "Read Ava's audit logs or subagent traces. "
+        <> "Use to diagnose issues, review past conversations, or inspect subagent runs."
+  , toolJsonSchema = ...
+  , toolExecute = executeReadLogs
+  }
+
+-- Parameters:
+-- { "subagent_id": "S-7f3a2b" }          -- Read specific subagent trace
+-- { "last_n": 50 }                       -- Last N entries from today's log
+-- { "search": "error", "since": "1h" }   -- Search with time filter
+```
+
+This allows Ava to self-diagnose: "Let me check my logs for that subagent run..."
+
+### 3.6 Automatic Logging Hook
+
+Integrate into Engine callbacks so logging is automatic:
+
+```haskell
+auditingEngineConfig :: SessionId -> AgentId -> UserId -> EngineConfig
+auditingEngineConfig session agent user = EngineConfig
+  { engineOnActivity = \txt -> writeAuditLog $ mkActivityEntry session agent txt
+  , engineOnToolCall = \name args -> writeAuditLog $ mkToolCallEntry session agent name args
+  , engineOnToolResult = \name success output -> writeAuditLog $ mkToolResultEntry session agent name success output
+  , engineOnCost = \tokens cents -> writeAuditLog $ mkCostEntry session agent tokens cents
+  , engineOnError = \err -> writeAuditLog $ mkErrorEntry session agent err
+  , ...
+  }
+```
+
+## 4. Subagent Thinking Logs
+
+Capture extended thinking for debugging:
+
+```haskell
+-- In Engine, when extended thinking is enabled:
+onThinkingBlock :: Text -> IO ()
+onThinkingBlock content = do
+  ts <- getCurrentTime
+  writeAuditLog $ AuditLogEntry
+    { logEventType = ExtendedThinking
+    , logContent = object ["thinking" .= content]
+    , ...
+    }
+```
+
+## 5. Implementation Plan
+
+### Phase 1: Audit Logging (Foundation)
+1. Create `Omni/Agent/AuditLog.hs` with types and writers
+2. Integrate into Engine callbacks
+3. Add CLI commands: `jr agent logs`
+4. Migrate existing status logging to audit system
+
+### Phase 2: Async Subagent Execution
+1. Create `SubagentHandle` and `SubagentRunStatus`
+2. Implement `spawnSubagentAsync`, `querySubagentStatus`
+3. Add event queue for real-time updates
+4. Update Ava integration for background polling
+
+### Phase 3: User Confirmation
+1. Add confirmation prompt generation
+2. Implement two-phase spawn flow
+3. Update Telegram handler for confirmation UX
+4. Add timeout for pending confirmations
+
+### Phase 4: CLI & Diagnostics
+1. Full `jr agent logs` implementation with queries
+2. Live log streaming (`-f` flag)
+3. Subagent dashboard in status output
+4. Health checks and metrics
+
+## 6. Example Session with All Features
+
+```
+[14:05:22] User (ben): Research podcast transcription pricing
+
+[14:05:23] Ava → User: I'll spawn a WebCrawler subagent to research competitor pricing.
+                       Estimated: 5-10 min, up to $0.50
+                       Proceed? [Yes/No]
+
+[14:05:28] User (ben): yes
+
+[14:05:29] Ava → User: 🚀 Spawning WebCrawler subagent (S-7f3a2b)...
+[14:05:29] [AUDIT] SubagentSpawn S-7f3a2b role=WebCrawler user=ben session=sess-123
+
+[14:05:30] [AUDIT/S-7f3a2b] ToolCall web_search {"query": "podcast transcription pricing 2024"}
+[14:05:32] [AUDIT/S-7f3a2b] ToolResult web_search success=true
+
+[14:06:00] Ava → User: ⏳ Research in progress (30s, reading otter.ai/pricing...)
+
+[14:07:45] [AUDIT/S-7f3a2b] SubagentComplete status=success cost=$0.24 tokens=45000
+
+[14:07:46] Ava → User: ✅ Research complete! Found 5 competitors...
+                       [structured findings with citations]
+
+# Later debugging:
+$ jr agent logs S-7f3a2b
+[14:05:30] ToolCall web_search {"query": "podcast transcription pricing 2024"}
+[14:05:32] ToolResult web_search (success, 5 results)
+[14:05:35] Thinking: "Looking at search results, otter.ai and descript appear most relevant..."
+[14:05:40] ToolCall read_webpages {"urls": ["https://otter.ai/pricing"]}
+...
+```
+
+## 7. References
+
+- Anthropic: [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
+- Current: `Omni/Agent/Subagent.hs`, `Omni/Agent/Event.hs`
+- Async: `Control.Concurrent.Async`, `Control.Concurrent.STM`
-- 
cgit v1.2.3