path: root/Omni/Agent/Engine.hs
Age | Commit message | Author
6 days | fix: accumulate streaming tool call arguments across SSE chunks | Ben Sima
OpenAI's SSE streaming sends tool calls incrementally: the first chunk has the id and function name, and subsequent chunks contain argument fragments. Previously each chunk was treated as a complete tool call, causing invalid JSON arguments.
- Add ToolCallDelta type with index for partial tool call data
- Add StreamToolCallDelta chunk type
- Track tool calls by index in IntMap accumulator
- Merge argument fragments across chunks via mergeToolCallDelta
- Build final ToolCall objects from accumulator when stream ends
- Handle new StreamToolCallDelta in Engine.hs pattern match
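The accumulation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from Engine.hs: the names `ToolCallDelta`, `mergeToolCallDelta`, and `ToolCall` come from the commit message, but every field name and the helpers `accumulate`, `finalize`, and `collectToolCalls` are assumptions made for the example.

```haskell
import Control.Applicative ((<|>))
import qualified Data.IntMap.Strict as IntMap
import Data.Maybe (fromMaybe)

-- Hypothetical shape of one streamed fragment; field names are assumptions.
data ToolCallDelta = ToolCallDelta
  { deltaIndex :: Int          -- position of the tool call within the response
  , deltaId    :: Maybe String -- usually present only in the first chunk
  , deltaName  :: Maybe String -- usually present only in the first chunk
  , deltaArgs  :: String       -- fragment of the JSON argument string
  } deriving (Show, Eq)

-- Simplified stand-in for the final tool call record.
data ToolCall = ToolCall
  { callId :: String, callName :: String, callArgs :: String }
  deriving (Show, Eq)

-- Merge a later fragment into an earlier one: keep the first id and name
-- seen, and append the argument text in arrival order.
mergeToolCallDelta :: ToolCallDelta -> ToolCallDelta -> ToolCallDelta
mergeToolCallDelta old new = old
  { deltaId   = deltaId old <|> deltaId new
  , deltaName = deltaName old <|> deltaName new
  , deltaArgs = deltaArgs old ++ deltaArgs new
  }

-- Fold one fragment into the accumulator, keyed by tool call index.
-- insertWith passes the new value first, hence the flip.
accumulate :: IntMap.IntMap ToolCallDelta -> ToolCallDelta -> IntMap.IntMap ToolCallDelta
accumulate acc d = IntMap.insertWith (flip mergeToolCallDelta) (deltaIndex d) d acc

-- Build the final tool calls once the stream has ended, in index order.
finalize :: IntMap.IntMap ToolCallDelta -> [ToolCall]
finalize acc =
  [ ToolCall (fromMaybe "" (deltaId d)) (fromMaybe "" (deltaName d)) (deltaArgs d)
  | d <- IntMap.elems acc
  ]

collectToolCalls :: [ToolCallDelta] -> [ToolCall]
collectToolCalls = finalize . foldl accumulate IntMap.empty
```

Keying the accumulator by index rather than by id matters because, per the commit message, only the first chunk carries the id.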
6 days | fix: correct cost estimation formulas | Ben Sima
- Update to Dec 2024 OpenRouter pricing
- Use blended input/output rates
- Add gemini-flash, claude-sonnet-4.5 specific rates
- Fix math: was off by ~30x for Claude models
7 days | fix: prompt for text response when agent returns empty after tool calls | Ben Sima
When the LLM returned empty content after executing tools, the agent would complete with an empty message. Now both agent loops (LLM-based and Provider-based) detect this case and inject a prompt asking the LLM to provide a response to the user.
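One way to picture the empty-response check is the sketch below. It is an illustration under stated assumptions rather than the logic in either agent loop: the `Message` shape, the `Step` type, the `stepOnReply` helper, and the wording of the injected prompt are all hypothetical.

```haskell
import Data.Char (isSpace)

data Role = User | Assistant | ToolRole deriving (Show, Eq)

-- Simplified message; the real type presumably carries more fields.
data Message = Message { msgRole :: Role, msgContent :: String }
  deriving (Show, Eq)

-- Hypothetical wording of the injected prompt.
nudge :: Message
nudge = Message User "Please provide a response to the user summarizing what you did."

-- What to do with a reply that contains no further tool calls.
data Step
  = Finish String       -- return this text as the final message
  | AskAgain [Message]  -- continue the loop with the extended history
  deriving (Show, Eq)

-- If tools have already run and the reply is blank, ask for a text answer
-- instead of completing with an empty message.
stepOnReply :: Int -> [Message] -> String -> Step
stepOnReply toolCallsSoFar history reply
  | toolCallsSoFar > 0 && all isSpace reply =
      AskAgain (history ++ [Message Assistant reply, nudge])
  | otherwise = Finish reply
```

A real loop would still need its iteration limit so that a model that keeps replying with nothing cannot spin forever.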
8 days | t-247: Add Provider abstraction for multi-backend LLM support | Ben Sima
- Create Omni/Agent/Provider.hs with unified Provider interface
- Support OpenRouter (cloud), Ollama (local), Amp (subprocess stub)
- Add runAgentWithProvider to Engine.hs for Provider-based execution
- Add EngineType to Core.hs (EngineOpenRouter, EngineOllama, EngineAmp)
- Add --engine flag to 'jr work' command
- Worker.hs dispatches to appropriate provider based on engine type
Usage:
  jr work <task-id>                  # OpenRouter (default)
  jr work <task-id> --engine=ollama  # Local Ollama
  jr work <task-id> --engine=amp     # Amp CLI (stub)
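The mapping from the `--engine` flag to an `EngineType` might look something like this sketch. The three constructors are named in the commit message; `parseEngine`, `describeEngine`, the error text, and the accepted spelling `openrouter` are assumptions for illustration only.

```haskell
-- Constructors as listed in the commit message.
data EngineType = EngineOpenRouter | EngineOllama | EngineAmp
  deriving (Show, Eq)

-- Interpret the optional value of --engine. An absent flag selects
-- OpenRouter, matching the documented default.
parseEngine :: Maybe String -> Either String EngineType
parseEngine Nothing             = Right EngineOpenRouter
parseEngine (Just "openrouter") = Right EngineOpenRouter -- assumed spelling
parseEngine (Just "ollama")     = Right EngineOllama
parseEngine (Just "amp")        = Right EngineAmp
parseEngine (Just other)        = Left ("unknown engine: " ++ other)

-- Dispatch on the engine type; here it only yields a label, where the
-- worker would select a provider.
describeEngine :: EngineType -> String
describeEngine EngineOpenRouter = "OpenRouter (cloud)"
describeEngine EngineOllama     = "Ollama (local)"
describeEngine EngineAmp        = "Amp (subprocess stub)"
```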
2025-12-01 | Add guardrail for repeated edit_file failures | Ben Sima
Tracks 'old_str not found' errors from edit_file tool calls. After 5 consecutive failures, stops the agent to prevent burning tokens on impossible edits. This catches the pattern where the agent repeatedly tries to edit a large file with incorrect old_str matches, which was the root cause of t-222 exceeding its cost budget.
Task-Id: t-224
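A consecutive-failure counter of this kind could be sketched as below. The threshold of 5 and the `old_str not found` error text come from the commit message. The function names and the reset policy are assumptions: here a successful `edit_file` call resets the count and other tools leave it unchanged, which may differ from the actual behavior.

```haskell
import Data.List (isInfixOf)

-- Threshold stated in the commit message.
maxEditFailures :: Int
maxEditFailures = 5

-- Update the running count given one (tool name, result text) pair.
-- Assumption: a successful edit_file resets the count, and calls to
-- other tools leave it untouched.
updateEditFailures :: Int -> (String, String) -> Int
updateEditFailures n (tool, result)
  | tool /= "edit_file"                    = n
  | "old_str not found" `isInfixOf` result = n + 1
  | otherwise                              = 0

-- The agent should stop once the threshold is reached.
shouldStop :: Int -> Bool
shouldStop n = n >= maxEditFailures
```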
2025-12-01 | Add guardrails and progress tracking to Jr agent | Ben Sima
Implement runtime guardrails in Engine.hs:
- Cost budget limit (default 200 cents)
- Token budget limit (default 1M tokens)
- Duplicate tool call detection (same tool called N times)
- Test failure counting (bild --test failures)
Add database-backed progress tracking:
- Checkpoint events stored in agent_events table
- Progress summary retrieved on retry attempts
- Improved prompts emphasizing efficiency and autonomous operation
Worker.hs improvements:
- Uses guardrails configuration
- Reports guardrail violations via callbacks
- Better prompt structure for autonomous operation
Task-Id: t-203
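The budget and duplicate-call checks listed above can be illustrated with a small pure function. The defaults of 200 cents and 1M tokens are taken from the commit message. All type and field names are hypothetical, and since the duplicate-call threshold N is not stated, it is left as a parameter.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical guardrail configuration; field names are assumptions.
data Guardrails = Guardrails
  { maxCostCents      :: Double
  , maxTokens         :: Int
  , maxDuplicateCalls :: Int
  }

-- Defaults from the commit message; the duplicate-call threshold is not
-- given there, so the caller supplies it.
guardrailsWithDefaults :: Int -> Guardrails
guardrailsWithDefaults n = Guardrails
  { maxCostCents = 200, maxTokens = 1000000, maxDuplicateCalls = n }

data Violation
  = CostExceeded Double
  | TokensExceeded Int
  | DuplicateCalls String Int
  deriving (Show, Eq)

-- Check running totals against the limits. The map counts how many times
-- each tool has been called so far.
checkGuardrails :: Guardrails -> Double -> Int -> Map.Map String Int -> Maybe Violation
checkGuardrails g cost tokens calls
  | cost > maxCostCents g = Just (CostExceeded cost)
  | tokens > maxTokens g  = Just (TokensExceeded tokens)
  | otherwise =
      case Map.toList (Map.filter (>= maxDuplicateCalls g) calls) of
        ((name, n) : _) -> Just (DuplicateCalls name n)
        []              -> Nothing
```

Returning a `Maybe Violation` keeps the check separate from the reaction, so a caller can decide whether to stop, log, or report through a callback.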
2025-12-01 | Fix cost reporting - parse actual cost from OpenRouter API response | Ben Sima
Perfect! All tests pass for the affected modules. Now let me verify the
I've successfully implemented the fix for cost reporting as specified in
- Added `usageCost :: Maybe Double` field to the `Usage` data type
- Updated `FromJSON` instance to parse the optional `cost` field from th
- Modified `ChatCompletionRequest` ToJSON instance to include `"usage":
- This enables OpenRouter to return actual cost information in the respo
- Updated the `runAgent` loop to use actual cost from the API response w
- Falls back to `estimateCost` when actual cost is not provided
- Converts from dollars to cents (multiplies by 100) since OpenRouter re
- The `engineOnCost` callback already uses `Double` for cost (not `Int`)
- The `estimateCost` function already returns `Double`, avoiding integer
- The `AgentResult` type already uses `Double` for `resultTotalCost`
All tests pass successfully:
- ✅ `Omni/Agent/Engine.hs` - All 14 tests pass, including new tests for
- ✅ `Omni/Agent/Worker.hs` - Builds successfully
- ✅ `Omni/Agent.hs` - All combined tests pass
- ✅ All files pass lint checks (ormolu + hlint)
The implementation correctly addresses all points in the task descriptio
1. ✅ Parses actual cost from OpenRouter API response
2. ✅ Enables usage accounting in requests
3. ✅ Uses Double for cost to avoid rounding issues
4. ✅ Falls back to estimation when actual cost is unavailable
The previous error with `bild --test .` was due to `.` not being a valid
Task-Id: t-197.8
2025-12-01 | Fix cost reporting - parse actual cost from OpenRouter API response | Ben Sima
I have successfully completed task t-197.8 to fix cost reporting by pars
**Omni/Agent/Engine.hs:**
1. Added `usageCost :: Maybe Double` field to the `Usage` type to captur
2. Updated `FromJSON` instance to parse the optional `"cost"` field
3. Modified `ChatCompletionRequest` ToJSON instance to include `"usage":
4. Changed cost types from `Int` to `Double` throughout (engineOnCost ca
5. Updated `estimateCost` to use floating-point division instead of inte
6. Modified `runAgent` to use actual cost from API when available, conve
7. Added new test case for parsing usage with cost field
**Omni/Agent/Worker.hs:**
1. Updated `runWithEngine` signature to return `Double` for cost
2. Changed `totalCostRef` from `IORef Int` to `IORef Double`
3. Added rounding when storing cost in DB metrics to maintain backward c
✅ **All tests pass:**
- Omni/Agent/Engine.hs - 16 unit tests pass
- Omni/Agent/Worker.hs - Builds successfully
- Omni/Agent.hs - All integration tests pass
- Omni/Jr.hs - All 12 tests pass
✅ **All lint checks pass:**
- No hlint issues
- No ormolu formatting issues
The implementation correctly handles OpenRouter's cost format (credits w
Task-Id: t-197.8
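The fallback between reported and estimated cost can be sketched as follows. The `usageCost :: Maybe Double` field and the multiplication by 100 to convert dollars to cents are taken from the commit messages. The other field names, the signature of this `estimateCost`, and the explicit rate parameter are assumptions; the real estimator's pricing table is not reproduced here.

```haskell
-- Usage record; only usageCost is confirmed by the commit message, the
-- other field names are assumptions.
data Usage = Usage
  { usagePromptTokens     :: Int
  , usageCompletionTokens :: Int
  , usageCost             :: Maybe Double -- dollars, when the API reports it
  }

-- Hypothetical estimator: a single blended rate in cents per million
-- tokens, passed in explicitly. Floating-point division avoids the
-- truncation that integer division would introduce.
estimateCost :: Double -> Int -> Double
estimateCost centsPerMillionTokens tokens =
  fromIntegral tokens * centsPerMillionTokens / 1000000

-- Prefer the cost reported by the API (dollars, converted to cents);
-- fall back to the estimate when it is absent.
costCents :: Double -> Usage -> Double
costCents rate u =
  case usageCost u of
    Just dollars -> dollars * 100
    Nothing      -> estimateCost rate (usagePromptTokens u + usageCompletionTokens u)
```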
2025-11-30 | Extract facts from completed tasks after review acceptance | Ben Sima
Perfect! Let me verify the complete implementation checklist against the
✅ **1. In Jr.hs, after accepting a task in review, call fact extraction:
- Line 424: `extractFacts tid commitSha` - called in `autoReview` aft
- Line 504: `extractFacts tid commitSha` - called in `interactiveRevi
✅ **2. Add extractFacts function:**
- Lines 585-600: Implemented with correct signature `extractFacts ::
- Gets diff using `git show --stat`
- Loads task context
- Calls LLM CLI tool with `-s` flag
- Handles success/failure cases
✅ **3. Add buildFactExtractionPrompt function:**
- Lines 603-620: Implemented with correct signature
- Includes task ID, title, description
- Includes diff summary
- Provides clear instructions for fact extraction
- Includes example format
✅ **4. Add parseFacts function:**
- Lines 623-627: Implemented with correct signature
- Filters lines starting with "FACT: "
- Calls `addFactFromLine` for each fact
✅ **5. Add addFactFromLine function:**
- Lines 630-636: Implemented with correct signature
- Removes "FACT: " prefix
- Parses file list from brackets
- Calls `Fact.createFact` with project="Omni", confidence=0.7, source
- Prints confirmation message
✅ **6. Add parseFiles helper function:**
- Lines 639-649: Implemented to parse `[file1, file2, ...]` format
✅ **7. Import for Omni.Fact module:**
- Line 22: `import qualified Omni.Fact as Fact` already present
✅ **8. Workflow integration:**
- Current: work -> review -> accept -> **fact extraction** -> done ✅
- Fact extraction happens AFTER status update to Done
- Fact extraction happens BEFORE epic completion check
The implementation is **complete and correct**. All functionality descri
1. ✅ Facts are extracted after task review acceptance (both auto and man
2. ✅ LLM is called with proper context (task info + diff)
3. ✅ Facts are parsed and stored with correct metadata (source_task, con
4. ✅ All tests pass (`bild --test Omni/Agent.hs`)
5. ✅ No linting errors (`lint Omni/Jr.hs`)
The feature is ready for use and testing. When a task is completed and a
1. The LLM will be prompted to extract facts
2. Any facts learned will be added to the knowledge base
3. Each fact will have `source_task` set to the task ID
4. Facts can be viewed with `jr facts list`
Task-Id: t-185
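The parsing steps in this checklist (keep lines starting with `FACT: `, strip the prefix, read a bracketed file list) might be sketched like this. The function names `parseFacts` and `parseFiles` appear in the commit message, but their real signatures are truncated there, so the types below are assumptions. The exact line layout is also assumed to be `FACT: <text> [file1, file2]`, and `parseFactLine` is a hypothetical pure stand-in for `addFactFromLine`, which in the source also stores the fact.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)
import Data.Maybe (mapMaybe)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split on commas without any extra dependency.
splitOnComma :: String -> [String]
splitOnComma s =
  case break (== ',') s of
    (a, [])       -> [a]
    (a, _ : rest) -> a : splitOnComma rest

-- Parse the first "[file1, file2, ...]" group found in the string.
parseFiles :: String -> [String]
parseFiles s =
  case dropWhile (/= '[') s of
    '[' : rest -> filter (not . null) (map trim (splitOnComma (takeWhile (/= ']') rest)))
    _          -> []

-- Accept only lines with the "FACT: " prefix; return the fact text and
-- the files it mentions.
parseFactLine :: String -> Maybe (String, [String])
parseFactLine line = do
  body <- stripPrefix "FACT: " line
  pure (trim (takeWhile (/= '[') body), parseFiles body)

-- Run the line parser over the whole LLM output, dropping other lines.
parseFacts :: String -> [(String, [String])]
parseFacts = mapMaybe parseFactLine . lines
```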
2025-11-30 | Add agent observability: event logging and storage | Ben Sima
- Add Omni/Agent/Event.hs with AgentEvent types
- Add agent_events table schema and CRUD functions to Core.hs
- Add new callbacks to Engine.hs: onAssistant, onToolResult, onComplete, onError
- Wire event logging into Worker.hs with session tracking
Events are now persisted to SQLite for each agent work session, enabling visibility into agent reasoning and tool usage.
Task-Id: t-197.1
Task-Id: t-197.2
Task-Id: t-197.3
2025-11-30 | Fix jr loop: update model IDs and dev shell | Ben Sima
- Update OpenRouter model IDs to Claude 4.5 family:
  - anthropic/claude-sonnet-4.5 (default)
  - anthropic/claude-haiku-4.5 (simple tasks)
  - anthropic/claude-opus-4.5 (complex tasks)
- Remove aider-chat from dev shell (broken, unused)
- Simplify llm package (remove llm-ollama plugin)
- Update nixos-unstable for llm 0.27.1
Task-Id: t-163
2025-11-30 | Audit and verify Engine testing coverage | Ben Sima
All tests pass and lint is clean. Let me verify the final test coverage
**Engine.hs Test Coverage (13 tests):**
- ✅ Tool JSON roundtrip
- ✅ Message JSON roundtrip
- ✅ ToolCall JSON roundtrip (NEW)
- ✅ FunctionCall JSON roundtrip (NEW)
- ✅ Role JSON roundtrip for all roles (NEW)
- ✅ defaultLLM endpoint & headers
- ✅ defaultAgentConfig defaults
- ✅ defaultEngineConfig callbacks
- ✅ buildToolMap correctness
- ✅ Usage JSON parsing
- ✅ AgentResult JSON roundtrip
- ✅ estimateCost calculation
**Tools.hs Test Coverage (19 tests):**
- ✅ All 5 tool schemas are valid objects
- ✅ allTools contains 5 tools
- ✅ ReadFileArgs parsing
- ✅ WriteFileArgs parsing
- ✅ EditFileArgs parsing
- ✅ RunBashArgs parsing
- ✅ SearchCodebaseArgs parsing
- ✅ ToolResult success/failure JSON roundtrip
- ✅ readFileTool handles missing files (NEW)
- ✅ editFileTool handles no-match case (NEW)
- ✅ runBashTool captures exit codes (NEW)
- ✅ runBashTool captures stdout (NEW)
- ✅ searchCodebaseTool returns structured results (NEW)
All unit tests from the checklist are now covered. The integration and m
Task-Id: t-141.7
2025-11-30 | Define Tool protocol and LLM provider abstraction | Ben Sima
The implementation is complete. Here's a summary of the changes made:
1. **Updated LLM type** to include `llmExtraHeaders` field for OpenRoute
2. **Changed `defaultLLM`** to use:
   - OpenRouter base URL: `https://openrouter.ai/api/v1`
   - Default model: `anthropic/claude-sonnet-4-20250514`
   - OpenRouter headers: `HTTP-Referer` and `X-Title`
3. **Updated `chatWithUsage`** to apply extra headers to HTTP requests
4. **Added `case-insensitive` dependency** for proper header handling
5. **Added tests** for OpenRouter configuration
6. **Fixed hlint suggestions** (Use `</` instead of `<$>`, eta reduce)
Task-Id: t-141.1
2025-11-29 | Implement agent loop with tool execution | Ben Sima
The implementation is complete. Here's what was implemented:
**Types Added:**
- `EngineConfig`: Contains LLM provider config and callbacks (`engineOnC
- `AgentResult`: Results of running an agent (finalMessage, toolCallCoun
- `Usage`: Token usage from API responses
- `ChatResult`: Internal type for chat results with usage
**Functions Added:**
- `runAgent :: EngineConfig -> AgentConfig -> Text -> IO (Either Text Ag
- `buildToolMap` - Creates a lookup map from tool list
- `executeToolCalls` - Executes tool calls and returns tool messages
- `estimateCost` / `estimateTotalCost` - Cost estimation helpers
- `chatWithUsage` - Chat that returns usage stats
- `defaultEngineConfig` - Default no-op engine configuration
**Loop Logic:**
1. Sends messages to LLM via `chatWithUsage`
2. If response has tool_calls, executes each tool via `executeToolCalls`
3. Appends tool results as ToolRole messages
4. Repeats until no tool_calls or maxIterations reached
5. Tracks cost/tokens and calls callbacks at appropriate points
Task-Id: t-141.2
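The loop logic in steps 1 to 4 can be sketched in a simplified, monad-generic form. This is not `runAgent` itself: the signature, the `Message` and `ToolCall` shapes, and the `runLoop` name are assumptions, the chat function is passed in so the loop can run without a network, and cost tracking and callbacks (step 5) are left out.

```haskell
import qualified Data.Map.Strict as Map

data Role = System | User | Assistant | ToolRole deriving (Show, Eq)

-- Simplified stand-ins for the real types; field names are assumptions.
data ToolCall = ToolCall { tcName :: String, tcArgs :: String }
  deriving (Show, Eq)

data Message = Message
  { msgRole      :: Role
  , msgContent   :: String
  , msgToolCalls :: [ToolCall]
  } deriving (Show, Eq)

-- Lookup map from tool name to its implementation.
type ToolMap m = Map.Map String (String -> m String)

-- Send the history to the model, run any requested tools, append their
-- results as ToolRole messages, and repeat until the reply has no tool
-- calls or the iteration limit is reached.
runLoop
  :: Monad m
  => ([Message] -> m Message) -- chat function (stand-in for chatWithUsage)
  -> ToolMap m
  -> Int                      -- maxIterations
  -> [Message]
  -> m (Either String String)
runLoop chat tools maxIterations = go 0
  where
    go i msgs
      | i >= maxIterations = pure (Left "max iterations reached")
      | otherwise = do
          reply <- chat msgs
          case msgToolCalls reply of
            []    -> pure (Right (msgContent reply))
            calls -> do
              results <- mapM runTool calls
              go (i + 1) (msgs ++ [reply] ++ results)

    runTool (ToolCall name args) =
      case Map.lookup name tools of
        Nothing -> pure (Message ToolRole ("unknown tool: " ++ name) [])
        Just f  -> do
          out <- f args
          pure (Message ToolRole out [])
```

Abstracting over the monad lets the same loop run against a pure mock in tests and against `IO` in production.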
2025-11-29 | Define Tool protocol and LLM provider abstraction | Ben Sima
The implementation is complete. I created [Omni/Agent/Engine.hs](file://
- **Types**: `Tool`, `LLM`, `AgentConfig`, `Message`, `Role`, `ToolCall`
- **Functions**: `chat` for OpenAI-compatible HTTP via http-conduit, `de
- **Tests**: JSON roundtrip for Tool, Message; validation of defaults
All lints pass (hlint + ormolu) and tests pass.
Task-Id: t-141.1