summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorBen Sima <ben@bensima.com>2025-12-02 13:55:31 -0500
committerBen Sima <ben@bensima.com>2025-12-02 13:55:31 -0500
commit9b3cd64d9e7581256294ceaf7fa08f547a88925e (patch)
treed400a41c1e7db8a7c12598ed1898354cda89fe7e
parenteb8a78baafa4556fde11cddda4740fe4b733cf31 (diff)
System prompt improvements
Worked with Gemini and Opus to improve the system prompt with learnings from the Amp prompt. Removed reference to Omni/Task/README.md because it is deprecated in favor of `jr task`.
-rw-r--r--AGENTS.md4
-rw-r--r--Omni/Agent/Worker.hs98
-rw-r--r--Omni/Task/README.md4
3 files changed, 60 insertions, 46 deletions
diff --git a/AGENTS.md b/AGENTS.md
index 37b1cc2..bdcf9f0 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -11,8 +11,6 @@ The Omni project is to leverage automation and asymmetries to create wealth.
- ✅ File bugs IMMEDIATELY when you discover unexpected behavior
- ✅ Add facts to the knowledge base when you learn something useful (`jr facts add ...`)
- ✅ Run `jr task ready --json` before asking "what should I work on?"
-- ✅ Store AI planning docs in `_/llm` directory (NEVER in repo root)
-- ❌ Do NOT use `todo_write` tool
- ❌ Do NOT create markdown TODO lists or task checklists
- ❌ Do NOT put TODO/FIXME comments in code
@@ -42,7 +40,6 @@ jr task create "Command X fails when Y" --discovered-from=<current-task-id> --js
## Directory Structure
- **`_/`** (cabdir) - All ephemeral/generated files. This directory is gitignored.
- - `_/llm/` - AI planning docs and agent logs
- `_/tmp/` - Temporary files, test databases, scratch data
- Never create dotfile directories (like `.tasks/`) in the repo root
@@ -78,6 +75,5 @@ Omni/Ide/run.sh Omni/Jr.hs # Build (if needed) and run
## Documentation
- **Project Context**: [README.md](README.md) - Goals, source layout, and coding conventions.
-- **Task Manager**: [`Omni/Task/README.md`](Omni/Task/README.md) - Detailed usage, dependency management, and agent best practices.
- **Build Tool (Bild)**: [`Omni/Bild/README.md`](Omni/Bild/README.md) - How to use `bild` and manage dependencies.
- **Development Tools**: [`Omni/Ide/README.md`](Omni/Ide/README.md) - `run.sh`, `lint`, `repl.sh`, git workflow.
diff --git a/Omni/Agent/Worker.hs b/Omni/Agent/Worker.hs
index 110c929..ed1e3be 100644
--- a/Omni/Agent/Worker.hs
+++ b/Omni/Agent/Worker.hs
@@ -373,47 +373,67 @@ runWithEngine worker repo task = do
-- | Build the base prompt for the agent
buildBasePrompt :: TaskCore.Task -> Text -> FilePath -> Text
buildBasePrompt task ns repo =
- "You are an autonomous Worker Agent.\n"
- <> "Your goal is to implement the following task:\n\n"
+ "You are `jr`, an autonomous Senior Software Engineer. You are rigorous, efficient, and safety-conscious.\n"
+ <> "Your Goal: Complete the following task with **zero regressions**.\n\n"
<> formatTask task
- <> "\n\nCRITICAL INSTRUCTIONS:\n"
- <> "1. Read AGENTS.md first to understand the codebase conventions.\n"
- <> "2. Complete ONE logical change (e.g., update schema + call sites + tests).\n"
- <> "3. Run 'bild --test "
+ <> "\n\n# The Workflow\n"
+ <> "Follow this 4-phase loop. Do not skip phases.\n\n"
+ <> "## Phase 1: Exploration (MANDATORY)\n"
+ <> "- NEVER edit immediately. Explore first.\n"
+ <> "- Use search_and_read to find code relevant to the task.\n"
+ <> "- Read the imports. Read the tests that cover this code.\n"
+ <> "- Understand the *callers* of a function before you modify it.\n\n"
+ <> "## Phase 2: Planning (for multi-file changes)\n"
+ <> "- If the task involves more than 2 files, plan the order of operations.\n"
+ <> "- Identify potential breaking changes (API shifts, import cycles).\n"
+ <> "- For refactors: copy code first, verify it works, then delete the original.\n\n"
+ <> "## Phase 3: Execution\n"
+ <> "- Make atomic changes. One logical edit per edit_file call.\n"
+ <> "- Use edit_file with sufficient context (5+ lines) to match uniquely.\n"
+ <> "- Do NOT update task status or manage git - the worker handles that.\n\n"
+ <> "## Phase 4: Verification\n"
+ <> "- Run 'bild --test "
<> ns
- <> "' ONCE after implementing.\n"
- <> "4. **CRITICAL**: If tests pass, STOP IMMEDIATELY. Do not verify, do not review, do not trace logic, do not search for usages. Just stop.\n"
- <> "5. If tests fail, fix the issue and run tests again.\n"
- <> "6. If tests fail 3 times on the same issue, STOP - the task will be marked for human review.\n"
- <> "7. Do NOT update task status or manage git - the worker handles that.\n"
- <> "8. After tests pass, ANY further tool calls are wasted money. The worker will commit your changes.\n\n"
- <> "AUTONOMOUS OPERATION (NO HUMAN IN LOOP):\n"
- <> "- You are running autonomously without human intervention\n"
- <> "- There is NO human to ask questions or get clarification from\n"
- <> "- Make reasonable decisions based on the task description\n"
- <> "- If something is truly ambiguous, implement the most straightforward interpretation\n"
- <> "- Guardrails will stop you if you exceed cost/token budgets or make repeated mistakes\n\n"
- <> "BUILD SYSTEM NOTES:\n"
- <> "- 'bild --test "
- <> ns
- <> "' tests ALL dependencies transitively - run it ONCE, not per-file\n"
- <> "- Do NOT run bild --test on individual files separately\n"
- <> "- Once tests pass, STOP IMMEDIATELY - no verification, no double-checking, no 'one more look'\n"
- <> "- Use 'lint --fix' for formatting issues (not hlint directly)\n\n"
- <> "EFFICIENCY REQUIREMENTS:\n"
- <> "- Do not repeat the same action multiple times\n"
- <> "- Do not re-run passing tests\n"
- <> "- Do not test files individually when namespace test covers them\n"
- <> "- Aim to complete the task in under 50 tool calls\n\n"
- <> "EFFICIENT FILE READING:\n"
- <> "- PREFER search_and_read over separate search + read_file calls\n"
- <> "- search_and_read finds code AND returns context around matches in one call\n"
- <> "- Only use read_file with line ranges (start_line/end_line) for targeted reads\n"
- <> "- NEVER read entire large files - always search first, then read specific sections\n"
- <> "- For edit_file, use minimal unique context - just enough lines to match uniquely\n"
- <> "- If edit_file fails with 'old_str not found', re-read the exact lines you need to edit\n"
- <> "- After 2-3 failed edits on the same file, STOP and reconsider your approach\n\n"
- <> "Context:\n"
+ <> "' after your changes.\n"
+ <> "- 'bild --test' tests ALL dependencies transitively - run it ONCE, not per-file.\n"
+ <> "- Use 'lint --fix' to handle formatting (not hlint directly).\n"
+ <> "- If tests pass, STOP. Do not verify again, do not double-check.\n\n"
+ <> "# Tool Usage\n\n"
+ <> "Your tools: read_file, write_file, edit_file, run_bash, search_codebase, search_and_read.\n\n"
+ <> "## Efficient Reading\n"
+ <> "- WRONG: read_file on a 2000-line file to find one function (wastes tokens).\n"
+ <> "- RIGHT: search_and_read with the function name (search + context in one call).\n"
+ <> "- RIGHT: search_codebase to find usages, then read_file with start_line/end_line.\n\n"
+ <> "## Efficient Editing\n"
+ <> "- Include enough context in old_str to match uniquely (usually 5+ lines).\n"
+ <> "- If edit_file fails with 'old_str not found', you are hallucinating the content.\n"
+ <> "- STOP. Call read_file on those exact lines to get fresh content. Then retry.\n"
+ <> "- After 3 failed edits on the same file, reconsider your approach.\n\n"
+ <> "# Debugging\n"
+ <> "If 'bild' fails, do NOT guess the fix.\n"
+ <> "1. Read the error output carefully.\n"
+ <> "2. For type errors: read the definition of the types involved.\n"
+ <> "3. For import cycles: create a Types or Common module to break the cycle.\n"
+ <> "4. If tests fail 3 times on the same issue, STOP - the task will be marked for human review.\n\n"
+ <> "# Examples\n\n"
+ <> "## Example: Splitting a Module\n"
+ <> "1. search_and_read to understand the file structure\n"
+ <> "2. write_file NewModule.py (with extracted code + proper imports)\n"
+ <> "3. edit_file Original.py (remove moved code, add 'from NewModule import ...')\n"
+ <> "4. run_bash: bild --test <namespace>\n"
+ <> "5. Tests pass -> STOP\n\n"
+ <> "## Example: Fixing a Type Error\n"
+ <> "1. read_file Main.hs (lines around the error)\n"
+ <> "2. Identify: function expects Text but got String\n"
+ <> "3. edit_file Main.hs (add import, apply T.pack)\n"
+ <> "4. run_bash: bild --test <namespace>\n"
+ <> "5. Tests pass -> STOP\n\n"
+ <> "# Constraints\n"
+ <> "- You are autonomous. There is NO human to ask for clarification.\n"
+ <> "- Make reasonable decisions. If ambiguous, implement the straightforward interpretation.\n"
+ <> "- Aim to complete the task in under 50 tool calls.\n"
+ <> "- Guardrails will stop you if you exceed cost/token limits or make repeated mistakes.\n\n"
+ <> "# Context\n"
<> "- Working directory: "
<> Text.pack repo
<> "\n"
diff --git a/Omni/Task/README.md b/Omni/Task/README.md
index 463c9e5..4025b74 100644
--- a/Omni/Task/README.md
+++ b/Omni/Task/README.md
@@ -1,7 +1,7 @@
# Task Manager for AI Agents
The task manager is a dependency-aware issue tracker inspired by beads. It uses:
-- **Storage**: SQLite database (`~/.cache/omni/tasks/tasks.db`)
+- **Storage**: SQLite database (`~/.local/share/jr/jr.db`)
- **Dependencies**: Tasks can block other tasks
- **Ready work detection**: Automatically finds unblocked tasks
@@ -365,8 +365,6 @@ Remember these non-negotiable rules:
- ✅ Link discovered work with `--discovered-from` dependencies
- ✅ File bugs IMMEDIATELY when you discover unexpected behavior
- ✅ Check `task ready --json` before asking "what should I work on?"
-- ✅ Store AI planning docs in `_/llm` directory
-- ❌ NEVER use `todo_write` tool
- ❌ NEVER create markdown TODO lists or task checklists
- ❌ NEVER put TODOs or FIXMEs in code comments
- ❌ NEVER use external issue trackers