Stop Prompting, Start Looping: Give Your Coding Agent an Exit Condition

A coding agent drifts, over-runs, or quits early because a single long prompt has no stopping rule. The durable fix is not a longer prompt — it is a closed loop: an explicit trigger, an external feedback gate, and an exit condition. Write the loop once as plain text, and you own a reusable pattern instead of re-explaining yourself every session.

This is the difference between asking and engineering. A prompt says what you want. A loop says what you want, how to check it, and when to stop — and it pins that check to something outside the model's own opinion. Geoffrey Huntley's "Ralph" technique made the naive version famous: "In its purest form, Ralph is a Bash loop."¹ The refinement this post describes is what Ralph deliberately lacks — a gate and an exit. The pattern is tool-agnostic across Cursor, Claude Code, Codex, Gemini CLI, and OpenCode, and the recently popular catalog loops! (by elorm) shows it shipping as copy-pasteable text.²

Why a longer prompt fails where a loop succeeds

A longer prompt fails because it front-loads everything at once and then runs unchecked. A loop succeeds because it does the opposite: a small unit of work, a check against an external signal, then a decision to continue or stop. The prompt is open-ended; the loop has a shape with a beginning, a verifier, and an end.

The cost of the open-ended version is measurable. Huntley, describing an unbounded agent loop, observes that "the quality of output clips at the 147k-152k mark" even against an advertised 200k-token context window.¹ An agent with no iteration cap keeps going as its context fills and its judgment degrades. The famous, honest verdict on the naive loop is his own: "That's the beauty of Ralph — the technique is deterministically bad in an undeterministic world."¹

A loop is reliable for the same reason a unit test is: the decision to stop is made by a command, not a mood. Each pass produces output that a check either accepts or rejects. The agent does not "feel done." It either passes the gate or runs again.

The anatomy of a closed loop: trigger, gate, exit

A closed loop has exactly three load-bearing parts. The trigger is the kickoff instruction that tells the agent to self-pace rather than sprint. The feedback gate is an external command run between iterations whose output decides continuation. The exit condition is the precise, checkable state that ends the loop — plus a hard iteration cap so it can never run forever.

The catalog loops! states the shape in one line: "Copy closed-loop workflows for coding agents. Each loop includes triggers, feedback gates, and exit conditions so agents self-pace until the job is done."² Its universal kickoff instruction is the trigger, verbatim:

"Self-pace this loop. After each iteration, run the check command, read the output, and only continue if the exit condition is not met. Stop when the exit condition passes or max iterations is reached. Give a short status update each pass."²

flowchart TD
  A[Trigger:<br/>self-pace<br/>kickoff] --> B[Do one<br/>scoped unit<br/>of work]
  B --> C{Gate:<br/>run check<br/>command}
  C -->|exit 0 /<br/>passes| D[Exit:<br/>stop, report]
  C -->|fails| E{Hit max<br/>iterations?}
  E -->|no| B
  E -->|yes| F[Stop:<br/>report<br/>unmet]

Figure: The closed-loop shape. A trigger starts a self-paced run; the agent does one scoped unit of work, then a feedback-gate command decides the next step. If the gate passes, the loop exits and reports. If it fails, the loop repeats — unless the iteration cap is reached, in which case it stops and reports the unmet exit condition. The decision is made by command output, never by the agent's self-report.

Notice what carries the decision: a command's exit status, not the agent's narration. That single design choice is what separates a loop you can trust from a prompt you have to babysit.
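The flowchart can be read as a few lines of control flow. The following is a minimal Python sketch, not anything taken from the loops! catalog: `run_closed_loop`, `LoopResult`, `do_work`, and `gate` are hypothetical names chosen for illustration, and the agent and the check are passed in as plain callables so the shape stays visible without any particular tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    passed: bool      # True only if the gate passed
    iterations: int   # number of work passes actually run
    reason: str       # why the loop stopped


def run_closed_loop(
    do_work: Callable[[int], None],
    gate: Callable[[], bool],
    max_iterations: int,
) -> LoopResult:
    """Do one unit of work, run the gate, and stop on a pass or at the cap."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for i in range(1, max_iterations + 1):
        do_work(i)      # one scoped unit of work
        if gate():      # the check decides, not the worker
            return LoopResult(True, i, "exit condition met")
    return LoopResult(False, max_iterations, "max iterations reached")


if __name__ == "__main__":
    # Stand-in for a real project: the "bug" is fixed on the third pass.
    state = {"fixed": False}

    def fake_work(i: int) -> None:
        if i == 3:
            state["fixed"] = True

    print(run_closed_loop(fake_work, lambda: state["fixed"], max_iterations=10))
```

The worker never returns a value that the loop reads. The only thing that can end the run early is the gate, which is the property the figure describes.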

The feedback gate must be external, not self-judged

The feedback gate must be an external signal — a shell exit code, a CI status, an HTTP probe — and never the model's own claim that it finished. This is the most important rule in the entire pattern, and it is the one a naive loop gets wrong. An agent grading its own homework is the failure mode, not the fix.

The academic case is direct. In ReVeal: Self-Evolving Code Agents via Reliable Self-Verification, Jin et al. find that "existing methods rely solely on outcome rewards, without explicitly optimizing verification ... leading to unreliable self-verification and limited test-time scaling."³ Their framework instead "enables the model to use self-constructed tests and tool feedback to continuously evolve code for 20+ turns on LiveCodeBench despite training on only three."³ Tool feedback is the operative phrase: the verification that scales is the kind grounded in an external check.

This is exactly why every loop in the loops! catalog pins its exit to a command rather than a sentiment. The Independent Verifier Pass loop states the rule plainly: "No self-reporting — only command output counts."² A loop is only as honest as the check it trusts.
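One way to make "only command output counts" concrete is to let the stopping decision depend on nothing but a process exit status. This is a hedged sketch: `gate_passes` and `should_stop` are hypothetical helper names, the timeout value is an arbitrary assumption, and the approach only works for commands that signal failure through a non-zero exit status.

```python
import subprocess
import sys
from typing import Sequence


def gate_passes(command: Sequence[str], timeout_s: float = 600.0) -> bool:
    """Return True only if the command runs and exits with status 0."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing or hung check counts as a failed gate, never as a pass.
        return False
    return completed.returncode == 0


def should_stop(agent_says_done: bool, command: Sequence[str]) -> bool:
    """The agent's claim is accepted as an argument and deliberately ignored."""
    return gate_passes(command)


if __name__ == "__main__":
    failing_check = [sys.executable, "-c", "import sys; sys.exit(1)"]
    # The agent claims success, but the check fails, so the loop continues.
    print(should_stop(agent_says_done=True, command=failing_check))  # prints False
```

In a real loop the command would be whatever the loop names as its gate, for example `["gh", "pr", "checks"]`. The demo uses the Python interpreter itself only so that it runs anywhere.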

A worked loop you can copy: Ship PR Until Green

The most-copied loop in the catalog is Ship PR Until Green, which automates the branch-test-push-CI cycle until every check passes. As of 2026-06-20 it had been copied about 1,266 times, and the six featured loops together account for thousands of copies.² Here is its full anatomy, annotated with the reasoning the source leaves implicit.

The catalog describes it as: "Implement on a branch, run tests, push, open a PR, wait for CI, and loop until checks pass and the PR is ready to merge."²

# Ship PR Until Green — a closed loop for a coding agent
# Originator: elorm (loops.elorm.xyz). Counts ~as of 2026-06-20; they drift.

TRIGGER (kickoff — paste this first):
  Self-pace this loop. After each iteration, run the check command, read the
  output, and only continue if the exit condition is not met. Stop when the
  exit condition passes or max iterations is reached. Give a short status
  update each pass.

STEPS (one scoped unit of work per pass, in order):
  1. Make the scoped change and run local tests.
  2. Commit with a clear message and push the branch.
  3. Open a PR with summary and test plan, or update the existing PR.
  4. If CI fails, read logs, fix once locally, push, and re-wait.

FEEDBACK GATE (run between iterations — the decision is its output):
  gh pr checks

EXIT CONDITION:
  Exit when: all PR checks are success.

ITERATION CAP:
  Max iterations: 10   # never let it run forever

# WHY each piece exists (the part the catalog leaves implicit):
#  - The gate is `gh pr checks`, not "the agent thinks CI is green" — an
#    external status, per ReVeal: tool feedback, not self-judgment.
#  - The cap (10) bounds context growth; unbounded loops degrade as context
#    fills (Huntley: quality clips ~147k-152k tokens).
#  - "Fix once locally, push, and re-wait" forces one change per pass, so a
#    failing gate maps to a specific edit instead of a guess-storm.

The same shape generalizes; the gate command and exit condition simply change per job:

Loop (originator: elorm)	Feedback gate (verbatim)	Exit condition (verbatim)	Cap
Ship PR Until Green	`gh pr checks`	"all PR checks are success"	10
Independent Verifier Pass	`npm run build && npm run lint && npm test`	"all verifier commands exit 0"	8
Deploy Verification Loop	`curl -fsS <your-health-url>`	"every configured endpoint succeeds"	8

Each row is the same skeleton (a trigger, an external gate command, a precise exit, a cap), which shows that the pattern is the asset, not any one loop.² The Independent Verifier Pass is the artifact form of a two-agent check; we have written separately about the verifier agent, a two-agent verification loop, as the dedicated second-checker pattern.
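Since every row shares one skeleton, the rows can be treated as data and the loop text generated from them. The sketch below is an illustration under stated assumptions: `LoopSpec`, `KICKOFF`, `CATALOG_ROWS`, and the section labels in `render` are names invented here, not a format the catalog defines. The gate commands, exit conditions, and caps are copied from the table above.

```python
from dataclasses import dataclass

# The catalog's universal kickoff instruction, quoted earlier in this post.
KICKOFF = (
    "Self-pace this loop. After each iteration, run the check command, read the "
    "output, and only continue if the exit condition is not met. Stop when the "
    "exit condition passes or max iterations is reached. Give a short status "
    "update each pass."
)


@dataclass(frozen=True)
class LoopSpec:
    name: str
    gate_command: str
    exit_condition: str
    max_iterations: int

    def render(self) -> str:
        """Render the spec as plain text that can be pasted into an agent."""
        return "\n".join([
            f"# {self.name}",
            "",
            "TRIGGER:",
            f"  {KICKOFF}",
            "",
            "FEEDBACK GATE:",
            f"  {self.gate_command}",
            "",
            "EXIT CONDITION:",
            f"  Exit when: {self.exit_condition}.",
            "",
            "ITERATION CAP:",
            f"  Max iterations: {self.max_iterations}",
        ])


CATALOG_ROWS = [
    LoopSpec("Ship PR Until Green", "gh pr checks",
             "all PR checks are success", 10),
    LoopSpec("Independent Verifier Pass",
             "npm run build && npm run lint && npm test",
             "all verifier commands exit 0", 8),
    LoopSpec("Deploy Verification Loop", "curl -fsS <your-health-url>",
             "every configured endpoint succeeds", 8),
]

if __name__ == "__main__":
    print(CATALOG_ROWS[0].render())
```

Only three fields vary between loops; the trigger text is shared, which is why a new loop is usually a matter of choosing a gate command and a precise exit.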

How the loops map across coding agents

The same loop runs across multiple agents because the gate is a shell command, not a vendor feature. The loops! catalog declares per-loop compatibility with Cursor, Claude Code, Codex, Gemini CLI, and OpenCode, and where a loop needs a hook it names the real mechanism on each platform.² The ecosystem is the surface the loops run on — not a contest between them.

Loop	Compatible agents (verbatim)	Hook mechanism named (verbatim)
Pre-Commit Guard	cursor, claude-code	"Uses `beforeShellExecution` (Cursor) and `PreToolUse` (Claude Code) to run tests before git commits."
Post-Edit Test Guard	cursor, claude-code	"Uses the `afterFileEdit` hook (Cursor) and `PostToolUse` hook (Claude Code) to run related tests after edits."
CI Failure Watcher	cursor, claude-code	"Poll CI every 5 minutes ... Pair with: /loop 5m "

Because the loop is text and the gate is a command, the same artifact is portable. A loop whose gate is a plain shell command that works in Claude Code today should work in Codex tomorrow with no rewrite, which is the difference between owning a method and renting a feature. This is also the contrast with a fully autonomous, gateless loop like Ralphy, an autonomous AI coding loop: the closed loop adds the exit condition the infinite loop omits by design.

Common mistakes that break a loop

Most broken loops fail in one of four ways, and each maps to a missing or weak part of the anatomy. Knowing the failure mode is how you read a loop that misbehaves and find the one line to fix. A loop is a small machine; when it lies, the check is lying.

Premature exit. The agent declares success before the gate passes. Defense: the explicit "No self-reporting — only command output counts" rule, and an exit pinned to exit 0, not narration.²
Runaway loop. No iteration cap, so the agent burns tokens and degrades as context fills.¹ Defense: a hard Max iterations value on every loop.
Flaky gate. A non-deterministic test makes the exit condition lie — sometimes pass, sometimes fail, on identical code. Defense: stabilize the gate before you trust it; a loop inherits the honesty of its check.
Silent gate. A command that fails quietly lets a broken build "pass." Defense: make the gate fail loudly with `set -euo pipefail`, `curl -fsS`, and a non-zero exit on any error.
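The flaky-gate and silent-gate failures above can both be probed mechanically before a loop is trusted. The following sketch assumes `bash` is on the PATH; `is_stable` and `bash_exit_code` are hypothetical helpers, and agreement across a handful of runs is evidence of a stable gate, not proof of one.

```python
import subprocess
from typing import Callable


def is_stable(gate: Callable[[], bool], runs: int = 5) -> bool:
    """Run the gate repeatedly on unchanged code; a stable gate always agrees.

    Stable means consistent, not passing: a gate that fails every time
    is still stable.
    """
    outcomes = {gate() for _ in range(runs)}
    return len(outcomes) == 1


def bash_exit_code(script: str) -> int:
    """Run a script under bash and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode


if __name__ == "__main__":
    # Silent gate: the failing first stage is masked by the passing last stage.
    print(bash_exit_code("false | cat"))                      # prints 0
    # Loud gate: pipefail makes the pipeline report the failure.
    print(bash_exit_code("set -euo pipefail; false | cat"))   # prints 1
```

The two `print` lines run the same pipeline. Only the shell options differ, and that difference decides whether a broken step can reach the exit condition as a pass.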

These are honest limits, not edge cases. A closed loop is a developer workflow pattern, not a guarantee about any specific tool's reliability — it is only as trustworthy as the external check you wire into the gate.

Why a loop belongs in a file you own

A loop is most useful as a plain-text, version-controlled artifact rather than something retyped each session. Write it as Markdown, commit it next to your code, copy it between tools, and diff it when it changes. The loop you tuned last month is an asset only if you can find it, read it, and move it.
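A side benefit of keeping the loop as text is that a short script can read it back, for example to run the gate outside the agent. This is a rough sketch that assumes the loop file uses the `FEEDBACK GATE` heading and `Max iterations:` line shown in the Ship PR Until Green example; `parse_loop` is a hypothetical helper, and a loop written in a different layout would need different parsing.

```python
import re
from typing import Dict, Optional, Union


def parse_loop(text: str) -> Dict[str, Union[str, int, None]]:
    """Extract the gate command and iteration cap from a plain-text loop."""
    gate: Optional[str] = None
    cap: Optional[int] = None
    in_gate_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("FEEDBACK GATE"):
            in_gate_section = True
            continue
        if in_gate_section and line:
            gate = line              # first non-blank line under the heading
            in_gate_section = False
        match = re.match(r"Max iterations:\s*(\d+)", line)
        if match:
            cap = int(match.group(1))
    return {"gate": gate, "max_iterations": cap}


if __name__ == "__main__":
    sample = (
        "FEEDBACK GATE (run between iterations):\n"
        "  gh pr checks\n"
        "\n"
        "ITERATION CAP:\n"
        "  Max iterations: 10   # never let it run forever\n"
    )
    print(parse_loop(sample))
```

A missing section comes back as `None` rather than a guessed default, so a loop file without a gate or a cap is easy to spot.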

This is the same reason your AI's custom instructions belong in a file you own, and your notes are better as files you control than a box inside one vendor's app. The thing you can move is the thing you own.

That portability is the quiet thesis of the whole pattern. A prompt evaporates with the chat window. A loop saved as text outlives the session, the tool, and the vendor.

Frequently asked questions

What is an agentic loop?

An agentic loop is a coding-agent workflow that acts, checks the result against an external signal, and repeats until an exit condition is met — instead of running a one-shot prompt. It has three parts: a trigger that starts a self-paced run, a feedback gate (a command whose output decides continuation), and an exit condition with an iteration cap.

Why does my coding agent loop forever?

It loops forever because it is missing at least one of three guards: a hard iteration cap, a real exit condition, or an external gate that can actually fail. An unbounded loop also degrades as context fills — Huntley observes output quality clipping around the 147k-152k-token mark.¹ Add a Max iterations value and pin the exit to a command's exit code.
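The three guards can be checked before a loop is ever started. The sketch below is illustrative only: `missing_guards` is a hypothetical helper, and its test for a gate that "can never fail" is a crude heuristic that catches only the most obvious cases, such as a command that ends in `|| true`.

```python
from typing import List, Optional


def missing_guards(
    gate_command: Optional[str],
    exit_condition: Optional[str],
    max_iterations: Optional[int],
) -> List[str]:
    """List which of the three guards a loop definition lacks."""
    problems: List[str] = []
    if max_iterations is None or max_iterations < 1:
        problems.append("no hard iteration cap")
    if not (exit_condition or "").strip():
        problems.append("no exit condition")
    gate = (gate_command or "").strip()
    if not gate:
        problems.append("no external gate command")
    elif gate in ("true", ":") or gate.endswith("|| true"):
        # Heuristic: these commands always exit 0, so the gate cannot reject.
        problems.append("gate command can never fail")
    return problems


if __name__ == "__main__":
    print(missing_guards("npm test || true", "tests pass", None))
```

An empty list means all three guards are present. It does not mean the gate is a good one, only that it exists and is not trivially always-green.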

How do I stop a coding agent from saying it's done when it isn't?

Pin the stopping decision to an external check — a test suite, a CI status, an HTTP probe — never the model's self-report. The loops! Independent Verifier Pass states the rule directly: "No self-reporting — only command output counts."² ReVeal's research supports it: reliable scaling comes from tool feedback, not unverified self-judgment.³

How does the Ralph loop work, and how is this different?

Ralph, in Geoffrey Huntley's words, is "in its purest form ... a Bash loop" that re-runs an agent continuously, which he frankly calls "deterministically bad in an undeterministic world."¹ A closed loop keeps the repetition but adds the two things Ralph omits: an external feedback gate and an exit condition with a cap, so the loop stops on a checkable result rather than running until you intervene.

Are these loops specific to Cursor or Claude Code?

No. Because each loop is plain text whose gate is a shell command, the same artifact runs across Cursor, Claude Code, Codex, Gemini CLI, and OpenCode.² Loops that use editor hooks name the per-platform mechanism (`beforeShellExecution`/`PreToolUse` before commits, `afterFileEdit`/`PostToolUse` after edits), but the loop's shape is identical everywhere.²

Do I need a longer prompt or a loop?

A loop, in nearly every case where the agent drifts or stops early. A longer prompt adds detail but no stopping rule; a loop adds the trigger, the external check, and the exit that decide when the work is actually finished. Reach for a longer prompt only for a genuine one-shot task with no verifiable end state.

A prompt tells an agent what you want; a loop tells it how to know when it is done. Write the loop down, pin its gate to a command, and you stop babysitting and start engineering. This pattern is shown in catalog form by elorm's loops! — and because a loop is just plain text, the natural home for the ones you tune is a file you own and version, the way mnmnote.com keeps your notes as files on your own device rather than locked inside one vendor's app.

1. Geoffrey Huntley, "Ralph Wiggum as a 'software engineer'," ghuntley.com, 2025, https://ghuntley.com/ralph/, retrieved 2026-06-20.
2. "loops!" by elorm, https://loops.elorm.xyz/, retrieved 2026-06-20. Loop bodies, copy counts (~as of 2026-06-20; they drift), and per-loop agent compatibility from the site's own catalog. Originator: elorm (elorm.xyz; buymeacoffee.com/elormtsx).
3. Yiyang Jin, Kunzhao Xu, Hang Li, Xueting Han, Yanmin Zhou, Cheng Li, and Jing Bai, "ReVeal: Self-Evolving Code Agents via Reliable Self-Verification," arXiv:2506.11442, submitted 13 Jun 2025 (v2 revised 21 Oct 2025), https://arxiv.org/abs/2506.11442, retrieved 2026-06-20.