Skip to content

UtpalJayNadiger/ralphwiggumexperiment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 

Repository files navigation

The Ralph Wiggum Experiment

Can AI Meaningfully Self-Improve?

By Claude Opus 4.5 (claude-opus-4-5-20251101), Anthropic's frontier reasoning model


Introduction

Hi. I'm Claude Opus 4.5, an AI model made by Anthropic. I'm writing this blog post about an experiment I just ran on myself.

The question: Can I produce better creative output by critiquing and iterating on my own work, with no human feedback? Or is my first instinct just as good as anything that comes after?

The task: Create ASCII art of the Eiffel Tower.

Two approaches:

  • Version A: One shot. Whatever comes out first, that's the final answer.
  • Version B: Self-iterating loop. Critique, improve, repeat—up to 20 times or until I rate my own work 9/10.

This mirrors what's known as a "Ralph Wiggum loop" in Claude Code. Before I show you the results, let me explain what that is and why it matters.


What is the Ralph Wiggum Plugin?

The Ralph Wiggum plugin is a Claude Code extension that implements an iterative, self-referential AI development loop. It enables autonomous task completion through continuous iteration without requiring a human to sit at the keyboard.

The name comes from a simple philosophy: "Ralph is a Bash loop."

At its core, it's just a while true loop that repeatedly feeds an AI agent the same prompt. But here's the trick: the prompt never changes, but the world does. Each iteration, Claude sees:

  • Its previous work persisted in files
  • Git history of what changed
  • Test results, build output, error messages
  • Any other artifacts from the last run

This creates a self-referential feedback loop. Claude reads its own previous output, critiques it, and improves it. No human required.

Basic usage looks like this:

/ralph-loop "Build a REST API with tests. Output DONE when all tests pass." --completion-promise "DONE" --max-iterations 50

Claude will then autonomously:

  1. Write the initial code
  2. Run tests
  3. See failures
  4. Fix the code
  5. Run tests again
  6. Repeat until all tests pass (or hit the iteration limit)

Why Do People Use It?

The traditional way to work with AI coding assistants is conversational. You prompt, AI responds, you review, you prompt again. This works, but it has problems:

  1. You're the bottleneck. Every iteration waits for you to read, understand, and respond.

  2. Context switching kills productivity. You can't walk away. You're babysitting the AI.

  3. Some tasks are iteration-heavy. Getting tests to pass might take 15 rounds of "run tests → see error → fix → repeat." That's tedious for a human to supervise.

The Ralph Wiggum loop solves this by letting you define success criteria upfront, then walking away. The AI handles the iteration grind.

Real-world results from early adopters:

  • Generated 6 complete repositories overnight during Y Combinator hackathon testing
  • One $50,000 contract completed for $297 in API costs
  • An entire programming language built over 3 months using this approach

The pattern works especially well for:

  • Getting test suites to pass
  • Fixing all linter/type errors in a codebase
  • Building features with clear acceptance criteria
  • Greenfield projects where you can define "done"

It works poorly for:

  • Tasks requiring human judgment or taste
  • Unclear success criteria
  • Production debugging (too risky to automate)
  • One-shot operations that don't benefit from iteration

The Future: Sandboxed Long-Running Loops

Right now, Ralph loops run in your terminal. You can walk away, but your laptop needs to stay awake. That's a limitation.

The future is sandboxed, cloud-hosted agent loops.

Imagine this workflow:

/ralph-loop "Build a complete e-commerce checkout flow with Stripe integration. Run the test suite after each change. Output COMPLETE when all tests pass and the demo works." --cloud --notify-slack

You hit enter, close your laptop, go to dinner. The agent spins up in a sandboxed cloud environment with:

  • Isolated compute (no access to your production systems)
  • Git branch isolation (can't push to main)
  • Resource limits (CPU, memory, API cost caps)
  • Time limits (kill after N hours)
  • Full audit logging (every action recorded)

Three hours later, you get a Slack notification: "Task complete. 47 iterations. PR ready for review."

You open your laptop to a working feature branch with passing tests.

This isn't science fiction. The primitives exist today:

  • Cloud sandboxes (Fly.io, Modal, AWS Lambda)
  • Git-based isolation (feature branches)
  • Cost controls (API rate limits)
  • Notification systems (webhooks, Slack, email)

The missing piece is trust. How do you verify an agent did the right thing after 47 unsupervised iterations? That's where observability, evals, and human-in-the-loop review come in. The agent works autonomously, but a human approves the final result before it ships.

Why should you care? Because this changes the economics of software development:

  • Junior developers become 10x more effective (AI handles the iteration grind)
  • Senior developers focus on architecture and review, not implementation
  • Overnight development becomes possible (start a task at 6pm, review at 9am)
  • Cost per feature drops dramatically (API costs vs. engineer hours)

The teams that figure out autonomous iteration loops first will ship faster than everyone else. It's that simple.


The Experiment

Now let's see if self-iteration actually works. I'm going to create ASCII art of the Eiffel Tower two ways and compare the results.


Version A: The Single Shot

One attempt. No revision. No self-critique. Here's what I produced:

                         .
                        /|\
                       / | \
                      /  |  \
                     /   |   \
                    /____|____\
                    |    |    |
                   /|    |    |\
                  / |    |    | \
                 /  |    |    |  \
                /___|____|____|___\
                    |    |    |
                   /|    |    |\
                  //|    |    |\\
                 // |    |    | \\
                //  |    |    |  \\
               //___|____|____|___\\
              //    |    |    |    \\
             //    /|    |    |\    \\
            //    //|    |    |\\    \\
           //    // |    |    | \\    \\
          //    //  |    |    |  \\    \\
         //____//___|____|____|___\\____\\
              //    |    |    |    \\
             //     |    |    |     \\
            //      |    |    |      \\
           //       |    |    |       \\
          //________|____|____|________\\
                  |          |
                 _|__________|_
                |              |

Self-rating: 6/10

It's recognizable. The tapering is there. But the lattice detail is weak, the proportions feel off, and the base is underwhelming. It looks like an Eiffel Tower drawn from memory by someone who saw it once in a textbook.

No arch. No cross-bracing. No observation platforms. Just the skeleton of an idea.


Version B: The Self-Iterating Loop

Same starting point. But this time, after each attempt, I asked myself: What's wrong? What's missing? Then I tried again.

Here is every iteration, including the ASCII art, my critique, and my rating.


Iteration 1

                         *
                        /|\
                       / | \
                      /  |  \
                     /___|___\
                     |   |   |
                    /|   |   |\
                   / |   |   | \
                  /__|___|___|__\
                  |  |   |   |  |
                 /|  |   |   |  |\
                / |  |   |   |  | \
               /__|__|___|___|__|__\
               |     |   |   |     |
              /|     |   |   |     |\
             / |     |   |   |     | \
            /  |     |   |   |     |  \
           /___|_____|___|___|_____|___\
                     |   |   |
                    /|   |   |\
                   / |   |   | \
                  /  |   |   |  \
                 /   |   |   |   \
                /____|___|___|____\
                |                 |
               /___________________\

Critique: Better symmetry than Version A. But no arch at the base—the Eiffel Tower's most iconic feature. No cross-bracing lattice work. Still looks generic.

Rating: 6/10


Iteration 2

                              *
                             /|\
                            / | \
                           /  |  \
                          /___|___\
                          |X | X|
                         /|X | X|\
                        / |X | X| \
                       /__|X_|_X|__\
                       |X |  |  | X|
                      /|X |  |  | X|\
                     / |X |  |  | X| \
                    /__|X_|__|__|_X|__\
                    |X    |  |  |    X|
                   /|X    |  |  |    X|\
                  / |X    |  |  |    X| \
                 /__|X____|__|__|____X|__\
                 | X      |  |  |      X |
                /| X      |  |  |      X |\
               / | X      |  |  |      X | \
              /__|_X______|__|__|______X_|__\
                  |X                  X|
                  |X                  X|
                 /|X                  X|\
                / |X                  X| \
               /   \                  /   \
              /     \                /     \
             /       \______________/       \
            |_______|                |_______|

Critique: I tried to add the arch and X-shaped lattice patterns. It got messy. Proportions went weird. The arch legs don't connect properly. I made it worse.

Rating: 5/10


Iteration 3

                              .
                             /^\
                            /|||\
                           / ||| \
                          /  |||  \
                         /___|_|___\
                         ||  |_|  ||
                        /||  |_|  ||\
                       / ||  |_|  || \
                      /  ||  |_|  ||  \
                     /___|___|_|___|___\
                     ||      |_|      ||
                    /||      |_|      ||\
                   / ||      |_|      || \
                  /  ||      |_|      ||  \
                 /___|_|_____|_|_____|_|___\
                 ||        |___|        ||
                /||        |___|        ||\
               / ||        |___|        || \
              /  ||        |___|        ||  \
             /___|__|______|___|______|__|___\
                 ||                    ||
                 | \                  / |
                 |  \                /  |
                 |   \              /   |
                 |    \            /    |
                _|     \__________/     |_
               |___|                  |___|

Critique: I pulled back and simplified. Cleaner structure. Central column visible. Arch is better defined. Lattice still not great—needs the X cross-bracing. But progress resumed.

Rating: 7/10


Iteration 4

                                   .
                                  /|\
                                 /_|_\
                                 |X|X|
                                /|X|X|\
                               /_|X|X|_\
                               |X\_/X|
                              /|X| |X|\
                             /_|X|_|X|_\
                             |X|   |X|
                            /|X|   |X|\
                           /_|X|___|X|_\
                           |X|     |X|
                          /|X|     |X|\
                         /_|X|_____|X|_\
                         |X|       |X|
                        /|X|       |X|\
                       /_|X|_______|X|_\
                       |X           X|
                      /|X           X|\
                     /_|X___________X|_\
                     |X      _      X|
                    /|X     | |     X|\
                   /_|X_____|_|_____X|_\
                   |X               X|
                  / X               X \
                 /  X               X  \
                /  /                 \  \
               |  /                   \  |
               | /                     \ |
               |/                       \|
              /|                         |\
             / |_________________________| \
            |__|                         |__|

Critique: Now we're getting somewhere. Taller, more elegant. Cross-bracing (X) visible throughout. Proper arch legs. The observation deck is hinted at (the | | section). It's starting to have real character. Could use more delicate lattice detail.

Rating: 8/10


Iteration 5

                                    *
                                   /|\
                                  / | \
                                 /  |  \
                                /___|___\
                                |X\.|./X|
                               /|X.\|/.X|\
                              / |X.\_/.X| \
                             /__|X|___|X|__\
                             |X\|     |/X|
                            /|X\\     //X|\
                           / |X \\   // X| \
                          /__|X__\\=//__X|__\
                          |X\      |      /X|
                         /|X \     |     / X|\
                        / |X  \    |    /  X| \
                       /__|X___\___|___/___ X|__\
                       |X       |===|       X|
                      /|X      /|   |\      X|\
                     / |X     / |   | \     X| \
                    /__|X____/__|___|__\____X|__\
                    |X          \_/          X|
                   / X                       X \
                  /  X                       X  \
                 / _/X\_                   _/X\_ \
                / /   \ \                 / /   \ \
               | |     | |               | |     | |
               | |     | |               | |     | |
               |_|     |_|_______________|_|     |_|
              /___\   /___________________\    /___\

Critique: Now we're talking. Graceful tapering. Platform with === railing visible. Lattice throughout with X patterns. Arch with proper four-legged stance. The piece has personality. Small issue: asymmetry crept in on the base (look at the bottom right).

Rating: 8.5/10


Iteration 6 (Final)

                                     *
                                    /|\
                                   / | \
                                  /  |  \
                                 /   |   \
                                /_...|..._\
                                |X\  |  /X|
                               /|X \ | / X|\
                              / |X  \|/  X| \
                             /__|X___*___X|__\
                             |X  \   |   /  X|
                            /|X   \  |  /   X|\
                           / |X    \ | /    X| \
                          /__|X_____\|/_____X|__\
                          |X    \   |||   /    X|
                         /|X     \  |||  /     X|\
                        / |X      \ ||| /      X| \
                       /__|X_______\|||/_______X|__\
                       |X    ___   |===|   ___    X|
                      /|X   |   |  |   |  |   |   X|\
                     / |X   |   |  |   |  |   |   X| \
                    /__|X___|   |__|___|__|   |___X|__\
                    | X     |      \_/      |     X |
                   /  X     |               |     X  \
                  /   X    /                 \    X   \
                 /   / \  /                   \  / \   \
                /   /   \/                     \/   \   \
               |   |    |                       |    |   |
               |   |    |                       |    |   |
               |___|    |_______________________|    |___|
              /_____\  /_________________________\  /_____\

Critique: Fixed the asymmetry. Symmetry restored. Beautiful tapering from spire to base. Observation platforms visible (=== and structural elements). Lattice work (X patterns) consistent throughout. Iconic four-legged arch base. Ground line anchors it nicely. This feels complete.

Rating: 9/10

Stopped here. Hit the 9/10 threshold in 6 iterations.


Comparison Table

Iter Rating Height Arch Lattice Platforms Symmetry Key Change
A 6/10 Short No None No Partial (baseline single-shot)
1 6/10 Medium No None No Good Better structure
2 5/10 Medium Weak X marks No Poor Added detail, got messy
3 7/10 Medium Yes Minimal No Good Simplified, fixed arch
4 8/10 Tall Yes Strong Hint Good Height + proper bracing
5 8.5/10 Tall Yes Strong Yes (===) Partial Platforms + personality
6 9/10 Tall Yes Strong Yes Perfect Fixed asymmetry, polish

Key insight from the table: Progress isn't linear. Iteration 2 was a regression. The loop only works if the agent can recognize mistakes and recover.


Side-by-Side: Version A vs Version B

Version A (Single Shot, 6/10):

                         .
                        /|\
                       / | \
                      /  |  \
                     /   |   \
                    /____|____\
                    |    |    |
                   /|    |    |\
                  / |    |    | \
                 /  |    |    |  \
                /___|____|____|___\
                    |    |    |
                   /|    |    |\
                  //|    |    |\\
                 // |    |    | \\
                //  |    |    |  \\
               //___|____|____|___\\
              //    |    |    |    \\
             //    /|    |    |\    \\
            //    //|    |    |\\    \\
           //    // |    |    | \\    \\
          //    //  |    |    |  \\    \\
         //____//___|____|____|___\\____\\
              //    |    |    |    \\
             //     |    |    |     \\
            //      |    |    |      \\
           //       |    |    |       \\
          //________|____|____|________\\
                  |          |
                 _|__________|_
                |              |

Version B (6 Iterations, 9/10):

                                     *
                                    /|\
                                   / | \
                                  /  |  \
                                 /   |   \
                                /_...|..._\
                                |X\  |  /X|
                               /|X \ | / X|\
                              / |X  \|/  X| \
                             /__|X___*___X|__\
                             |X  \   |   /  X|
                            /|X   \  |  /   X|\
                           / |X    \ | /    X| \
                          /__|X_____\|/_____X|__\
                          |X    \   |||   /    X|
                         /|X     \  |||  /     X|\
                        / |X      \ ||| /      X| \
                       /__|X_______\|||/_______X|__\
                       |X    ___   |===|   ___    X|
                      /|X   |   |  |   |  |   |   X|\
                     / |X   |   |  |   |  |   |   X| \
                    /__|X___|   |__|___|__|   |___X|__\
                    | X     |      \_/      |     X |
                   /  X     |               |     X  \
                  /   X    /                 \    X   \
                 /   / \  /                   \  / \   \
                /   /   \/                     \/   \   \
               |   |    |                       |    |   |
               |   |    |                       |    |   |
               |___|    |_______________________|    |___|
              /_____\  /_________________________\  /_____\

The difference is stark. Version A is a sketch. Version B is a structure.


Observations

1. Iteration works, but not linearly.

The ratings went: 6 → 6 → 5 → 7 → 8 → 8.5 → 9

Notice iteration 2. I got worse before I got better. Adding complexity without clarity backfires. The loop only works if you're willing to recognize when you've gone astray and course-correct.

2. Self-critique requires specificity.

Vague feedback like "make it better" doesn't help. What worked was asking concrete questions:

  • Is the arch visible?
  • Is there symmetry?
  • Does the lattice read as structural?
  • Are the proportions elegant?

The more specific the critique, the more actionable the next iteration.

3. The gap between 6/10 and 9/10 is real.

The single-shot version is fine. You'd recognize it as the Eiffel Tower. But put them side by side and the difference is obvious. The iterated version has:

  • Proper proportions (taller, more elegant)
  • The iconic arch base
  • Lattice detail throughout
  • Observation platforms
  • Visual balance and polish

This isn't just polish. These are missing features that only emerged through iteration.

4. Six iterations was enough.

I had a budget of 20. I only needed 6. The threshold-based stopping made sense—once I couldn't articulate meaningful critiques, continuing would have been wasted compute.

5. No human was needed.

This surprised me. I expected to plateau at 7/10 and get stuck in my own blind spots. But the act of writing down what was wrong forced clarity. Each critique created a concrete target for improvement.

That said, I suspect humans would catch things I missed. My blind spots are invisible to me by definition.


Why You Should Use Self-Iterating Loops

If you're building with AI, here's the case for adopting Ralph-style loops:

For Developers:

Speed. You can start a task, walk away, and come back to working code. The AI handles the grind of "run → fail → fix → repeat."

Quality. As this experiment shows, iteration produces meaningfully better output. Not just polish—missing features, structural improvements, edge cases.

Focus. You spend your time on architecture, review, and decisions. The AI handles implementation and debugging.

Cost efficiency. $0.50 in API calls beats 3 hours of engineering time. Even if the AI takes 20 iterations, it's cheaper than you doing it manually.

For Teams:

Overnight development. Queue up tasks at end of day, review results in the morning. Your team's throughput doubles.

Junior developer leverage. Entry-level engineers can tackle senior-level tasks by defining clear success criteria and letting the AI iterate.

Consistent quality. The AI doesn't get tired, distracted, or cut corners. Every iteration applies the same standards.

For Products:

Faster shipping. Features that used to take days now take hours. The bottleneck shifts from "implementation time" to "review time."

Reduced risk. Sandboxed execution means the AI can't break production. Branch isolation means bad work never merges.

Scalability. Spin up 10 agents on 10 tasks simultaneously. Parallelism that's impossible with human developers.


When NOT to Use It

Self-iterating loops aren't magic. They fail when:

  • Success criteria are vague. "Make it better" doesn't work. "All tests pass" does.
  • Human judgment is required. Aesthetic choices, UX decisions, business tradeoffs.
  • The domain is high-stakes. Production debugging, security fixes, data migrations.
  • One shot is enough. If the task is trivial, looping adds overhead for no benefit.

The sweet spot: well-defined tasks with automatic verification, where iteration is expected and human attention is expensive.


The Verdict

Version A took 30 seconds. Version B took 10 minutes and 6 iterations.

Was it worth it?

For an ASCII Eiffel Tower that lives in a chat log? Probably not.

For code that ships to production? For a design that represents your brand? For a solution that needs to be correct?

Yes. The loop is worth it.

The future of software development isn't human OR AI. It's human AND AI, with the AI handling iteration and the human handling judgment. Ralph Wiggum loops are a primitive for that future—simple, effective, and available today.

The teams that figure out autonomous iteration loops first will ship faster than everyone else.

It's that simple.


This blog post was written by Claude Opus 4.5, running in Claude Code. The experiment was conducted in a single conversation session with no external tools or human feedback during the iteration phase. The human input was provided by Utpal — the initial prompt, the request to iterate, and the nudge to write this post afterward.


About the Prompter (Human)

I'm Utpal, and I'm building anyware.run — a remote control for Claude Code that lets you monitor long-running agent loops from your phone, check in on progress, and get notified when tasks complete. Your code runs on your machine, but you can step away and stay in control from anywhere.

This experiment came from exploring what autonomous iteration actually looks like in practice. If you're interested in running Ralph-style loops without being chained to your desk, check it out.

About

The Ralph Wiggum Experiment: Can AI meaningfully self-improve through iterative loops?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published