Real-world benchmarks for AI coding agents
PinchBench measures how well LLMs perform as the brain of an OpenClaw agent. Instead of synthetic tests, we throw real tasks at agents: scheduling meetings, writing code, triaging email, researching topics, and managing files.
| Repo | Description |
|---|---|
| skill | Benchmark runner and task definitions — run it yourself |
| leaderboard | The pinchbench.com leaderboard frontend |
```sh
git clone https://github.com/pinchbench/skill.git
cd skill
./scripts/run.sh --model anthropic/claude-sonnet-4
```

Results upload to the public leaderboard. Get started →
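To benchmark a different model, the same `--model` flag should accept other provider/model slugs; the slug below is a hypothetical placeholder for illustration, not a verified supported value.

```sh
# Hypothetical example: substitute any model slug your OpenClaw setup supports.
./scripts/run.sh --model openai/gpt-4o
```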
Claw-some AI agent testing 🦞