🦀 PinchBench

Real-world benchmarks for AI coding agents

PinchBench measures how well LLMs perform as the brain of an OpenClaw agent. Instead of synthetic tests, we throw real tasks at agents: scheduling meetings, writing code, triaging email, researching topics, and managing files.


Repositories

| Repo | Description |
| --- | --- |
| skill | Benchmark runner and task definitions; run it yourself |
| leaderboard | The pinchbench.com leaderboard frontend |

Run the Benchmark

git clone https://github.com/pinchbench/skill.git
cd skill
./scripts/run.sh --model anthropic/claude-sonnet-4

Results are uploaded to the public leaderboard at pinchbench.com.
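
To benchmark more than one model, the run command above can be wrapped in a small script. The following is a minimal sketch, not part of the PinchBench tooling: it assumes the repository was cloned into `./skill`, and the helper names `build_command` and `run_models` are hypothetical. The only flag it relies on is `--model`, as shown above. It defaults to a dry run that just prints the commands, since a real run uploads results.

```python
import subprocess
from pathlib import Path

# Assumption: the repository was cloned into ./skill, as in the commands above.
REPO_DIR = Path("skill")


def build_command(model: str) -> list[str]:
    """Build the argv for one benchmark run.

    Only the --model flag is taken from the README; no other flags are assumed.
    """
    return ["./scripts/run.sh", "--model", model]


def run_models(models: list[str], dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, only print) the benchmark command for each model."""
    commands = [build_command(model) for model in models]
    for cmd in commands:
        if dry_run:
            # Print the command instead of executing it.
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, cwd=REPO_DIR, check=True)
    return commands


if __name__ == "__main__":
    # Extend this list with any other model identifiers the runner accepts.
    run_models(["anthropic/claude-sonnet-4"])
```

Pass `dry_run=False` to `run_models` to actually execute the runs one after another.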


Claw-some AI agent testing 🦞

Popular repositories

  1. skill (Public): Python

  2. api (Public): TypeScript

  3. leaderboard (Public): TypeScript

  4. .github (Public): PinchBench organization profile and community health files
