Support for hosted evals by willccbb · Pull Request #880 · PrimeIntellect-ai/verifiers

willccbb · 2026-02-09T04:26:25Z

Description

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Adds new networked evaluation execution paths (API requests, polling, log handling) and changes CLI argument normalization, which could affect evaluation runs and environment resolution if edge cases are missed.

Overview
Adds --hosted support to the verifiers.cli.commands.eval adapter. Local vf-eval behavior is unchanged. With --hosted, the command instead creates a Prime-hosted evaluation via the Prime API and can optionally poll its status and stream its logs (--follow). Supporting changes include API key/config resolution, slug/version resolution (from the argument, a header, or local metadata), TOML multi-eval configs, and payload options such as timeouts, env args, access flags, secrets, and naming.
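The --follow mode described above is, at its core, a poll-until-done loop. The sketch below is illustrative only: follow_evaluation, get_status, the status strings, and max_wait_seconds are hypothetical and are not the PR's code or the Prime API. Because --timeout-minutes is listed among the payload options (presumably a server-side setting), the client-side wait limit here is a separate, made-up parameter. Log streaming is left out.

```python
import time
from typing import Callable, Optional

# Hypothetical terminal states; the real Prime API status values are not given in the PR.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def follow_evaluation(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    max_wait_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state, then return that state.

    get_status stands in for whatever API call fetches the hosted eval's status;
    sleep and clock are injectable so the loop can be tested without real waiting.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    deadline = None if max_wait_seconds is None else clock() + max_wait_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Client-side guard only; unrelated to any timeout sent in the eval payload.
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"evaluation still {status!r} after {max_wait_seconds}s")
        sleep(poll_interval)
```

Injecting sleep and clock keeps the loop deterministic in unit tests, which matters for code whose real behavior depends on a remote service.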

Updates the Prime CLI plugin to locate the workspace root/venv more reliably and to normalize or auto-fill environment directory arguments (e.g., --path, --env-dir-path) as absolute paths to the workspace's environments/ directory. Adds unit tests covering hosted payload construction, header-based slug parsing, TOML behavior, and plugin command/path resolution. Also pins the dev dependency ruff to >=0.15.0.
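The path handling described above can be sketched as follows. Only the two flag names and the environments/ directory come from the summary; everything else is an assumption for illustration: the function names, pyproject.toml as the workspace marker, resolving relative values against the workspace root, and auto-filling --env-dir-path when neither flag is given.

```python
import os
from pathlib import Path
from typing import List, Optional

# Flag names from the PR summary; everything else in this sketch is hypothetical.
ENV_DIR_FLAGS = ("--path", "--env-dir-path")


def find_workspace_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upward from start and return the first directory containing marker."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def _absolutize(value: str, workspace_root: Path) -> str:
    """Make value absolute, treating relative paths as relative to the workspace root."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = workspace_root / path
    return os.path.normpath(str(path))


def normalize_env_dir_args(
    argv: List[str], workspace_root: Path, fill_flag: Optional[str] = "--env-dir-path"
) -> List[str]:
    """Rewrite env-dir flag values to absolute paths; auto-fill one if none is given."""
    out: List[str] = []
    seen = False
    i = 0
    while i < len(argv):
        token = argv[i]
        flag, eq, inline = token.partition("=")
        if flag in ENV_DIR_FLAGS and eq:  # --flag=value form
            out.append(f"{flag}={_absolutize(inline, workspace_root)}")
            seen = True
        elif token in ENV_DIR_FLAGS and i + 1 < len(argv):  # --flag value form
            out += [token, _absolutize(argv[i + 1], workspace_root)]
            seen = True
            i += 2
            continue
        else:
            out.append(token)
        i += 1
    if not seen and fill_flag is not None:
        out += [fill_flag, os.path.normpath(str(workspace_root / "environments"))]
    return out
```

Both the `--flag value` and `--flag=value` spellings are handled because argparse-style CLIs generally accept either.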

Written by Cursor Bugbot for commit 9e363a9. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.


cursor · 2026-02-09T04:40:47Z

verifiers/cli/commands/eval.py

+        )
+        raise SystemExit(2)
+
+    _run_vf_eval(args)


Missing documentation for hosted evaluation feature

Medium Severity

This PR adds significant new user-facing functionality with the --hosted mode for evaluations, including new flags: --hosted, --follow, --poll-interval, --timeout-minutes, --allow-sandbox-access, --allow-instances-access, --custom-secrets, and --eval-name. The existing docs/evaluation.md describes the evaluation command in detail but isn't updated to document these new hosted evaluation capabilities. Per the review rules, PRs that add or modify core user-facing functionality as described in docs must update the relevant documentation.

cursor · 2026-02-09T04:40:47Z

verifiers/cli/commands/eval.py

+        )
+        raise SystemExit(2)
+
+    _run_vf_eval(args)


Missing skills update for hosted evaluation workflow

Low Severity

This PR changes user-facing Prime evaluation workflows by adding a new hosted execution mode with --hosted. The existing skills/evaluate-environments/SKILL.md describes evaluation workflows but doesn't include the new hosted evaluation patterns such as running evaluations on the Prime platform, following logs with --follow, or configuring hosted-specific options. Per the review rules, changes to user-facing evaluation workflows must update the corresponding skills.

cursor · 2026-02-09T04:40:47Z

verifiers/cli/commands/eval.py

+    slug, version = value.rsplit("@", 1)
+    if not version:
+        raise HostedEvalError(f"Invalid environment version in '{value}'")
+    return slug, version


Malformed slug with @ before / causes crash

Low Severity

_split_slug_and_version doesn't validate that the resulting slug contains a / separator. An unusual input like @version/name passes _is_slug_reference (because it contains / and doesn't start with ./, ../, or /), then rsplit("@", 1) produces an empty slug "". When _run_hosted_eval later calls env_slug.split("/", 1) on an empty string, it raises a ValueError with a confusing message rather than a descriptive hosted eval error.

Additional Locations (1)

verifiers/cli/commands/eval.py#L685-L686

cursor · 2026-02-09T04:40:47Z

verifiers/cli/commands/eval.py

+            i += 1
+            continue
+        if token in HOSTED_VALUE_FLAGS:
+            i += 2


Help flag skipped when following hosted value flag

Low Severity

In _strip_hosted_flags_for_help, when a hosted value flag like --poll-interval is encountered, the code does i += 2 to skip the flag and its value without checking if the "value" position contains --help or -h. For input like ["my-env", "--poll-interval", "--help"], the help flag is treated as --poll-interval's value and skipped. After stripping, _run_vf_eval(["my-env"]) is called, running an evaluation instead of showing help as the user intended.

willccbb added 3 commits February 8, 2026 19:40

plugin · e8da0a8
support for hosted evals · 98d5c31
ruff · 9e363a9

willccbb wants to merge 3 commits into main from hosted-eval-plugin
