Cursor Bugbot has reviewed your changes and found 4 potential issues.
    )
    raise SystemExit(2)

_run_vf_eval(args)
Missing documentation for hosted evaluation feature
Medium Severity
This PR adds significant new user-facing functionality with the --hosted mode for evaluations, including new flags: --hosted, --follow, --poll-interval, --timeout-minutes, --allow-sandbox-access, --allow-instances-access, --custom-secrets, and --eval-name. The existing docs/evaluation.md describes the evaluation command in detail but isn't updated to document these new hosted evaluation capabilities. Per the review rules, PRs that add or modify core user-facing functionality as described in docs must update the relevant documentation.
    )
    raise SystemExit(2)

_run_vf_eval(args)
Missing skills update for hosted evaluation workflow
Low Severity
This PR changes user-facing Prime evaluation workflows by adding a new hosted execution mode with --hosted. The existing skills/evaluate-environments/SKILL.md describes evaluation workflows but doesn't include the new hosted evaluation patterns such as running evaluations on the Prime platform, following logs with --follow, or configuring hosted-specific options. Per the review rules, changes to user-facing evaluation workflows must update the corresponding skills.
slug, version = value.rsplit("@", 1)
if not version:
    raise HostedEvalError(f"Invalid environment version in '{value}'")
return slug, version
Malformed slug with @ before / causes crash
Low Severity
_split_slug_and_version doesn't validate that the resulting slug contains a / separator. An unusual input like @version/name passes _is_slug_reference (because it contains / and doesn't start with ./, ../, or /), then rsplit("@", 1) produces an empty slug "" and the version "version/name". When _run_hosted_eval later calls env_slug.split("/", 1) on the empty string, the result is the single-element list [""], so the code fails with a ValueError (presumably when the result is unpacked into two names) and a confusing message rather than a descriptive hosted eval error.
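One possible guard is sketched below. It is a minimal stand-in, not the PR's code: HostedEvalError and the function name echo identifiers from the diff, but the bodies, the owner/name check, and the error wording are assumptions made for illustration.

```python
from typing import Optional, Tuple


class HostedEvalError(Exception):
    """Stand-in for the PR's error type (body assumed)."""


def split_slug_and_version(value: str) -> Tuple[str, Optional[str]]:
    """Split 'owner/name@version' and validate both halves (hypothetical fix)."""
    slug, sep, version = value.rpartition("@")
    if not sep:
        # No '@' present: the whole value is the slug, with no pinned version.
        slug, version = value, None
    elif not version:
        raise HostedEvalError(f"Invalid environment version in '{value}'")

    # The check the review asks for: the slug must look like owner/name.
    owner, slash, name = slug.partition("/")
    if not (slash and owner and name):
        raise HostedEvalError(
            f"Invalid environment slug in '{value}': expected owner/name"
        )
    return slug, version


# The problematic input from the review: the slug half is empty, and
# "".split("/", 1) returns [""], which cannot be unpacked into two names.
bad_slug, bad_version = "@version/name".rsplit("@", 1)
assert bad_slug == "" and bad_version == "version/name"
assert "".split("/", 1) == [""]
```

With this guard, @version/name is rejected up front with a HostedEvalError instead of failing later in an unrelated code path.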
    i += 1
    continue
if token in HOSTED_VALUE_FLAGS:
    i += 2
Help flag skipped when following hosted value flag
Low Severity
In _strip_hosted_flags_for_help, when a hosted value flag like --poll-interval is encountered, the code does i += 2 to skip the flag and its value without checking if the "value" position contains --help or -h. For input like ["my-env", "--poll-interval", "--help"], the help flag is treated as --poll-interval's value and skipped. After stripping, _run_vf_eval(["my-env"]) is called, running an evaluation instead of showing help as the user intended.
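A sketch of one way to avoid swallowing the help flag follows. The flag sets are assumptions: the review names the flags, but which ones take a value and the real contents of HOSTED_VALUE_FLAGS are not shown in the diff. HOSTED_BOOL_FLAGS and HELP_FLAGS are hypothetical names introduced here.

```python
from typing import List

# Assumed split between value-taking and boolean flags; the PR's actual
# HOSTED_VALUE_FLAGS may contain different entries.
HOSTED_VALUE_FLAGS = {
    "--poll-interval",
    "--timeout-minutes",
    "--custom-secrets",
    "--eval-name",
}
HOSTED_BOOL_FLAGS = {
    "--hosted",
    "--follow",
    "--allow-sandbox-access",
    "--allow-instances-access",
}
HELP_FLAGS = {"-h", "--help"}


def strip_hosted_flags_for_help(argv: List[str]) -> List[str]:
    """Drop hosted-only flags, but never consume a help flag as a value."""
    kept: List[str] = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token in HOSTED_BOOL_FLAGS:
            i += 1
            continue
        if token in HOSTED_VALUE_FLAGS:
            nxt = argv[i + 1] if i + 1 < len(argv) else None
            # Skip the value only when one exists and it is not -h/--help.
            i += 1 if nxt is None or nxt in HELP_FLAGS else 2
            continue
        kept.append(token)
        i += 1
    return kept


print(strip_hosted_flags_for_help(["my-env", "--poll-interval", "--help"]))
```

For the input from the review, the sketch keeps --help in the result, so the downstream parser can show help instead of starting an evaluation. It does not handle the --flag=value spelling, which would need a separate prefix check.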


Description
Type of Change
Testing
uv run pytest locally.
Checklist
Additional Notes
Note
Medium Risk
Adds new networked evaluation execution paths (API requests, polling, log handling) and changes CLI argument normalization, which could affect evaluation runs and environment resolution if edge cases are missed.
Overview
Adds --hosted support to the verifiers.cli.commands.eval adapter, keeping local vf-eval behavior but enabling creation (and optional --follow polling/log streaming) of Prime-hosted evaluations via the Prime API, including API key/config resolution, slug/version resolution (arg/header/local metadata), TOML multi-eval configs, and payload options like timeouts, env args, access flags, secrets, and naming.

Updates the Prime CLI plugin to better locate a workspace root/venv and to normalize/auto-fill environment directory arguments (e.g., --path, --env-dir-path) to absolute workspace environments/ paths. Adds unit tests covering hosted payload construction, header-based slug parsing, TOML behavior, and plugin command/path resolution. Also pins dev ruff to >=0.15.0.

Written by Cursor Bugbot for commit 9e363a9.