Feat: Add base64 image support for results. #885

Open
d42me wants to merge 1 commit into PrimeIntellect-ai:main from d42me:feature/results-base64-image-support

Conversation

d42me (Collaborator) commented Feb 10, 2026

Description

Add support for storing base64-encoded images in saved results.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Medium Risk
Touches the core output-serialization and evaluation plumbing (including worker RPC types), so regressions could affect saved datasets or server-mode evaluations; changes are guarded by explicit modes and covered by new tests.

Overview
Adds an opt-in path to persist images in saved eval results by extracting data:image/*;base64,... payloads into a per-message images field while still rendering [image] placeholders in content.
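A minimal sketch of the extraction described above, not the PR's actual code. It assumes OpenAI-style content parts; the helper name `extract_images` and the regex are hypothetical.

```python
import re

# Hypothetical pattern for data:image/*;base64,... URIs (an assumption, not taken from the PR).
DATA_URI_RE = re.compile(r"^data:image/[\w.+-]+;base64,(?P<payload>.+)$", re.DOTALL)


def extract_images(message: dict) -> dict:
    """Return a copy of `message` whose image parts are rendered as "[image]"
    placeholders, with their base64 payloads collected in an "images" list."""
    content = message.get("content")
    if not isinstance(content, list):
        return dict(message)  # plain-string content: nothing to extract

    text_parts = []
    images = []
    for part in content:
        if part.get("type") == "text":
            text_parts.append(part.get("text", ""))
        elif part.get("type") == "image_url":
            url = part.get("image_url", {}).get("url", "")
            match = DATA_URI_RE.match(url)
            if match:
                images.append(match.group("payload"))
            # The placeholder is rendered whether or not a payload was captured.
            text_parts.append("[image]")

    out = {k: v for k, v in message.items() if k != "content"}
    out["content"] = "\n".join(text_parts)
    if images:
        out["images"] = images
    return out
```

Keeping the payload out of `content` means the rendered text stays readable while the image data remains recoverable from the same message.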

Plumbs save_image_mode/image_mode and max_image_base64_chars through eval CLI/config, Environment/EnvGroup generation, and worker client/server request types; metadata now records save_image_mode, and saving defaults to placeholder when save_results is off.

Hardens message serialization by validating data-URI base64 and size limits, and updates sanitize_tool_calls to preserve already-serialized tool-call strings and other message fields; new tests cover base64 extraction, limit enforcement, CLI saving, and sanitization behavior.
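The validation step might look roughly like the sketch below. The function name `validate_base64_payload` is hypothetical, and `max_chars` only stands in for the PR's `max_image_base64_chars` setting; the real checks may differ.

```python
import base64
import binascii
from typing import Optional


def validate_base64_payload(payload: str, max_chars: Optional[int] = None) -> str:
    """Raise ValueError if `payload` exceeds `max_chars` or is not valid base64."""
    if max_chars is not None and len(payload) > max_chars:
        raise ValueError(
            f"base64 payload has {len(payload)} chars, limit is {max_chars}"
        )
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("image payload is not valid base64") from exc
    return payload
```

Checking the length before decoding avoids decoding an oversized payload only to reject it afterwards.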

Written by Cursor Bugbot for commit a36c44d. This will update automatically on new commits.

d42me requested a review from hallerite February 10, 2026 05:28

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

results_path: Path | None = None,
state_columns: list[str] | None = None,
save_results: bool = False,
image_mode: str = ImageMode.BASE64.value,

Default image mode breaks existing non-data-URI callers

Medium Severity

The public API methods generate, evaluate, evaluate_sync, run_rollout, and run_group all default image_mode to ImageMode.BASE64, while the lower-level utilities state_to_output and states_to_outputs default to ImageMode.PLACEHOLDER. With the BASE64 default, existing callers (such as verifiers/gepa/adapter.py, which calls generate() without image_mode) will now fail with a ValueError if any prompt contains an image_url with an HTTPS URL rather than a data URI, because _extract_data_uri_base64 requires data: URIs. Before this change, all images were silently replaced with [image]. The eval CLI path correctly overrides this to PLACEHOLDER when save_results is false, but the public Python API does not.
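The failure mode can be illustrated with a small stand-in. This is a sketch under assumptions: the `ImageMode` member names come from the diff, but their string values and the `render_image` function are hypothetical and do not reproduce the library's code.

```python
from enum import Enum


class ImageMode(str, Enum):
    # Member names appear in the PR; the string values are assumed.
    PLACEHOLDER = "placeholder"
    BASE64 = "base64"


def render_image(url: str, image_mode: str = ImageMode.BASE64.value) -> str:
    """Hypothetical stand-in for serializing a single image_url."""
    if image_mode == ImageMode.PLACEHOLDER.value:
        return "[image]"  # pre-PR behavior: any URL becomes a placeholder
    if not url.startswith("data:"):
        raise ValueError(f"expected a data: URI in base64 mode, got {url[:30]!r}")
    return "[image]"  # in base64 mode the payload would be stored separately


https_url = "https://example.com/cat.png"

# Placeholder mode accepts any URL.
print(render_image(https_url, ImageMode.PLACEHOLDER.value))

# A caller that omits image_mode gets the BASE64 default and now fails.
try:
    render_image(https_url)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

A caller written before this change passes no `image_mode`, so it takes the second path without opting in.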

Additional Locations (2)

Collaborator Author

@willccbb What do you think here? Can we introduce this breaking change for better DX? Or should we stay with the default placeholder?