Skip to content

Conversation

@mawelborn
Copy link
Contributor

@mawelborn mawelborn commented Jan 23, 2026

A small "Zen of Python" enhancement for using idiomatic copy.deepcopy() and copy.replace() functions with correct and performant behavior.

I previously implemented Prediction.copy() because copy.deepcopy(prediction) copied immutable objects, which is unnecessary and particularly expensive when OCR is assigned. (Copying the whole associated table, its cells, and all their tokens for each prediction copy is slow, to no one's surprise.)

I've since learned the behaviour of copy.deepcopy() can be customized by implementing __deepcopy__() on classes. This PR drops Prediction.copy() in favor of the idiomatic copy.deepcopy(), customizing the latter's behavior to not copy immutable objects, thus getting the same speed and memory improvements as the custom Prediction.copy() implementation.

This PR also adds support for the Python 3.13+ copy.replace() function via __replace__(). It has better ergonomics than dataclasses.replace() by performing a deep copy as above and additionally supports assignment of computed class properties.

Where before:

new_prediction = dataclasses.replace(
    prediction.copy(),  # Must remember to copy to avoid shared mutable state.
    label="EXCEPTION_FLAG",
    confidence=1.0,  # Computed property not supported by `replace()`.
    span=etl_output.tokens[0].span,  # Same here and for other properties.
)

Now:

new_prediction = copy.replace(
    prediction,  # Mutable state handled for you.
    label="EXCEPTION_FLAG",
    confidence=1.0,  # Computed properties can be updated like attributes.
    span=etl_output.tokens[0].span,
)

Note

Introduces idiomatic copying with performance improvements and Python 3.13 replace ergonomics.

  • Add __deepcopy__ to Prediction, DocumentExtraction, Summarization, Unbundling, and Result to avoid copying immutable OCR structures while deep-copying mutable fields
  • Enable copy.replace(...) by assigning Prediction.__replace__ and unshadowing dataclass-generated methods in subclasses (Python 3.13+)
  • Remove legacy .copy() implementations from prediction classes in favor of copy.deepcopy(...)

Risk: Medium — API change removing .copy() and altering copy semantics could impact callers relying on old behavior or shared-state assumptions.

Written by Cursor Bugbot for commit 652efe2. This will update automatically on new commits. Configure here.

@mawelborn mawelborn changed the title Mawelborn/prediction deepcopy Prediction Deep Copy / Replace Jan 23, 2026
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants