generated from lambda-feedback/evaluation-function-boilerplate-python
Open
Description
Strangely, the pipeline we currently run doesn't take the full params into account: instead of using the `Params` pydantic model, it looks for a stray `require_minimal` key.
So I am going to paste the class definition here:
from typing import Literal

from pydantic import BaseModel, Field


class Params(BaseModel):
    """
    Evaluation parameters.

    Example:
        {
            "evaluation_mode": "lenient",
            "expected_type": "DFA",
            "feedback_verbosity": "standard"
        }
    """

    # Evaluation mode
    evaluation_mode: Literal["strict", "lenient", "partial"] = Field(
        default="lenient",
        description="strict: exact match, lenient: language equivalence, partial: partial credit"
    )

    # Expected automaton type
    expected_type: Literal["DFA", "NFA", "any"] = Field(
        default="any",
        description="Expected automaton type"
    )

    # Feedback level
    feedback_verbosity: Literal["minimal", "standard", "detailed"] = Field(
        default="standard",
        description="Level of feedback detail"
    )

    # Validation options
    check_minimality: bool = Field(default=False, description="Check if FSA is minimal")
    check_completeness: bool = Field(default=False, description="Check if DFA is complete")

    # UI options
    highlight_errors: bool = Field(default=True, description="Include element IDs for UI highlighting")
    show_counterexample: bool = Field(default=True, description="Show counterexample if languages differ")

    # Test generation
    max_test_length: int = Field(default=10, ge=1, le=50, description="Max length for generated test strings")

    is_dev: bool = Field(
        default=False,
        description="Flag indicating if running in development mode"
    )

So here is what each field still needs:
- evaluation_mode: this should do 1. (strict) decomposition and `are_iso`, 2. (lenient) accept the same language, 3. (partial) I don't know yet
- expected_type: we need some extra `is_nfa` and `is_dfa` functions
- feedback_verbosity: honestly, I have no idea how to do this
- check_minimality and check_completeness: will be done in the next commit
- show_counterexample: also needs a helper
- max_test_length: no idea what this is supposed to be used for
- is_dev: I added this in case I need some dev-only stuff, but it now seems useless
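As a starting point for the `is_dfa` / `is_nfa` helpers and the strict comparison, here is a minimal sketch. The `FSA` dataclass is a hypothetical stand-in, not the repo's actual schema, and `are_isomorphic` is only one possible reading of what `are_iso` should do (it ignores unreachable states and only handles DFAs). Treating "NFA" as "anything that is not deterministic" is also an assumption.

```python
from collections import deque
from dataclasses import dataclass

EPSILON = ""  # assumption: the empty string marks an epsilon transition


@dataclass(frozen=True)
class FSA:
    """Hypothetical minimal automaton; the real schema may differ."""
    start: str
    accepting: frozenset
    transitions: frozenset  # set of (source, symbol, target) triples


def is_dfa(fsa: FSA) -> bool:
    """True if there are no epsilon moves and at most one target per (state, symbol)."""
    seen = set()
    for source, symbol, _target in fsa.transitions:
        if symbol == EPSILON or (source, symbol) in seen:
            return False
        seen.add((source, symbol))
    return True


def is_nfa(fsa: FSA) -> bool:
    """Assumption: 'NFA' here means 'not deterministic'."""
    return not is_dfa(fsa)


def are_isomorphic(a: FSA, b: FSA) -> bool:
    """Check that two DFAs are identical up to renaming of reachable states."""
    if not (is_dfa(a) and is_dfa(b)):
        raise ValueError("this sketch only compares DFAs")
    step_a = {(s, x): t for s, x, t in a.transitions}
    step_b = {(s, x): t for s, x, t in b.transitions}
    mapping = {a.start: b.start}  # candidate renaming from a's states to b's
    queue = deque([a.start])
    while queue:
        sa = queue.popleft()
        sb = mapping[sa]
        if (sa in a.accepting) != (sb in b.accepting):
            return False
        out_a = {x: t for (s, x), t in step_a.items() if s == sa}
        out_b = {x: t for (s, x), t in step_b.items() if s == sb}
        if out_a.keys() != out_b.keys():
            return False
        for symbol, ta in out_a.items():
            tb = out_b[symbol]
            if ta in mapping:
                if mapping[ta] != tb:
                    return False
            else:
                mapping[ta] = tb
                queue.append(ta)
    # The renaming must be one-to-one, otherwise two states of `a` collapsed into one.
    return len(set(mapping.values())) == len(mapping)


# Two DFAs for "strings ending in a", differing only in state names.
ends_in_a = FSA("q0", frozenset({"q1"}), frozenset({
    ("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1"), ("q1", "b", "q0")}))
renamed = FSA("s", frozenset({"t"}), frozenset({
    ("s", "a", "t"), ("s", "b", "s"), ("t", "a", "t"), ("t", "b", "s")}))
# Adding a second "a" move from q0 makes the automaton nondeterministic.
nondeterministic = FSA("q0", frozenset({"q1"}),
                       ends_in_a.transitions | {("q0", "a", "q0")})
```

With helpers like these, `expected_type` could be checked before any comparison runs, and `strict` mode could reject a submission that accepts the right language but has a different shape.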
But anyway, we should do this in checkpoint 2.
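For `show_counterexample` and `max_test_length`, one plausible reading of the field description ("Max length for generated test strings") is a brute-force search: generate every string up to that length and return the first one the two automata disagree on. The sketch below assumes that reading. The `FSA` dataclass and the function names are hypothetical, symbols are assumed to be single characters, and finding no counterexample within the bound does not prove the languages are equal.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

EPSILON = ""  # assumption: the empty string marks an epsilon transition


@dataclass(frozen=True)
class FSA:
    """Hypothetical minimal automaton; the real schema may differ."""
    start: str
    accepting: frozenset
    transitions: frozenset  # set of (source, symbol, target) triples


def _closure(fsa: FSA, states: set) -> set:
    """All states reachable from `states` using only epsilon moves."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for source, symbol, target in fsa.transitions:
            if source == state and symbol == EPSILON and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(fsa: FSA, word: str) -> bool:
    """Simulate the automaton (DFA or NFA) on `word`, one character at a time."""
    current = _closure(fsa, {fsa.start})
    for char in word:
        moved = {target for source, symbol, target in fsa.transitions
                 if source in current and symbol == char}
        current = _closure(fsa, moved)
    return bool(current & fsa.accepting)


def find_counterexample(expected: FSA, student: FSA,
                        max_test_length: int = 10) -> Optional[str]:
    """Return the shortest string (up to the bound) on which the two disagree."""
    symbols = sorted({symbol for fsa in (expected, student)
                      for _, symbol, _ in fsa.transitions if symbol != EPSILON})
    # Shortest strings first; the work grows exponentially with the bound.
    for length in range(max_test_length + 1):
        for letters in product(symbols, repeat=length):
            word = "".join(letters)
            if accepts(expected, word) != accepts(student, word):
                return word
    return None


# Expected answer: strings ending in "a". Student answer: strings containing "a".
ends_in_a = FSA("q0", frozenset({"q1"}), frozenset({
    ("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1"), ("q1", "b", "q0")}))
contains_a = FSA("q0", frozenset({"q1"}), frozenset({
    ("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1"), ("q1", "b", "q1")}))
```

Here `find_counterexample(ends_in_a, contains_a)` returns `"ab"`, which the student automaton accepts and the expected one rejects. The same bounded search could back `lenient` mode as a cheap first check, although a product-construction equivalence test would be needed for an exact answer.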