Conversation
Greptile SummaryThis PR adds per-goal randomization of the goal radius for each agent. Previously, the goal radius was a fixed global value (
Confidence Score: 4/5
Important Files Changed
Flowchartflowchart TD
A[Agent Init / Reset] -->|"generate_reward_coefs()"| B[Set all reward coefs including goal_radius]
B --> C{goal_behavior?}
C -->|GOAL_GENERATE_NEW| D[Agent reaches goal]
D --> E["sample_new_goal()"]
E --> F["sample_new_goal_radius()"]
F --> G{reward_randomization?}
G -->|Yes| H["random_uniform(min, max)"]
G -->|No| I["env->goal_radius (fixed)"]
H --> J[Set agent reward_coefs GOAL_RADIUS]
I --> J
J --> K["within_distance = distance < agent.reward_coefs[GOAL_RADIUS]"]
C -->|GOAL_RESPAWN| L[Agent respawns]
L -->|"generate_reward_coefs()"| M[Goal radius set at respawn]
M --> K
C -->|GOAL_STOP| N[Agent stops at goal]
N --> K
Last reviewed commit: b73b835 |
| static void sample_new_goal_radius(Drive *env, Agent *agent) { | ||
|
|
||
| if (env->reward_randomization) { | ||
| // Standard Uniform Randomization | ||
| agent->reward_coefs[REWARD_COEF_GOAL_RADIUS] = random_uniform( | ||
| env->reward_bounds[REWARD_COEF_GOAL_RADIUS].min_val, env->reward_bounds[REWARD_COEF_GOAL_RADIUS].max_val); | ||
| } else { | ||
| // Fixed coefficients | ||
| agent->reward_coefs[REWARD_COEF_GOAL_RADIUS] = env->goal_radius; | ||
| } | ||
| } |
There was a problem hiding this comment.
Duplicated goal radius logic with generate_reward_coefs
generate_reward_coefs (line 417-418 and 449) already sets REWARD_COEF_GOAL_RADIUS using the same logic. When goal_behavior == GOAL_GENERATE_NEW, the value set by generate_reward_coefs at init/reset is immediately overwritten by this function when sample_new_goal is first called.
This isn't a bug — the separation is intentional so the goal radius can be re-randomized per goal. But you may want to consider removing the REWARD_COEF_GOAL_RADIUS assignment from generate_reward_coefs and instead calling sample_new_goal_radius directly after generate_reward_coefs in the init/reset paths. This would make sample_new_goal_radius the single source of truth for goal radius assignment, reducing duplication.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Goal radius randomization. Based on the work by Kevin; remaking this branch was simpler than rebasing the other branch after all the merge conflicts we had.