Conversation
Greptile Overview

Greptile Summary

This PR adds per-agent goal-reaching radius randomization/conditioning in the Drive environment. On the C side it introduces a per-agent goal radius (agent.goal_radius) that is sampled for each active agent and appended to the ego observation.

Key issues to address before merge: the drive.ini default enables goal_radius_randomization even though the PR description says it is off by default (see the drive.ini comment below).
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant Py as Python (Drive/torch.py)
participant Bind as C binding (binding.c)
participant Ini as INI parser (env_config.h)
participant Env as C Env (Drive in drive.h)
Py->>Bind: env_init(..., ini_file=".../drive.ini", goal_radius=..., ...)
Bind->>Ini: ini_parse(ini_file, handler, &conf)
Ini-->>Bind: conf.goal_radius_randomization
Bind->>Env: env.goal_radius_randomization = conf.goal_radius_randomization
Bind->>Env: init(env)
Env->>Env: init_goal_positions()
Env->>Env: for each active agent: sample/set agent.goal_radius
loop each step
Py->>Bind: vec_step()
Bind->>Env: c_step(env)
Env->>Env: compute distance_to_goal
Env->>Env: within_distance = dist < agent.goal_radius
Env->>Env: compute_observations(): append agent.goal_radius/12
Bind-->>Py: observations updated (ego_dim includes goal radius)
end
```
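A minimal C sketch of the per-agent goal-radius path the diagram describes. The struct layout, helper names (init_goal_radii, rndf, within_goal), and the 2-12 m range are assumptions pieced together from the diagram and the drive.ini comment, not the actual drive.h code:

```c
#include <stdlib.h>

#define GOAL_RADIUS_MIN 2.0f   /* per the config comment: random 2-12 m per agent */
#define GOAL_RADIUS_MAX 12.0f

typedef struct {
    float goal_radius;
    float distance_to_goal;
} Agent;

typedef struct {
    Agent *agents;
    int num_agents;
    int goal_radius_randomization;  /* parsed from drive.ini by the INI handler */
    float goal_radius;              /* global fallback when randomization is disabled */
} Env;

/* Uniform float in [lo, hi]; a placeholder for whatever RNG the env uses. */
static float rndf(float lo, float hi) {
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

/* Called from init(): sample or copy each agent's goal radius. */
static void init_goal_radii(Env *env) {
    for (int i = 0; i < env->num_agents; i++) {
        env->agents[i].goal_radius = env->goal_radius_randomization
            ? rndf(GOAL_RADIUS_MIN, GOAL_RADIUS_MAX)
            : env->goal_radius;
    }
}

/* In c_step(): per-agent success check against the agent's own radius. */
static int within_goal(const Agent *a) {
    return a->distance_to_goal < a->goal_radius;
}

/* In compute_observations(): the radius is appended as goal_radius / 12,
 * so the ego observation dimension grows by one. */
static float goal_radius_obs(const Agent *a) {
    return a->goal_radius / GOAL_RADIUS_MAX;
}
```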
pufferlib/config/ocean/drive.ini
Outdated
```ini
; 0=disabled (use env goal_radius), 1=enabled (random 2-12m per agent)
goal_radius_randomization = 1
; Max target speed in m/s for the agent to maintain towards the goal
```
Default flag is enabled
PR description says goal_radius_randomization is off by default, but pufferlib/config/ocean/drive.ini sets goal_radius_randomization = 1. This will change training behavior for anyone using the default config; set the default to 0 (or update the PR description if the intent is to enable by default).
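If the documented default is the intent, the drive.ini change the comment asks for would look like this (a sketch of the suggested edit, not the shipped file):

```ini
; 0=disabled (use env goal_radius), 1=enabled (random 2-12m per agent)
goal_radius_randomization = 0
```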
Additional Comments (1)
Force-pushed from 7123438 to 2a5b484
Variable Goal Radius + Collision/Offroad Reward Conditioning
Use per-agent `reward_coefs` for goal distance checks, collision/offroad penalties, and rendering instead of global `env->` values.

Files changed: `drive.h`, `visualize.c`

Changes (see the sketch after this list):
- `drive.h`: Added `REWARD_COEF_GOAL_RADIUS`, `REWARD_COEF_COLLISION`, `REWARD_COEF_OFFROAD` to `normalize_reward_coef()` so the policy sees them as normalized [-1, 1] features
- `drive.h:c_step`: `distance_to_goal < agent->reward_coefs[REWARD_COEF_GOAL_RADIUS]` (was `env->goal_radius`)
- `drive.h:c_step`: Uses `agent->reward_coefs[REWARD_COEF_COLLISION]` (was `env->reward_vehicle_collision`)
- `drive.h:c_step`: Uses `agent->reward_coefs[REWARD_COEF_OFFROAD]` (was `env->reward_offroad_collision`)
- `drive.h:draw_agent_obs`, `drive.h:draw_scene`: Goal circles use per-agent `reward_coefs[REWARD_COEF_GOAL_RADIUS]` instead of global `env->goal_radius`
- `visualize.c`: Pass `reward_randomization` and `reward_conditioning` from config to Drive struct init
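A hedged sketch of how these per-agent checks could look inside `c_step` and `normalize_reward_coef`. The enum names match the list above; the surrounding struct fields, reward values, and normalization divisors are illustrative assumptions, not the real drive.h code:

```c
enum {
    REWARD_COEF_GOAL_RADIUS,
    REWARD_COEF_COLLISION,
    REWARD_COEF_OFFROAD,
    NUM_REWARD_COEFS
};

typedef struct {
    float reward_coefs[NUM_REWARD_COEFS];
    float distance_to_goal;
    int collided_vehicle;
    int collided_offroad;
    float reward;
} Agent;

/* Per-agent reward conditioning inside c_step(), replacing the old global
 * env->goal_radius / env->reward_vehicle_collision / env->reward_offroad_collision. */
static void apply_agent_rewards(Agent *agent) {
    if (agent->distance_to_goal < agent->reward_coefs[REWARD_COEF_GOAL_RADIUS]) {
        agent->reward += 1.0f;  /* goal-reached bonus; actual value not given in the PR text */
    }
    if (agent->collided_vehicle) {
        agent->reward += agent->reward_coefs[REWARD_COEF_COLLISION];
    }
    if (agent->collided_offroad) {
        agent->reward += agent->reward_coefs[REWARD_COEF_OFFROAD];
    }
}

/* Map each coefficient into [-1, 1] so the policy can observe it as a feature.
 * The per-coefficient scales below are assumptions for illustration. */
static float normalize_reward_coef(int idx, float value) {
    switch (idx) {
        case REWARD_COEF_GOAL_RADIUS: return value / 12.0f;  /* radius sampled in 2-12 m */
        case REWARD_COEF_COLLISION:   return value;          /* penalty assumed already in [-1, 0] */
        case REWARD_COEF_OFFROAD:     return value;
        default:                      return value;
    }
}
```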