-
Notifications
You must be signed in to change notification settings - Fork 13
hmon: add heartbeat monitor #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
hmon: add heartbeat monitor #67
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds a new heartbeat monitor (HMON) to the Rust health monitoring library and integrates it into the existing monitoring worker/supervisor notification flow.
Changes:
- Introduces
heartbeatmodule (monitor + atomic state) and integrates heartbeat monitors intoHealthMonitorBuilder/HealthMonitor. - Updates the monitor evaluation interface to accept a shared
hmon_starting_pointand wires it through the monitoring worker thread. - Refactors
SupervisorAPIClientinto a dedicated module with selectable implementations via Cargo features.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/health_monitoring_lib/rust/worker.rs | Passes a shared HMON start instant into monitor evaluations; moves supervisor client trait out. |
| src/health_monitoring_lib/rust/supervisor_api_client/mod.rs | Adds feature-selected SupervisorAPIClient + implementation alias. |
| src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs | New stub client implementation. |
| src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs | New SCORE client implementation. |
| src/health_monitoring_lib/rust/lib.rs | Adds heartbeat monitors to builder + start flow; uses new supervisor client impl selector. |
| src/health_monitoring_lib/rust/heartbeat/mod.rs | Exposes heartbeat monitor API. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_state.rs | Adds atomic packed heartbeat state and tests. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_monitor.rs | Implements heartbeat monitor logic + tests (incl. loom). |
| src/health_monitoring_lib/rust/deadline/deadline_monitor.rs | Adapts deadline evaluation to new evaluator signature and shared start instant. |
| src/health_monitoring_lib/rust/common.rs | Extends evaluation error types; adds duration_to_u32; updates evaluator trait signature. |
| src/health_monitoring_lib/Cargo.toml | Adds optional monitor_rs, loom target dep, and feature defaults. |
| src/health_monitoring_lib/BUILD | Enables score_supervisor_api_client feature in Bazel builds. |
| Cargo.toml | Updates workspace defaults and adds cfg(loom) lint configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
f07c0db to
e65f6a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e65f6a6 to
bd438c3
Compare
bd438c3 to
c5f4b84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let hmon_starting_point = Instant::now(); | ||
| let monitor_starting_point = Instant::now(); |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_hmon_time_offset_wrong_order can be flaky because consecutive Instant::now() calls may return the same value on some platforms, in which case checked_duration_since won’t panic and the test will intermittently fail. Make the ordering deterministic by constructing monitor_starting_point first and then explicitly creating an earlier hmon_starting_point (e.g., monitor_starting_point - Duration::from_nanos(1)), or otherwise guaranteeing hmon_starting_point < monitor_starting_point.
| let hmon_starting_point = Instant::now(); | |
| let monitor_starting_point = Instant::now(); | |
| let monitor_starting_point = Instant::now(); | |
| let hmon_starting_point = monitor_starting_point | |
| .checked_sub(Duration::from_nanos(1)) | |
| .unwrap(); |
| supervisor_link: monitor_rs::Monitor<Checks>, | ||
| } | ||
|
|
||
| unsafe impl Send for ScoreSupervisorAPIClient {} // Just assuming it's safe to send across threads, this is a temporary solution | ||
|
|
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unsafe impl Send is unsound unless you can prove monitor_rs::Monitor<Checks> is thread-safe to move across threads; if it is not, this can introduce undefined behavior. Prefer removing the unsafe impl and restructuring so the SCORE client is created inside the monitoring thread (so it does not cross thread boundaries), or wrap the underlying non-Send type behind a thread-safe mechanism (only if the wrapped type supports it) and document the safety justification if unsafe remains necessary.
| supervisor_link: monitor_rs::Monitor<Checks>, | |
| } | |
| unsafe impl Send for ScoreSupervisorAPIClient {} // Just assuming it's safe to send across threads, this is a temporary solution | |
| // NOTE: We do not implement `Send` for `ScoreSupervisorAPIClient` because | |
| // `monitor_rs::Monitor<Checks>` is not proven to be safe to move across | |
| // threads. If cross-thread use is required, refactor the code so this | |
| // client is only used on a single thread or introduce a safe | |
| // communication mechanism instead of asserting `Send` unsafely. | |
| supervisor_link: monitor_rs::Monitor<Checks>, | |
| } |
src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs
Outdated
Show resolved
Hide resolved
Add heartbeat monitor HMON.
c5f4b84 to
b012f37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| #[allow(dead_code)] | ||
| pub struct ScoreSupervisorAPIClient { | ||
| supervisor_link: monitor_rs::Monitor<Checks>, | ||
| } | ||
|
|
||
| unsafe impl Send for ScoreSupervisorAPIClient {} // Just assuming it's safe to send across threads, this is a temporary solution | ||
|
|
||
| #[allow(dead_code)] | ||
| impl ScoreSupervisorAPIClient { | ||
| pub fn new() -> Self { | ||
| let value = std::env::var("IDENTIFIER").expect("IDENTIFIER env not set"); | ||
| debug!("ScoreSupervisorAPIClient: Creating with IDENTIFIER={}", value); | ||
| // This is only temporary usage so unwrap is fine here. | ||
| let supervisor_link = monitor_rs::Monitor::<Checks>::new(&value).expect("Failed to create supervisor_link"); | ||
| Self { supervisor_link } | ||
| } | ||
| } | ||
|
|
||
| impl SupervisorAPIClient for ScoreSupervisorAPIClient { | ||
| fn notify_alive(&self) { | ||
| self.supervisor_link.report_checkpoint(Checks::WorkerCheckpoint); |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unsafe impl Send is unsound without a proof that monitor_rs::Monitor<Checks> is thread-safe to move across threads. Instead of forcing Send, refactor so the SCORE client is constructed inside the monitoring thread (so it isn’t sent across threads), or add a thread-safe wrapper/indirection (e.g., a channel to a dedicated reporter thread) that avoids moving non-Send state into std::thread::spawn.
| #[allow(dead_code)] | |
| pub struct ScoreSupervisorAPIClient { | |
| supervisor_link: monitor_rs::Monitor<Checks>, | |
| } | |
| unsafe impl Send for ScoreSupervisorAPIClient {} // Just assuming it's safe to send across threads, this is a temporary solution | |
| #[allow(dead_code)] | |
| impl ScoreSupervisorAPIClient { | |
| pub fn new() -> Self { | |
| let value = std::env::var("IDENTIFIER").expect("IDENTIFIER env not set"); | |
| debug!("ScoreSupervisorAPIClient: Creating with IDENTIFIER={}", value); | |
| // This is only temporary usage so unwrap is fine here. | |
| let supervisor_link = monitor_rs::Monitor::<Checks>::new(&value).expect("Failed to create supervisor_link"); | |
| Self { supervisor_link } | |
| } | |
| } | |
| impl SupervisorAPIClient for ScoreSupervisorAPIClient { | |
| fn notify_alive(&self) { | |
| self.supervisor_link.report_checkpoint(Checks::WorkerCheckpoint); | |
| use std::sync::mpsc; | |
| #[allow(dead_code)] | |
| pub struct ScoreSupervisorAPIClient { | |
| sender: mpsc::Sender<()>, | |
| } | |
| #[allow(dead_code)] | |
| impl ScoreSupervisorAPIClient { | |
| pub fn new() -> Self { | |
| let value = std::env::var("IDENTIFIER").expect("IDENTIFIER env not set"); | |
| debug!("ScoreSupervisorAPIClient: Creating with IDENTIFIER={}", value); | |
| let (sender, receiver) = mpsc::channel::<()>(); | |
| // Spawn a dedicated thread that owns the non-Send monitor and reports checkpoints. | |
| std::thread::spawn(move || { | |
| // This is only temporary usage so unwrap is fine here. | |
| let supervisor_link = monitor_rs::Monitor::<Checks>::new(&value) | |
| .expect("Failed to create supervisor_link"); | |
| // Process all incoming "alive" notifications. | |
| while let Ok(()) = receiver.recv() { | |
| supervisor_link.report_checkpoint(Checks::WorkerCheckpoint); | |
| } | |
| }); | |
| Self { sender } | |
| } | |
| } | |
| impl SupervisorAPIClient for ScoreSupervisorAPIClient { | |
| fn notify_alive(&self) { | |
| // Ignore send errors (e.g., if the background thread has exited). | |
| if let Err(err) = self.sender.send(()) { | |
| debug!("ScoreSupervisorAPIClient: failed to send alive notification: {}", err); | |
| } |
| internal_processing_cycle: Duration, | ||
| _allocator: &ProtectedMemoryAllocator, | ||
| ) -> HeartbeatMonitor { | ||
| assert!(self.range.min * 2 > internal_processing_cycle); |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion has no message, which makes failures harder to diagnose (especially when triggered via config/FFI). Consider adding an assertion message that includes the constraint and the actual values (e.g., min, internal_processing_cycle) to guide users toward fixing their configuration.
| assert!(self.range.min * 2 > internal_processing_cycle); | |
| assert!( | |
| self.range.min * 2 > internal_processing_cycle, | |
| "internal_processing_cycle ({:?}) must be shorter than doubled minimum time range (2 * min = {:?})", | |
| internal_processing_cycle, | |
| self.range.min | |
| ); |
| with: | ||
| command: clippy | ||
| args: --all-features --all-targets --workspace -- -D warnings | ||
| args: --features stub_supervisor_api_client --all-targets -- -D warnings |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to the previous workflow, these clippy invocations no longer pass --workspace, which can reduce lint coverage to only the default package(s) and miss issues in other workspace crates (e.g., examples). If feature conflicts were the reason for dropping --workspace, consider (1) keeping a separate cargo clippy --workspace step without feature overrides, and (2) running feature-specific clippy scoped to -p health_monitoring_lib.
| uses: actions-rs/cargo@v1 | ||
| with: | ||
| command: clippy | ||
| args: --features score_supervisor_api_client --no-default-features --all-targets -- -D warnings |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to the previous workflow, these clippy invocations no longer pass --workspace, which can reduce lint coverage to only the default package(s) and miss issues in other workspace crates (e.g., examples). If feature conflicts were the reason for dropping --workspace, consider (1) keeping a separate cargo clippy --workspace step without feature overrides, and (2) running feature-specific clippy scoped to -p health_monitoring_lib.
Add heartbeat monitor HMON.
Resolves #68