-
Notifications
You must be signed in to change notification settings - Fork 134
Fix #729 and #731: Telemetry lifecycle management #734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
msrathore-db
wants to merge
1
commit into
main
Choose a base branch
from
fix/telemetry-lifecycle-issues-729-731
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+51
−6
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,6 +42,7 @@ | |
| from databricks.sql.common.feature_flag import FeatureFlagsContextFactory | ||
| from databricks.sql.common.unified_http_client import UnifiedHttpClient | ||
| from databricks.sql.common.http import HttpMethod | ||
| from databricks.sql.exc import RequestError | ||
| from databricks.sql.telemetry.telemetry_push_client import ( | ||
| ITelemetryPushClient, | ||
| TelemetryPushClient, | ||
|
|
@@ -295,7 +296,7 @@ def _send_telemetry(self, events): | |
| url, | ||
| data=request.to_json(), | ||
| headers=headers, | ||
| timeout=900, | ||
| timeout=30, | ||
| ) | ||
|
|
||
| future.add_done_callback( | ||
|
|
@@ -417,10 +418,38 @@ def export_latency_log( | |
| ) | ||
|
|
||
| def close(self): | ||
| """Flush remaining events before closing""" | ||
| """Flush remaining events before closing | ||
|
|
||
| IMPORTANT: This method does NOT close self._http_client. | ||
|
|
||
| Rationale: | ||
| - _flush() submits async work to the executor that uses _http_client | ||
| - If we closed _http_client here, async callbacks would fail with AttributeError | ||
| - Instead, we let _http_client live as long as needed: | ||
| * Pending futures hold references to self (via bound methods) | ||
| * This keeps self alive, which keeps self._http_client alive | ||
| * When all futures complete, Python GC will clean up naturally | ||
| - The __del__ method ensures eventual cleanup during garbage collection | ||
|
|
||
| This design prevents race conditions while keeping telemetry truly async. | ||
| """ | ||
| logger.debug("Closing TelemetryClient for connection %s", self._session_id_hex) | ||
| self._flush() | ||
|
|
||
| def __del__(self): | ||
| """Cleanup when TelemetryClient is garbage collected | ||
|
|
||
| This ensures _http_client is eventually closed when the TelemetryClient | ||
| object is destroyed. By this point, all async work should be complete | ||
| (since the futures held references keeping us alive), so it's safe to | ||
| close the http client. | ||
| """ | ||
| try: | ||
| if hasattr(self, '_http_client') and self._http_client: | ||
| self._http_client.close() | ||
| except Exception: | ||
| pass | ||
|
|
||
|
|
||
| class _TelemetryClientHolder: | ||
| """ | ||
|
|
@@ -674,7 +703,8 @@ def close(host_url): | |
| ) | ||
| try: | ||
| TelemetryClientFactory._stop_flush_thread() | ||
| TelemetryClientFactory._executor.shutdown(wait=True) | ||
| # Use wait=False to allow process to exit immediately | ||
| TelemetryClientFactory._executor.shutdown(wait=False) | ||
| except Exception as e: | ||
| logger.debug("Failed to shutdown thread pool executor: %s", e) | ||
| TelemetryClientFactory._executor = None | ||
|
|
@@ -689,13 +719,19 @@ def connection_failure_log( | |
| port: int, | ||
| client_context, | ||
| user_agent: Optional[str] = None, | ||
| enable_telemetry: bool = True, | ||
| ): | ||
| """Send error telemetry when connection creation fails, using provided client context""" | ||
|
|
||
| # Respect user's telemetry preference - don't force-enable | ||
| if not enable_telemetry: | ||
| logger.debug("Telemetry disabled, skipping connection failure log") | ||
| return | ||
|
|
||
| UNAUTH_DUMMY_SESSION_ID = "unauth_session_id" | ||
|
|
||
| TelemetryClientFactory.initialize_telemetry_client( | ||
| telemetry_enabled=True, | ||
| telemetry_enabled=enable_telemetry, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit : this change is not required. |
||
| session_id_hex=UNAUTH_DUMMY_SESSION_ID, | ||
| auth_provider=None, | ||
| host_url=host_url, | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are retries etc that incorporate backoff, the push is anyway async, we should not be changing this in my opinion.