[nvidia_stable-11.9] Backport util: fix max socket calculation #11
Open
NathanChenNVIDIA wants to merge 2 commits into nvidia_stable-11.9
This patch changes how the maximum socket count is calculated.
On some systems (e.g. GB200), physical_package_id values are not
contiguous or zero-based. Instead of 0..N, they may contain large
arbitrary identifiers (e.g. 256123234). The previous implementation
assumed a 0..N range and used the maximum ID value directly.
This caused:
- excessive memory allocation
- extremely large loop bounds
- OOM / DoS scenarios
- unnecessary CPU time consumption
The new implementation computes the socket count as the number of unique
package IDs present on the node, rather than relying on the maximum numeric
value.
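
As a rough illustration of the difference described above (not the actual libvirt code), the sketch below contrasts a bound derived from the largest ID with a count of distinct IDs. The function names `socket_bound_from_max` and `socket_count_unique`, and the use of a sorted copy to find duplicates, are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for unsigned int; avoids overflow from subtraction. */
static int
cmp_uint(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical "old style" bound: largest ID plus one.
 * With sparse IDs this can be enormous even for a two-socket node. */
unsigned long long
socket_bound_from_max(const unsigned int *ids, size_t n)
{
    unsigned int max = 0;
    for (size_t i = 0; i < n; i++) {
        if (ids[i] > max)
            max = ids[i];
    }
    return n ? (unsigned long long)max + 1 : 0;
}

/* Hypothetical "new style" count: number of distinct IDs.
 * Returns -1 if the temporary copy cannot be allocated. */
long
socket_count_unique(const unsigned int *ids, size_t n)
{
    if (n == 0)
        return 0;

    unsigned int *sorted = malloc(n * sizeof(*sorted));
    if (!sorted)
        return -1;

    /* Sort a copy so equal IDs become adjacent, then count transitions. */
    memcpy(sorted, ids, n * sizeof(*sorted));
    qsort(sorted, n, sizeof(*sorted), cmp_uint);

    long unique = 1;
    for (size_t i = 1; i < n; i++) {
        if (sorted[i] != sorted[i - 1])
            unique++;
    }

    free(sorted);
    return unique;
}
```

For a node whose CPUs report only two distinct package IDs, one of them 256123234, the max-based bound is 256123235 while the unique count is 2, which is why sizing allocations and loops from the former is so costly.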
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alexandr Semenikhin <alexandr2e78@gmail.com>
(cherry picked from commit a64367115015df58e0d82635a40d76df56144c60 https://github.com/libvirt/libvirt/commits/)
Link: https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/COIBU2IGVLC36Q3FLXDL3W7U7WIFVPPJ/
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
f3ac536 to 405c45e
nvmochs (Collaborator) approved these changes on Feb 6, 2026 and left a comment:
Confirmed this is a clean pick and matches upstream.
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Collaborator
@NathanChenNVIDIA - I see this was also submitted in PR 10 back in January; shall we close that one?
ianm-nv (Collaborator) approved these changes on Feb 6, 2026 and left a comment:
LGTM
Acked-by: Ian May <ianm@nvidia.com>
Collaborator
Author
Yes, I'll close that one.
MitchellAugustin (Contributor) approved these changes on Feb 6, 2026 and left a comment:
Not seeing any formatting issues, and confirmed that this matches the merged upstream commit, so LGTM.
[Testing]
Launching a VM via KubeVirt + libvirt no longer hits an OOM. Previously, virt-launcher querying the socket IDs caused the VM to continually allocate large amounts of memory based on the large physical_package_id values.