[nvidia_stable-11.9] Backport util: fix max socket calculation#11

Open
NathanChenNVIDIA wants to merge 2 commits into nvidia_stable-11.9 from socket_count_fix
@NathanChenNVIDIA
Collaborator

On some systems (e.g. GB200), physical_package_id values are not contiguous or zero-based. Instead of 0..N, they may contain large arbitrary identifiers (e.g. 256123234). The previous implementation assumed a 0..N range and used the maximum ID value directly.

This caused:

  • excessive memory allocation
  • extremely large loop bounds
  • OOM / DoS scenarios
  • unnecessary CPU time consumption
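To make the failure mode concrete, here is a minimal sketch of the old assumption as described above. It is not the libvirt code: the function name `socket_count_from_max_id` and its signature are hypothetical, and it only illustrates what happens when the count is derived from the largest ID.

```c
#include <stddef.h>

/*
 * Hypothetical illustration (not libvirt's actual code) of the old
 * assumption: package IDs form a 0..N range, so the socket count is
 * taken to be the largest ID plus one.
 */
unsigned long long socket_count_from_max_id(const unsigned int *ids, size_t n)
{
    unsigned long long max = 0;
    size_t i;

    if (n == 0)
        return 0;

    for (i = 0; i < n; i++) {
        if (ids[i] > max)
            max = ids[i];
    }

    /* Correct only when the IDs really are 0..N. */
    return max + 1;
}
```

For two CPUs that share the single package ID 256123234, this returns 256123235, so any per-socket array or loop sized from that value covers roughly 256 million entries for one real socket.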

The new implementation computes the socket count as the number of unique package IDs present on the node, rather than relying on the maximum numeric value.
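The counting approach can be sketched as follows. This is an illustrative sketch under stated assumptions, not the upstream patch: the helper name `count_unique_package_ids`, its signature, and the sort-then-scan strategy are all hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for unsigned int; avoids overflow from subtraction. */
static int cmp_uint(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not libvirt's actual function): returns the number
 * of distinct values in ids[0..n-1], or -1 if memory allocation fails.
 */
int count_unique_package_ids(const unsigned int *ids, size_t n)
{
    unsigned int *sorted;
    size_t i;
    int unique;

    if (n == 0)
        return 0;

    /* Work on a copy so the caller's array is left untouched. */
    sorted = malloc(n * sizeof(*sorted));
    if (!sorted)
        return -1;
    memcpy(sorted, ids, n * sizeof(*sorted));
    qsort(sorted, n, sizeof(*sorted), cmp_uint);

    /* After sorting, each change of value marks a new distinct ID. */
    unique = 1;
    for (i = 1; i < n; i++) {
        if (sorted[i] != sorted[i - 1])
            unique++;
    }

    free(sorted);
    return unique;
}
```

Memory use and loop bounds here scale with the number of IDs read from the node, not with the numeric size of any ID, so a value such as 256123234 costs no more than a value such as 0.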

[Testing]
Launching a VM via KubeVirt + libvirt no longer hits OOM. Previously, virt-launcher queried the socket IDs, and the large physical_package_id values caused the VM to continually allocate large amounts of memory.

alex2e78 and others added 2 commits February 5, 2026 19:16
This patch changes how the maximum socket count is calculated.

On some systems (e.g. GB200), physical_package_id values are not
contiguous or zero-based. Instead of 0..N, they may contain large
arbitrary identifiers (e.g. 256123234). The previous implementation
assumed a 0..N range and used the maximum ID value directly.

This caused:
    excessive memory allocation
    extremely large loop bounds
    OOM / DoS scenarios
    unnecessary CPU time consumption

The new implementation computes the socket count as the number of unique
package IDs present on the node, rather than relying on the maximum numeric
value.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alexandr Semenikhin <alexandr2e78@gmail.com>
(cherry picked from commit a64367115015df58e0d82635a40d76df56144c60 https://github.com/libvirt/libvirt/commits/)
Link: https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/COIBU2IGVLC36Q3FLXDL3W7U7WIFVPPJ/
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Collaborator

@nvmochs nvmochs left a comment

Confirmed this is a clean pick and matches upstream.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@nvmochs
Collaborator

nvmochs commented Feb 6, 2026

@NathanChenNVIDIA - I see this was also submitted in PR 10 back in January, shall we close that one?

Collaborator

@ianm-nv ianm-nv left a comment

LGTM
Acked-by: Ian May <ianm@nvidia.com>

@NathanChenNVIDIA
Collaborator Author

> @NathanChenNVIDIA - I see this was also submitted in PR 10 back in January, shall we close that one?

Yes, I'll close that one.

@MitchellAugustin
Contributor

Not seeing any formatting issues, and confirmed that this matches the merged upstream commit, so LGTM.

Contributor

@MitchellAugustin MitchellAugustin left a comment

Not seeing any formatting issues, and confirmed that this matches the merged upstream commit, so LGTM.

5 participants