Use `GpuIpcMem` for NVLS connections #719

chhwang · 2026-01-07T06:35:42Z

No description provided.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

- [x] Move hash specialization and equality operator from std/global namespace to custom namespace
- [x] Update unordered_map to use custom hash and equality as template parameters
- [x] Add noexcept to equality operator
- [x] Verify the changes build correctly
- [x] Run code review and security checks

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Binyang2014 <9415966+Binyang2014@users.noreply.github.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
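The first three checklist items describe a common C++ pattern: rather than specializing `std::hash` or defining `operator==` in the global namespace, the hash and equality logic live in the project's own namespace and are passed to `std::unordered_map` as template parameters. The thread does not show the actual key type or names used in the PR, so the sketch below uses a hypothetical `HandleKey` in a hypothetical `demo` namespace, and it uses an equality functor where the PR mentions an equality operator. It is an illustration of the pattern under those assumptions, not the code that was merged.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

namespace demo {  // hypothetical namespace, standing in for the project's own

// Hypothetical key type; the real key used in the PR is not shown in this thread.
struct HandleKey {
  std::uint64_t id;
  int deviceId;
};

// Hash functor kept in the custom namespace instead of specializing std::hash.
struct HandleKeyHash {
  std::size_t operator()(const HandleKey& key) const noexcept {
    const std::size_t h1 = std::hash<std::uint64_t>{}(key.id);
    const std::size_t h2 = std::hash<int>{}(key.deviceId);
    // Combine the two field hashes (Boost-style hash_combine).
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

// Equality functor, marked noexcept, also kept out of the std/global namespace.
struct HandleKeyEqual {
  bool operator()(const HandleKey& a, const HandleKey& b) const noexcept {
    return a.id == b.id && a.deviceId == b.deviceId;
  }
};

// The map receives the hash and equality types as explicit template parameters.
using HandleCache = std::unordered_map<HandleKey, int, HandleKeyHash, HandleKeyEqual>;

// Small helper: return the cached value for key, or fallback if it is absent.
inline int lookupOrDefault(const HandleCache& cache, const HandleKey& key, int fallback) {
  const auto it = cache.find(key);
  return it == cache.end() ? fallback : it->second;
}

}  // namespace demo
```

Keeping these functors out of `std` avoids adding specializations for types the standard library might not expect, and makes the map's hashing behavior visible at the point where the container type is declared.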

Copilot

Pull request overview

This PR refactors NVLS (NVLink SHARP) connection handling to use the unified `GpuIpcMem` abstraction instead of calling the CUDA multicast APIs directly. Multicast memory management is delegated to the existing `GpuIpcMem` infrastructure, which simplifies the code.

Key Changes

- Replaces manual multicast handle management with the `GpuIpcMem` and `GpuIpcMemHandle` abstractions
- Removes manual buffer allocation tracking (`allocatedRanges_`, `freeRanges_`), which is now handled internally by `GpuIpcMem`
- Removes explicit synchronization barriers in `connectNvlsCollective`, relying instead on the blocking behavior of `cuMulticastBindAddr` within `GpuIpcMem::mapMulticast()`

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/switch_channel.cc | Complete refactoring of `NvlsConnection::Impl` to use `GpuIpcMem`; updated license header; removed buffer allocation logic and synchronization barriers; simplified `bindMemory` implementation |
| python/csrc/switch_channel_py.cpp | Removed Python binding for the `getMultiCastMinGranularity` method (breaking API change) |
| include/mscclpp/switch_channel.hpp | Removed public API methods `addDevice()` and `getMultiCastMinGranularity()` (breaking API changes) |

chhwang · 2026-01-07T08:45:57Z

/azp run mscclpp-ut

azure-pipelines · 2026-01-07T08:46:05Z

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

chhwang · 2026-01-07T08:50:03Z

/azp run mscclpp-ut

azure-pipelines · 2026-01-07T08:50:15Z

Azure Pipelines successfully started running 1 pipeline(s).

Binyang2014 and others added 24 commits December 4, 2025 19:20

add ipc cache

70c1d4d

WIP

1739f5a

Update src/registered_memory.cc

4ebe37e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

WIP

2137325

fix ut

b1029b9

Merge branch 'main' into binyli/handle_cache

d97d230

Add GpuIpcMem class

fcb1ab6

Merge branch 'main' into chhwang/new-ipc-mem

73982f7

revert

ebec0ee

update

dc77036

update

c3d2c2b

Merge branch 'main' into chhwang/new-ipc-mem

8eccca7

Lint

3a07282

tackle comments

77245e5

lint

61cc7d6

Merge branch 'main' into chhwang/new-ipc-mem

c3f467b

add comments

61ee117

tackle comments

542800d

tackle comment

0d7f877

rocm fix

2ff8e1f

tackle comments

c99d344

more fix

0037490

Use GpuIpcMem for NVLS connections

a5817f8

chhwang requested a review from Copilot January 7, 2026 06:39

Copilot started reviewing on behalf of chhwang January 7, 2026 06:39

Copilot AI reviewed Jan 7, 2026

tackle comments

2e184f9

chhwang changed the base branch from chhwang/new-ipc-mem to main January 7, 2026 08:49


3 participants
