Benchmarks: Micro benchmark - add nvbench based kernel-launch & sleep-kernel #750

WenqingLan1 · 2025-10-09T23:12:33Z

This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.

- Integrated the NVBench submodule.
- Implemented two benchmarks:
  - nvbench-sleep-kernel
  - nvbench-kernel-launch
- Updated documentation and added example scripts.

Example config:

```yaml
version: v0.12
superbench:
  enable:
  # nvbench benchmarks
  - nvbench-sleep-kernel:single
  - nvbench-sleep-kernel:list
  - nvbench-sleep-kernel:range
  - nvbench-sleep-kernel:range-step
  - nvbench-kernel-launch
  var:
    default_local_mode: &default_local_mode
      modes:
      - name: local
        proc_num: 4
        prefix: CUDA_VISIBLE_DEVICES={proc_rank}
        parallel: yes
  benchmarks:
    nvbench-sleep-kernel:single:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "50"          # Single value format
        timeout: 30
    nvbench-sleep-kernel:list:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[25,50,75]"  # List format (no spaces after commas)
        timeout: 30
    nvbench-sleep-kernel:range:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:5]"       # Range format
        timeout: 30
    nvbench-sleep-kernel:range-step:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:50:10]"   # Range with step format
        timeout: 30
    nvbench-kernel-launch:
      <<: *default_local_mode
      timeout: 300
```
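The four `duration_us` forms in the config (single value, list, range, range with step) can be illustrated with a small expansion helper. This is a minimal sketch, not the benchmark's actual parser: `expand_duration_us` is a hypothetical name, and treating the range end as inclusive is an assumption that should be checked against NVBench's axis syntax before relying on it.

```python
import re
from typing import List

# Hypothetical patterns for the duration_us formats shown in the example config.
_SINGLE = re.compile(r'^\d+$')                     # "50"
_LIST = re.compile(r'^\[(\d+(?:,\d+)*)\]$')        # "[25,50,75]" (no spaces)
_RANGE = re.compile(r'^\[(\d+):(\d+)(?::(\d+))?\]$')  # "[0:5]" or "[0:50:10]"


def expand_duration_us(spec: str) -> List[int]:
    """Expand a duration_us spec into the list of durations it denotes."""
    if _SINGLE.match(spec):
        return [int(spec)]
    match = _LIST.match(spec)
    if match:
        return [int(v) for v in match.group(1).split(',')]
    match = _RANGE.match(spec)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        step = int(match.group(3)) if match.group(3) else 1
        if step <= 0 or stop < start:
            raise ValueError(f'invalid range: {spec!r}')
        # ASSUMPTION: the upper bound is inclusive.
        return list(range(start, stop + 1, step))
    # Anything else, e.g. "[25, 50]" with spaces, is rejected.
    raise ValueError(f'unrecognized duration_us format: {spec!r}')


for spec in ('50', '[25,50,75]', '[0:5]', '[0:50:10]'):
    print(spec, expand_duration_us(spec))
```

Rejecting spaces inside the list mirrors the "no spaces after commas" note in the config rather than silently normalizing the input.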

codecov · 2025-10-10T20:44:21Z

Codecov Report

❌ Patch coverage is 89.11917% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.79%. Comparing base (575859b) to head (498d551).

Files with missing lines	Patch %	Lines
...rbench/benchmarks/micro_benchmarks/nvbench_base.py	80.39%	20 Missing ⚠️
...enchmarks/micro_benchmarks/nvbench_sleep_kernel.py	98.07%	1 Missing ⚠️

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.70%   85.79%   +0.08%
==========================================
  Files         102      105       +3
  Lines        7703     7896     +193
==========================================
+ Hits         6602     6774     +172
- Misses       1101     1122      +21
```
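The Codecov figures are internally consistent, which a few lines of arithmetic can confirm. This sketch assumes Codecov's default of two decimal places rounded down (which would explain why 6602/7703 = 85.7068% is reported as 85.70%), and that the 193 added lines are exactly the patch's coverable lines; `pct_round_down` is a hypothetical helper.

```python
import math


def pct_round_down(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage percentage truncated (rounded down) to `precision` decimals."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


base_hits, base_misses, base_lines = 6602, 1101, 7703
head_hits, head_misses, head_lines = 6774, 1122, 7896

# Hits + misses must account for every tracked line.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base_pct = pct_round_down(base_hits, base_lines)
head_pct = pct_round_down(head_hits, head_lines)
# The delta is taken on the unrounded values, then rounded down.
delta = math.floor((head_hits / head_lines - base_hits / base_lines) * 100 * 100) / 100

# ASSUMPTION: the patch consists of the 193 added lines, 21 of them missed.
patch_lines = head_lines - base_lines
patch_hits = patch_lines - (head_misses - base_misses)
patch_pct = round(patch_hits / patch_lines * 100, 5)

print(base_pct, head_pct, delta, patch_pct)
```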

Flag	Coverage Δ
cpu-python3.10-unit-test	`71.40% <88.94%> (+0.43%)`	⬆️
cpu-python3.12-unit-test	`71.40% <88.94%> (+0.43%)`	⬆️
cpu-python3.7-unit-test	`70.90% <89.11%> (+0.46%)`	⬆️
cuda-unit-test	`83.72% <88.94%> (+0.13%)`	⬆️

Flags with carried forward coverage won't be shown.

Copilot

Pull request overview

Adds NVBench-based CUDA GPU micro-benchmarks to SuperBench, including build integration, result parsing, tests, examples, and documentation updates.

Changes:

- Adds NVBench submodule integration and a cuda_nvbench third-party build target.
- Introduces two new micro-benchmarks (nvbench-sleep-kernel, nvbench-kernel-launch) with parsing + unit tests.
- Updates Docker images, docs, and CI workflow to support required tooling (notably newer CMake for NVBench).

Reviewed changes

Copilot reviewed 20 out of 23 changed files in this pull request and generated 8 comments.

File	Description
third_party/nvbench	Adds NVBench as a git submodule dependency.
third_party/Makefile	Adds `cuda_nvbench` build/install target and adjusts recipe indentation.
tests/data/nvbench_sleep_kernel.log	Adds a sample NVBench sleep-kernel output fixture for parsing tests.
tests/data/nvbench_kernel_launch.log	Adds a sample NVBench kernel-launch output fixture for parsing tests.
tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py	Adds unit tests for sleep-kernel preprocess and parsing.
tests/benchmarks/micro_benchmarks/test_nvbench_kernel_launch.py	Adds unit tests for kernel-launch preprocess and parsing.
superbench/benchmarks/micro_benchmarks/nvbench_sleep_kernel.py	Implements the NVBench sleep-kernel benchmark wrapper + output parser.
superbench/benchmarks/micro_benchmarks/nvbench_kernel_launch.py	Implements the NVBench kernel-launch benchmark wrapper + output parser.
superbench/benchmarks/micro_benchmarks/nvbench_base.py	Adds a shared NVBench benchmark base class (CLI args, parsing helpers).
superbench/benchmarks/micro_benchmarks/nvbench/sleep_kernel.cu	Adds NVBench CUDA benchmark implementing a sleep/busy-wait kernel.
superbench/benchmarks/micro_benchmarks/nvbench/kernel_launch.cu	Adds NVBench CUDA benchmark for empty-kernel launch overhead.
superbench/benchmarks/micro_benchmarks/nvbench/CMakeLists.txt	Adds CMake build for NVBench-based benchmark executables.
superbench/benchmarks/micro_benchmarks/__init__.py	Exports the new NVBench benchmarks from the micro-benchmarks package.
examples/benchmarks/nvbench_sleep_kernel.py	Adds an example runner for the sleep-kernel benchmark.
examples/benchmarks/nvbench_kernel_launch.py	Adds an example runner for the kernel-launch benchmark.
docs/user-tutorial/benchmarks/micro-benchmarks.md	Documents the new NVBench benchmarks and their metrics.
dockerfile/rocm5.0.x.dockerfile	Updates Intel MLC download version used in the ROCm image.
dockerfile/cuda13.0.dockerfile	Installs newer CMake and builds `cuda_nvbench` in the CUDA image.
dockerfile/cuda12.9.dockerfile	Installs newer CMake and builds `cuda_nvbench` in the CUDA image.
dockerfile/cuda12.8.dockerfile	Installs newer CMake and builds `cuda_nvbench` in the CUDA image.
.gitmodules	Registers the `third_party/nvbench` submodule.
.gitignore	Ignores `compile_commands.json`.
.github/workflows/codeql-analysis.yml	Upgrades CodeQL actions to v3 and adds CMake setup for the C++ job.
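The benchmark wrappers' output parsers have to turn NVBench's table cells (for example a GPU time printed with a unit suffix) into numbers. A minimal sketch of that step follows; `split_row` and `time_to_us` are hypothetical helpers, the sample row is illustrative rather than copied from NVBench, and the unit suffixes (`ns`, `us`, `ms`, `s`) are an assumption about NVBench's markdown output that should be verified against the fixtures in `tests/data/`.

```python
import re
from typing import List

# ASSUMPTION: NVBench prints durations as "<number> <unit>" with these suffixes.
_UNIT_TO_US = {'ns': 1e-3, 'us': 1.0, 'ms': 1e3, 's': 1e6}
_TIME_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*(ns|us|ms|s)\s*$')


def split_row(line: str) -> List[str]:
    """Split a markdown table row such as '| 50 | 52.1 us |' into trimmed cells."""
    return [cell.strip() for cell in line.strip().strip('|').split('|')]


def time_to_us(cell: str) -> float:
    """Convert a duration cell such as '52.100 us' or '1.5 ms' to microseconds."""
    match = _TIME_RE.match(cell)
    if match is None:
        raise ValueError(f'not a duration cell: {cell!r}')
    value, unit = match.groups()
    return float(value) * _UNIT_TO_US[unit]


# Hypothetical row; the column layout is illustrative only.
row = split_row('| 50 | 52.100 us | 0.50% |')
print(time_to_us(row[1]))  # 52.1
```

Normalizing every time to a single unit (microseconds here) keeps the reported metrics comparable when NVBench switches suffixes between rows.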

superbench/benchmarks/micro_benchmarks/nvbench_sleep_kernel.py

superbench/benchmarks/micro_benchmarks/nvbench_kernel_launch.py

superbench/benchmarks/micro_benchmarks/nvbench_base.py

superbench/benchmarks/micro_benchmarks/nvbench/sleep_kernel.cu

tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py

superbench/benchmarks/micro_benchmarks/__init__.py

Copilot · 2026-01-23T00:05:47Z

.github/workflows/codeql-analysis.yml

-          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo
+          DEBIAN_FRONTEND=noninteractive apt-get install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswresample-dev sudo build-essential
+      - name: Setup CMake
+        uses: lukka/get-cmake@latest


Using @latest for third-party GitHub Actions is a supply-chain risk and can lead to non-reproducible CI behavior. Pin this action to a specific tagged version or commit SHA.

Suggested change:

-        uses: lukka/get-cmake@latest
+        uses: lukka/get-cmake@v3.20.0
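The `@latest` concern applies to every `uses:` line in a workflow, not only this one. A rough sketch of a check that flags floating refs follows; `find_unpinned` is a hypothetical helper, the regex handles only the simple `uses: owner/repo[/path]@ref` form, the set of floating ref names is an assumption, and the sample snippet is modeled on the diff above.

```python
import re
from typing import List, Tuple

# Matches "uses: owner/repo@ref" or "- uses: owner/repo/path@ref", optionally
# followed by a trailing "# comment"; captures the action and the ref.
_USES_RE = re.compile(r'^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w./-]+)@([\w.-]+)\s*(?:#.*)?$')
# ASSUMPTION: these names are treated as floating (mutable) refs.
_FLOATING_REFS = {'latest', 'main', 'master'}


def find_unpinned(workflow_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, 'action@ref') for every step that uses a floating ref."""
    found = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = _USES_RE.match(line)
        if match and match.group(2) in _FLOATING_REFS:
            found.append((lineno, f'{match.group(1)}@{match.group(2)}'))
    return found


snippet = """\
      - name: Setup CMake
        uses: lukka/get-cmake@latest
      - uses: github/codeql-action/init@v3
"""
print(find_unpinned(snippet))  # [(2, 'lukka/get-cmake@latest')]
```

A tag such as `v3.20.0` can still be moved by the action's owner; requiring a 40-character commit SHA would be a stricter variant of the same check.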

Copilot · 2026-01-23T00:05:47Z

third_party/Makefile

 	&& git -C msccl checkout 87048bd && git -C msccl submodule update --recursive --init
 else ifeq ($(shell echo $(CUDA_VER)">=12.8" | bc -l), 1)
-    # Get commit 87048bd from msscl to support updated nccl and sm_100
+	# Get commit 87048bd from msscl to support updated nccl and sm_100


Typo in comment: change msscl to msccl.

Suggested change:

-	# Get commit 87048bd from msscl to support updated nccl and sm_100
+	# Get commit 87048bd from msccl to support updated nccl and sm_100

Copilot

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 2 comments.

dockerfile/cuda12.9.dockerfile

examples/benchmarks/nvbench_kernel_launch.py

WenqingLan1 and others added 15 commits July 22, 2025 16:03

add nvbench kernel launch

741ee98

submodule update

0ae7864

init sleep kernel

35bfb61

Merge branch 'microsoft:main' into feat/third_party/nvbench

66b4786

Merge branch 'microsoft:main' into feat/third_party/nvbench

82aed0c

Merge branch 'microsoft:main' into feat/third_party/nvbench

24ee0a5

test sleep kernel

bd87f50

add sm 103

a663db6

add arg parsing logic

32fe197

Merge branch 'microsoft:main' into feat/third_party/nvbench

76562dc

add arg parsing tests

3eb5525

refactor

4785fe6

refine logic - remove gpu_id

1fb7c05

add doc

83c442c

refine regex & update nvbench submodule

4b274c4

WenqingLan1 requested a review from a team as a code owner October 9, 2025 23:12

WenqingLan1 added the benchmarks (SuperBench Benchmarks) and micro-benchmarks (Micro Benchmark Test for SuperBench Benchmarks) labels Oct 9, 2025

WenqingLan1 added 8 commits October 10, 2025 16:48

update cmake

0cf48bb

fix lint

5905647

fix lint

baa57c9

fix import

ecce2d9

fix

3a58ead

fix

d0d8773

fix

fbb5969

fix

f007745

WenqingLan1 added 3 commits October 10, 2025 21:23

fix

b6b6082

fix

0f2c838

fix

5bd20f6

WenqingLan1 added 5 commits October 10, 2025 22:30

fix pipeline

ab88d25

fix cmake

3faaf60

fix pipeline

896a46a

fix pipeline

5d4986b

fix pipeline & mlc version

b246522

guoshzhao self-assigned this Oct 17, 2025

WenqingLan1 added 2 commits December 17, 2025 15:51

Merge branch 'microsoft:main' into feat/third_party/nvbench

ffe182e

Merge branch 'main' into feat/third_party/nvbench

2877feb

polarG requested a review from Copilot January 23, 2026 00:00

Copilot AI reviewed Jan 23, 2026

WenqingLan1 added 2 commits February 3, 2026 14:14

Merge branch 'microsoft:main' into feat/third_party/nvbench

0902eef

Merge branch 'microsoft:main' into feat/third_party/nvbench

498d551

Copilot AI review requested due to automatic review settings February 6, 2026 00:03

Copilot AI reviewed Feb 6, 2026

dockerfile/cuda12.9.dockerfile (resolved)

examples/benchmarks/nvbench_kernel_launch.py (resolved)

fix comments

0804c12
