DLNetBench Overview

DLNetBench is a proxy framework for benchmarking distributed deep neural network (DNN) training. It is designed to simulate the performance characteristics of large-scale distributed training workloads without running full end-to-end models.

The framework supports transformer-based architectures, including:

Encoder-only models
Decoder-only models

DLNetBench focuses on evaluating communication, parallelization strategies, and backend behavior in distributed environments.

Supported Features

Parallelization Strategies

Data Parallelism (DP)
Fully Sharded Data Parallelism (FSDP)
Data Parallelism + Pipeline Parallelism (DP + PP)
Data Parallelism + Pipeline Parallelism + Tensor Parallelism (DP + PP + TP)
Data Parallelism + Pipeline Parallelism + Mixture of Experts (DP + PP + MoE)

Communication Backends

MPI
MPI (CUDA-aware)
NCCL
RCCL
oneCCL

Data Types

float32
float16 (not supported with MPI and MPI CUDA-aware backends)

Adding a New System Makefile

Overview

The build system is split into three layers:

Makefile.<SYSTEM>     ← you write this (paths, compiler, configs)
Makefile.flags.mk     ← translates declarations into compiler flags
Makefile.common       ← all build targets, never edited

A config is a named build variant (e.g. nccl, rccl_profiler). Each config selects a backend and optional features, and produces its own output directory under build/<config>/.

Step-by-step: Adding a new system

1. Create `Makefile.<YOURSYSTEM>`

# Makefile.MYSYSTEM

# Compiler to use
CXX := mpicxx

# ── Paths ────────────────────────────────────────────────
# Explicitly set any paths that cannot be auto-detected.
# Paths not set here are auto-detected (see "Path auto-detection" below).
CUDA_HOME := /opt/cuda/12.4
NCCL_HOME := /opt/nccl/2.21

# ── Configs ───────────────────────────────────────────────
CONFIGS += nccl
BACKEND_nccl := nccl

CONFIGS += nccl_nvml
BACKEND_nccl_nvml   := nccl
WITH_NVML_nccl_nvml := 1

include Makefile.flags.mk
include Makefile.common

2. Run it

make -f Makefile.MYSYSTEM          # build all configs
make -f Makefile.MYSYSTEM cuda     # build one config
make -f Makefile.MYSYSTEM clean    # wipe build/

Backend reference

The backend name identifies the communication library, which determines both the collective backend and the accelerator runtime:

`BACKEND_<config>`	Preprocessor defines	Collectives	Accelerator
`nccl`	`PROXY_ENABLE_CUDA` + `PROXY_ENABLE_NCCL`	NCCL	CUDA GPU
`rccl`	`PROXY_ENABLE_HIP` + `PROXY_ENABLE_RCCL`	RCCL	AMD GPU
`oneccl`	`PROXY_ENABLE_ONECCL`	oneCCL	SYCL / Intel GPU
`mpi_gpu_cuda`	`PROXY_ENABLE_CUDA`	MPI only	CUDA GPU
`mpi_gpu_hip`	`PROXY_ENABLE_HIP`	MPI only	AMD GPU
`mpi_cpu`	(none)	MPI only	CPU buffers

WITH_MPI := 1 additionally emits -DWITH_MPI -DCCUTILS_ENABLE_MPI for every config in the system file.

Optional feature flags

Set per config:

Variable	Effect
`WITH_NVML_<config> := 1`	Adds `-DNVML -lnvidia-ml`
`WITH_ENERGY_PROFILER_<config> := 1`	Adds `-DPROXY_ENERGY_PROFILING`, links `libpower_profiler`

Path auto-detection

If a path variable is not set, the build system attempts auto-detection:

Variable	Detection method
`CUDA_HOME`	`dirname $(dirname $(which nvcc))`
`NCCL_HOME`	Checks `$(CUDA_HOME)/include/nccl.h`, then `/usr/include/nccl.h`
`ROCM_HOME`	`$ROCM_PATH` → `$EBROOTROCM` → newest `/opt/rocm-*` → `/opt/rocm`
`RCCL_HOME`	Defaults to `$(ROCM_HOME)` (RCCL is bundled in ROCm)
`ONECCL_HOME`	`$CCL_ROOT` → `$ONECCL_ROOT` (set by Intel's `setvars.sh`)
`MPI_HOME`	`dirname $(dirname $(which mpicc))`
`CCUTILS_INCLUDE`	`$(HOME)/.local/ccutils/install/include`
`ENERGY_PROFILER_HOME`	`$(HOME)/.local`

Override any of these by setting the variable before the include lines.

Output layout

bin/
  nccl/
    cpp/
      data_parallel/
        dp
        fsdp
        dp_loop
        fsdp_loop
      hybrid_parallel/
        hybrid_2d
        hybrid_3d
        hybrid_3d_moe
        hybrid_2d_loop
        hybrid_3d_loop
        hybrid_3d_moe_loop
  rccl/ ...

Each config is fully self-contained under its own directory. Configs do not share build artifacts.

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Acknowledgments

This project is inspired by and builds upon prior work in the area of distributed DNN proxies:

DNN-cpp-proxies – C++ proxy implementations for distributed deep neural network training
Special thanks to Shigang Li for the original implementation and foundational research contributions

Name		Name	Last commit message	Last commit date
Latest commit History 132 Commits
cpp		cpp
model_stats		model_stats
models		models
plots		plots
python		python
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
Makefile.ALPS		Makefile.ALPS
Makefile.LEONARDO		Makefile.LEONARDO
Makefile.common		Makefile.common
Makefile.flags.mk		Makefile.flags.mk
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DLNetBench Overview

Supported Features

Parallelization Strategies

Communication Backends

Data Types

Adding a New System Makefile

Overview

Step-by-step: Adding a new system

1. Create `Makefile.<YOURSYSTEM>`

2. Run it

Backend reference

Optional feature flags

Path auto-detection

Output layout

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

License

HicrestLaboratory/DLNetBench

Folders and files

Latest commit

History

Repository files navigation

DLNetBench Overview

Supported Features

Parallelization Strategies

Communication Backends

Data Types

Adding a New System Makefile

Overview

Step-by-step: Adding a new system

1. Create Makefile.<YOURSYSTEM>

2. Run it

Backend reference

Optional feature flags

Path auto-detection

Output layout

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. Create `Makefile.<YOURSYSTEM>`

Packages