Support custom partitioning patterns for AutoTP #7806
Conversation
I like the approach of using layer specs, which provides much more flexibility and is easy to use with presets. I see the presets are defined as layer specs in the DeepSpeed code; it would probably be good to add a link to the preset code in the documents.
Thank you @delock, I added links to the preset code and messages welcoming PRs. I hope the community will add support for new models.
@delock or @inkcherry, can you please help with the review and approval of this PR?
if not matches:
    return None
if len(matches) > 1:
    warning_once(f"AutoTPConfig: parameter {param_name} matched multiple layer_specs; using the first match.")
In case more than one spec matches, the warning should also show the matching specs, so the user can judge whether this is intended.
Great catch, I added the warning as you suggested.
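For reference, here is a minimal self-contained sketch of what such an enriched warning could look like; the LayerSpec stand-in and the warning_once alias are assumptions for illustration, not the merged DeepSpeed code:

```python
from dataclasses import dataclass, field
from logging import warning as warning_once  # stand-in for DeepSpeed's warning_once


@dataclass
class LayerSpec:  # hypothetical stand-in for the real layer spec type
    patterns: list = field(default_factory=list)
    partition_type: str = "column"


def pick_spec(param_name, matches):
    """Return the spec to use, warning (with the matching specs) on ambiguity."""
    if not matches:
        return None
    if len(matches) > 1:
        matched = "; ".join(str(s.patterns) for s in matches)
        warning_once(f"AutoTPConfig: parameter {param_name} matched multiple "
                     f"layer_specs ({matched}); using the first match.")
    return matches[0]
```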
if not isinstance(config, dict):
    config = load_ds_config(config)

mesh_device = None
I saw that the sequence parallel (mesh) related code was moved down here. Is there a reason?
It was actually a fix for a pre-existing bug. config can be a file path, but we were treating it as a dictionary when initializing the device mesh, so load_ds_config needs to run before that initialization.
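As a minimal illustration of that ordering (json stands in for the real config loader; the names here are placeholders, not the actual DeepSpeed code):

```python
import json


def load_ds_config(config_path):
    # Placeholder loader: read a DeepSpeed JSON config file into a dict.
    with open(config_path) as f:
        return json.load(f)


def resolve_config(config):
    # config may be either a dict or a path to a JSON file; normalize it to a
    # dict *before* any device-mesh (sequence parallel) setup reads its fields.
    if not isinstance(config, dict):
        config = load_ds_config(config)
    mesh_device = None  # the device mesh is derived later from the loaded dict
    return config, mesh_device
```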
Hi @tohtana, overall this looks good to me. I left some minor suggestions and questions. I'll approve; let me know when you want to merge it. Thanks!
This PR introduces a flexible, configuration-driven API for AutoTP (Automatic Tensor Parallelism) that allows users to define custom layer partitioning patterns for training.
@inkcherry @delock
Motivation
Previously, AutoTP relied on hardcoded layer-detection logic that was difficult to customize for new model architectures. This PR makes the partitioning patterns configurable through the DeepSpeed config and simplifies the initialization path (see below).
Here is an example of a config including custom partitioning patterns:
{ "tensor_parallel": { "autotp_size": 4, "partition_config": { "use_default_specs": false, "layer_specs": [ { "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"], "partition_type": "row" }, { "patterns": [".*\\.[qkv]_proj\\.weight$"], "partition_type": "column" }, { "patterns": [".*\\.gate_up_proj\\.weight$"], "partition_type": "column", "shape": [2, -1], "partition_dim": 0 } ] } } }Refer to the document for more details (including preset models and how to define partitioning for fused models).
We also opened a new PR to show the usage.
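To make the matching semantics concrete, here is a small standalone sketch of how regex patterns like the ones above select parameters by name (plain Python re, not the actual DeepSpeed implementation; the parameter names are hypothetical):

```python
import re

# Illustrative subset of the layer_specs shown in the config above.
layer_specs = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"], "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"], "partition_type": "column"},
]


def find_partition_type(param_name):
    """Return the partition type of the first spec whose pattern matches, else None."""
    for spec in layer_specs:
        if any(re.match(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None  # no match: the parameter is left unpartitioned


print(find_partition_type("model.layers.0.self_attn.q_proj.weight"))  # column
print(find_partition_type("model.layers.0.mlp.down_proj.weight"))     # row
```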
Simplified initialization step
AutoTP previously required calling set_autotp_mode(training=True) and deepspeed.tp_model_init before deepspeed.initialize. Now all the necessary settings can be included in the DeepSpeed config. We still support the traditional initialization path for backward compatibility.

When you use both (i.e., calling set_autotp_mode(training=True) and deepspeed.tp_model_init, and also passing the config to deepspeed.initialize), we merge the settings at initialization. If the settings conflict, we error out.
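As a sketch of the config-driven path, assuming a toy model and illustrative config values (the optimizer section, train_batch_size, and use_default_specs: true as the way to fall back to the built-in presets are assumptions, not prescribed settings):

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder model

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": True,  # assumed switch for the built-in presets
        },
    },
}

# No set_autotp_mode(...) or deepspeed.tp_model_init(...) call here: the
# tensor_parallel section of the config carries the AutoTP settings.
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)
```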