@tohtana tohtana commented Jan 22, 2026

This PR adds examples of AutoTP training, including custom partitioning patterns.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@tohtana tohtana requested a review from tjruwase as a code owner January 22, 2026 00:13
tohtana added a commit to deepspeedai/DeepSpeed that referenced this pull request Jan 31, 2026
This PR introduces a flexible, configuration-driven API for AutoTP
(Automatic Tensor Parallelism) that allows users to define custom layer
partitioning patterns for training.
@inkcherry @delock 

## Motivation

Previously, AutoTP relied on hardcoded layer detection logic that was
difficult to customize for new model architectures. This PR enables:

1. **Custom models**: Users can define exact regex patterns to match
their model's parameter names
2. **Fused layers**: Support for fused QKV, gate_up_proj, and other
packed weight matrices with unequal sub-parameter sizes (e.g., GQA with
different Q/K/V dimensions)
3. **Extensibility**: Easy to add new model presets or customize
existing ones

Here is an example of a config including custom partitioning patterns:

```json
{
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": false,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                },
                {
                    "patterns": [".*\\.gate_up_proj\\.weight$"],
                    "partition_type": "column",
                    "shape": [2, -1],
                    "partition_dim": 0
                }
            ]
        }
    }
}
```
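As a quick sanity check on how such patterns resolve against parameter names, here is a minimal sketch using Python's `re`. The `match_spec` helper and the first-match-wins rule are illustrative assumptions for this example, not part of DeepSpeed's API:

```python
import re

# Layer specs mirroring the JSON config above (patterns -> partition_type).
layer_specs = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"],
     "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"],
     "partition_type": "column"},
    {"patterns": [r".*\.gate_up_proj\.weight$"],
     "partition_type": "column"},
]

def match_spec(param_name):
    """Return the partition type of the first matching spec, else None.

    Hypothetical helper for illustration; shown here only to make the
    regex semantics of the config concrete.
    """
    for spec in layer_specs:
        if any(re.match(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None

print(match_spec("model.layers.0.self_attn.q_proj.weight"))  # column
print(match_spec("model.layers.0.mlp.down_proj.weight"))     # row
print(match_spec("model.embed_tokens.weight"))               # None (not sharded)
```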

Refer to the
[document](https://github.com/tohtana/DeepSpeed/blob/tohtana/autotp_custom_patterns/docs/code-docs/source/training.rst)
for more details, including preset models and how to define partitioning
for fused models.
We also opened a new
[PR](deepspeedai/DeepSpeedExamples#998) demonstrating
the usage.
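To make the fused-layer case concrete, here is a minimal pure-Python sketch of how a fused weight with unequal sub-parameter sizes (e.g., GQA with different Q/K/V dimensions) can be split along the fused dimension: each sub-parameter is sharded independently across ranks, then the per-rank slices are concatenated. The function name and exact semantics are assumptions for illustration, not DeepSpeed's implementation:

```python
def shard_fused_rows(weight, sub_sizes, tp_size, rank):
    """Shard a fused weight (list of rows) along its fused dimension.

    Illustrative only: splits each sub-parameter (of possibly unequal size)
    evenly across tp_size ranks, then concatenates this rank's slices.
    Assumes each sub-parameter size is divisible by tp_size.
    """
    shard, start = [], 0
    for size in sub_sizes:
        block = weight[start:start + size]
        start += size
        per_rank = size // tp_size
        shard.extend(block[rank * per_rank:(rank + 1) * per_rank])
    return shard

# Toy fused QKV weight: 12 rows = Q (8 rows) + K (2 rows) + V (2 rows).
fused = list(range(12))
print(shard_fused_rows(fused, [8, 2, 2], tp_size=2, rank=0))  # [0, 1, 2, 3, 8, 10]
print(shard_fused_rows(fused, [8, 2, 2], tp_size=2, rank=1))  # [4, 5, 6, 7, 9, 11]
```

Note how a naive even split of all 12 rows would give rank 0 only Q rows; splitting per sub-parameter keeps each rank's shard a valid Q/K/V triple.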


## Simplified initialization step

AutoTP previously required calling ``set_autotp_mode(training=True)``
and ``deepspeed.tp_model_init`` before ``deepspeed.initialize``. Now all
the necessary settings can be included in the DeepSpeed config.

The traditional initialization path is still supported for backward
compatibility.
If you use both (i.e., call ``set_autotp_mode(training=True)`` and
``deepspeed.tp_model_init`` while also passing the config to
``deepspeed.initialize``), the settings are merged at initialization;
conflicting settings raise an error.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>