Conversation

@zianglih commented Feb 3, 2026

Description

@HumansAnd

Add an NVTE_KEEP_BACKWARD_UNQUANTIZED env var for quantized fprop + high-precision wgrad & dgrad.
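
A minimal usage sketch (mine, not from the PR; it assumes the env var is read when the recipe defaults are evaluated, so it must be set before importing TE):

# Hedged usage sketch for the new env var. Recipe defaults read the
# environment when the recipe dataclasses are defined, so set it first.
import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

recipe = Float8CurrentScaling()  # recipe.quantize_backward is now False
linear = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
x = torch.randn(32, 1024, dtype=torch.bfloat16, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = linear(x)            # forward GEMM runs quantized
y.float().sum().backward()   # wgrad & dgrad stay in high precision (BF16/FP32)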

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

zianglih and others added 2 commits February 2, 2026 16:45
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot (Contributor) commented Feb 3, 2026

Greptile Overview

Greptile Summary

Added NVTE_KEEP_BACKWARD_UNQUANTIZED environment variable to enable quantized forward pass with high-precision backward gradients (wgrad & dgrad). When enabled, the forward pass quantizes inputs and weights to FP8/FP4, but backward gradients remain in high precision (BF16/FP32).

Critical Issues:

  • DelayedScaling recipe has an assertion at line 220 that blocks quantize_backward=False, making the feature unusable with this recipe type despite the env var being set
  • LayerNormMLP module crashes immediately when the env var is set (lines 238-240 assertion)
  • Potential AttributeError crash in quantize.py when the recipe is None (line 62)

Implementation:

  • Modified all recipe classes to support quantize_backward field (defaults to True unless NVTE_KEEP_BACKWARD_UNQUANTIZED=1)
  • Updated linear operations to conditionally save high-precision tensors for backward pass
  • Test coverage is comprehensive (756 lines) but intentionally excludes DelayedScaling since it's not supported
  • Feature works correctly with Float8CurrentScaling, MXFP8BlockScaling, Float8BlockScaling, and NVFP4BlockScaling recipes

Confidence Score: 1/5

  • Critical crashes block the feature for common use cases - needs fixes before merge
  • Three critical logic errors will cause immediate crashes: (1) DelayedScaling assertion blocks env var usage for the most common recipe type, (2) LayerNormMLP crashes on any usage with env var, (3) potential None reference crash in quantize logic. These are blocking issues that prevent the feature from working in production.
  • transformer_engine/common/recipe/__init__.py (line 220 assertion), transformer_engine/pytorch/module/layernorm_mlp.py (line 238 assertion), and transformer_engine/pytorch/ops/basic/quantize.py (line 62 None check) require fixes

Important Files Changed

  • transformer_engine/common/recipe/__init__.py: Added quantize_backward field to all recipe classes with env var support, but DelayedScaling.__post_init__ at line 220 has an assertion that blocks quantize_backward=False, making NVTE_KEEP_BACKWARD_UNQUANTIZED=1 crash immediately for this recipe type
  • transformer_engine/pytorch/module/layernorm_mlp.py: Added assertion at lines 238-240 that crashes when NVTE_KEEP_BACKWARD_UNQUANTIZED=1 is set, making LayerNormMLP completely unusable with this env var
  • transformer_engine/pytorch/ops/basic/quantize.py: Added recipe override logic, but line 62 calls FP8GlobalStateManager.get_fp8_recipe().quantize_backward without a None check - will crash if the recipe is None
  • transformer_engine/pytorch/ops/basic/basic_linear.py: Added keep_backward_unquantized support to conditionally save high-precision tensors and skip backward quantization - implementation looks correct
  • transformer_engine/pytorch/module/linear.py: Sets save_original_input=True when keep_backward_unquantized is enabled - works correctly with Float8CurrentScaling and other non-DelayedScaling recipes
  • tests/pytorch/test_keep_backward_unquantized.py: Comprehensive test file with 756 lines covering the feature - notably excludes DelayedScaling from test cases, only testing Float8CurrentScaling, MXFP8BlockScaling, Float8BlockScaling, and NVFP4BlockScaling

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe
    participant Linear
    participant BasicLinear
    participant Quantize

    User->>Recipe: Set NVTE_KEEP_BACKWARD_UNQUANTIZED=1
    Recipe->>Recipe: quantize_backward = False
    
    Note over Recipe: DelayedScaling: CRASHES HERE<br/>(assertion at line 220)
    
    User->>Linear: forward(input)
    Linear->>Linear: keep_backward_unquantized = True
    Linear->>Linear: save_original_input = True
    
    Linear->>Quantize: quantize(input)
    Quantize->>Quantize: Check recipe.quantize_forward
    Note over Quantize: Potential crash if recipe is None
    Quantize-->>Linear: quantized_input (FP8)
    
    Linear->>BasicLinear: forward(quantized_input, weight)
    BasicLinear->>BasicLinear: Save high-precision input for backward
    BasicLinear-->>Linear: output
    
    User->>Linear: backward(grad_output)
    Linear->>BasicLinear: backward(grad_output)
    Note over BasicLinear: Uses high-precision saved tensors<br/>Skip quantization in backward
    BasicLinear->>BasicLinear: wgrad = grad_output @ input_hp
    BasicLinear->>BasicLinear: dgrad = grad_output @ weight_hp
    BasicLinear-->>Linear: grad_input (high precision)
    Linear-->>User: gradients (BF16/FP32)

greptile-apps bot left a comment

6 files reviewed, no comments


@zianglih (Author) commented Feb 3, 2026

I'll work on potential unit test breakage.

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, no comments

Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, 4 comments

zianglih and others added 2 commits February 3, 2026 09:56
greptile-apps bot left a comment

4 files reviewed, 1 comment

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, 1 comment

… is used

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, 2 comments

ln_out_return = None
if return_layernorm_output or return_layernorm_output_gathered:
    ln_out_return = ln_out
ln_out_hp = ln_out if keep_backward_unquantized else None
Contributor:

storing both ln_out (quantized) and ln_out_hp (high precision) doubles the memory footprint for this activation

verify this memory overhead is acceptable for your target models, especially during training with large batch sizes or long sequences
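
For a rough sense of scale (illustrative numbers, not from this PR), a quick back-of-envelope in Python:

# Hypothetical shapes for illustration only.
batch, seq, hidden = 8, 4096, 8192
elems = batch * seq * hidden
bf16_mib = elems * 2 / 2**20  # high-precision copy (ln_out_hp): 512 MiB
fp8_mib = elems * 1 / 2**20   # quantized copy (ln_out), ignoring scales: 256 MiB
print(bf16_mib, fp8_mib)      # keeping both ~triples this activation vs FP8 alone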

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

4 files reviewed, no comments

greptile-apps bot left a comment

5 files reviewed, no comments

Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, no comments

Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

5 files reviewed, 1 comment

Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

12 files reviewed, 2 comments

Comment on lines 57 to 60
 # Check if FP8 is enabled
 fp8_enabled = FP8GlobalStateManager.is_fp8_enabled()
 quantize_forward = fp8_enabled and self._quantize_forward
-quantize_backward = fp8_enabled and self._quantize_backward
+quantize_backward = (
Contributor:

Recipe None crash

FP8GlobalStateManager.get_fp8_recipe() can be None (e.g., if FP8 is enabled but no recipe was set), so ...get_fp8_recipe().quantize_backward will raise an AttributeError. This makes Quantize.forward() crash in that configuration; please guard for None (e.g., default to quantized backward when recipe is unset, or explicitly error with a clearer message).
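
One possible guard, sketched under the assumption that an unset recipe should fall back to the quantized-backward default (names taken from the snippet above):

# Sketch only: guard the recipe lookup before dereferencing it.
recipe = FP8GlobalStateManager.get_fp8_recipe()
quantize_backward = (
    fp8_enabled
    and self._quantize_backward
    and (recipe is None or recipe.quantize_backward)
)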

zianglih and others added 2 commits February 9, 2026 12:33
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

6 files reviewed, 3 comments

assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
Contributor:

assertion prevents using NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with the DelayedScaling recipe - when the env var is set, quantize_backward becomes False, making this assert fail and blocking the entire feature for this recipe type
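
A minimal repro of this failure mode (hedged sketch; the assertion text is quoted from the snippet above, and the env var must be set before the recipe module is imported):

import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"
from transformer_engine.common.recipe import DelayedScaling

try:
    DelayedScaling()  # __post_init__ hits the assert quoted above
except AssertionError as e:
    print(e)  # "Delayed scaling does not support quantize_backward=False."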

Comment on lines +238 to +240
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
Contributor:

assertion makes LayerNormMLP crash immediately when NVTE_KEEP_BACKWARD_UNQUANTIZED=1 is set - if this module doesn't support the feature, either implement it or handle it more gracefully with a clear error message before reaching this point

greptile-apps bot (Contributor) commented Feb 9, 2026

Additional Comments (1)

transformer_engine/common/recipe/__init__.py
Float8BlockScaling is missing field declarations for quantize_forward and quantize_backward, but they're used in __post_init__ (line 407) and __repr__ (line 425-426) - will cause AttributeError at runtime

    fp8_mha: bool = False
    quantize_forward: bool = True
    quantize_backward: bool = not (os.getenv("NVTE_KEEP_BACKWARD_UNQUANTIZED", "0") == "1")
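
To see why the missing declarations crash, a generic dataclass sketch (not TransformerEngine code): a field referenced in __post_init__ but never declared raises AttributeError.

from dataclasses import dataclass

@dataclass
class Broken:
    fp8_mha: bool = False
    # quantize_forward / quantize_backward intentionally not declared

    def __post_init__(self):
        assert self.quantize_backward  # AttributeError: attribute never set

try:
    Broken()
except AttributeError as e:
    print(e)  # 'Broken' object has no attribute 'quantize_backward'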

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

10 files reviewed, 1 comment

Comment on lines 1019 to +1026

 # Save state for backward pass
 if ctx.requires_grad:
+    saved_input = input_ if keep_backward_unquantized else x_local
+    saved_weight = self.weight if keep_backward_unquantized else w
     if is_cpu_offload_enabled():
-        mark_activation_offload(x_local)
-    ctx.save_for_backward(x_local, w)
-    ctx.with_quantized_compute = with_quantized_compute
+        mark_activation_offload(saved_input)
+    ctx.save_for_backward(saved_input, saved_weight)
Contributor:

Unnecessary saved tensors

The forward path now saves saved_input/saved_weight whenever ctx.requires_grad is true, even when weight_requires_grad or input_requires_grad is false. In cases like frozen weights (common for finetune) or when only one side needs grads, this will save extra tensors and can materially increase activation memory. The prior if not weight_requires_grad: saved_input = None / if not input_requires_grad: saved_weight = None logic avoided that.
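
The prior per-side guards the comment refers to could be restored on top of the new logic, roughly like this (sketch using names from the diff above):

saved_input = input_ if keep_backward_unquantized else x_local
saved_weight = self.weight if keep_backward_unquantized else w
if not weight_requires_grad:
    saved_input = None   # no wgrad -> no need to keep the input
if not input_requires_grad:
    saved_weight = None  # no dgrad -> no need to keep the weight
ctx.save_for_backward(saved_input, saved_weight)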

Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

6 files reviewed, 1 comment

greptile-apps bot (Contributor) commented Feb 9, 2026

Additional Comments (1)

transformer_engine/pytorch/ops/fused/userbuffers_forward_linear.py
Backward remains quantized
NVTE_KEEP_BACKWARD_UNQUANTIZED is implemented across other linear paths by saving high-precision tensors for backward and setting ctx.with_quantized_compute = fp8 && !keep_backward_unquantized. In the Userbuffers forward fusion, the forward path never computes/propagates keep_backward_unquantized and always saves x_local, w (potentially quantized) and sets linear_op_ctx.with_quantized_compute = with_quantized_compute, so UB-enabled execution will still take the quantized-backward path even when recipe.quantize_backward=False (i.e., when the feature is intended to disable quantized backward).
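
A hedged sketch of the fix direction described here, mirroring the non-UB paths (the attribute and variable names are my assumptions about the fusion's internals, not verified against the file):

keep_backward_unquantized = (
    with_quantized_compute and recipe is not None and not recipe.quantize_backward
)
saved_x = input_ if keep_backward_unquantized else x_local
saved_w = basic_linear_op.weight if keep_backward_unquantized else w
linear_op_ctx.save_for_backward(saved_x, saved_w)
linear_op_ctx.with_quantized_compute = (
    with_quantized_compute and not keep_backward_unquantized
)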

@zianglih (Author) commented:
Currently, without NVTE_KEEP_BACKWARD_UNQUANTIZED, the unit tests are aligned with main:
te-2644.log
te-main.log

Comment on lines +565 to +568
quantize_forward : bool, default = True
    Whether to quantize tensors in the forward pass.
quantize_backward : bool, default = True
    Whether to quantize tensors in the backward pass.
Member:

Not sure we need that for the custom recipe, since there we can just specify the quantizers we want, but sure, we can have it to keep the API consistent.

)
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
Member:

This is unfortunate.

)
if keep_backward_unquantized:
    # Note, NVTE_KEEP_BACKWARD_UNQUANTIZED is ignored when delayed scaling is used
    save_original_input = True
Member:

We should also make sure that we don't create the columnwise version of the input.
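
A hedged sketch of that suggestion (assuming the input quantizer exposes set_usage, as TE quantizers do elsewhere):

if keep_backward_unquantized:
    save_original_input = True
    # Backward uses the high-precision copy, so the quantized input only
    # needs its rowwise (forward) layout; skip the columnwise version.
    input_quantizer.set_usage(rowwise=True, columnwise=False)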


@ptrendx (Member) commented Feb 10, 2026

@zianglih Thank you for your contribution!
Apart from the specific ones, 2 general comments:

  • the quantize_forward option seems unnecessary -> it has to be True for quantize_backward to be True, and if it is False it basically means that quantization is disabled entirely (which is the field we already have in autocast anyway). Because of that I think it should be removed since it is confusing.
  • Could you add some tests for the new functionality? Otherwise it will be very easy to inadvertently break this when making changes.
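
A minimal shape such a test could take (my sketch; the PR later adds a much fuller tests/pytorch/test_keep_backward_unquantized.py, see below):

# Set before importing TE: recipe defaults read the env at import time.
import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

def test_keep_backward_unquantized_smoke():
    recipe = Float8CurrentScaling()
    assert recipe.quantize_backward is False  # env var flipped the default
    linear = te.Linear(64, 64, params_dtype=torch.bfloat16).cuda()
    x = torch.randn(32, 64, dtype=torch.bfloat16, device="cuda", requires_grad=True)
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        y = linear(x)  # quantized fprop
    y.float().sum().backward()  # high-precision wgrad/dgrad
    assert x.grad is not None and linear.weight.grad is not None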

zianglih and others added 3 commits February 10, 2026 14:02
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
greptile-apps bot left a comment

16 files reviewed, 2 comments

assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
Contributor:

blocks NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with DelayedScaling - when the env var is set, quantize_backward becomes False, triggering this assertion and preventing the feature from working with this recipe type

Comment on lines +238 to +239
assert (
    not keep_backward_unquantized
Contributor:

hard crash when NVTE_KEEP_BACKWARD_UNQUANTIZED=1 - LayerNormMLP becomes completely unusable with this env var

@zianglih (Author) commented:
Hi @zhongbozhu @timmoon10 @ptrendx, thank you so much for reviewing!

I have implemented and added the unit tests. All new tests pass:

root@B200-55:~/TransformerEngine# NVTE_KEEP_BACKWARD_UNQUANTIZED=1 python3 -m pytest -v -s tests/pytorch/test_keep_backward_unquantized.py
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.12.3, pytest-8.2.1, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/root/TransformerEngine/.hypothesis/examples'))
rootdir: /root/TransformerEngine
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.12.1, shard-0.1.2, flakefinder-1.1.0, xdist-3.8.0, rerunfailures-16.1, hypothesis-6.130.8, xdoctest-1.0.2
collected 112 items                                                                                                                                                                        

tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_recipe_defaults[Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_recipe_defaults[MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_recipe_defaults[Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_recipe_defaults[NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-2d_m32_k64_n64-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k64_n128-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[no_bias-3d_m32_k128_n64-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-2d_m32_k64_n64-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k64_n128-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-layernorm_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-layernorm_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-layernorm_linear-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-layernorm_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-ops_linear-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-ops_linear-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-ops_linear-Float8BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_matches_quantized_fprop_and_unquantized_grads[bias-3d_m32_k128_n64-ops_linear-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-no_bias-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-no_bias-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-no_bias-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-no_bias-NVFP4BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-bias-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-bias-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-bias-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[uniform_splits-bias-NVFP4BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-no_bias-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-no_bias-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-no_bias-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-no_bias-NVFP4BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-bias-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-bias-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-bias-Float8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_grouped_linear_matches_quantized_fprop_and_unquantized_grads[with_empty_split-bias-NVFP4BlockScaling] SKIPPED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[bias_add-ForwardLinearBiasAdd-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[bias_add-ForwardLinearBiasAdd-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[bias_add-ForwardLinearBiasAdd-Float8BlockScaling] SKIPPED (Fusible ops (te_op...)
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[bias_add-ForwardLinearBiasAdd-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[scale_add-ForwardLinearScaleAdd-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[scale_add-ForwardLinearScaleAdd-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[scale_add-ForwardLinearScaleAdd-Float8BlockScaling] SKIPPED (Fusible ops (te_...)
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_linear_paths[scale_add-ForwardLinearScaleAdd-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[2d_m32_k64-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[2d_m32_k64-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[2d_m32_k64-Float8BlockScaling] SKIPPED (Fus...)
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[2d_m32_k64-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[3d_m32_k64-Float8CurrentScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[3d_m32_k64-MXFP8BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[3d_m32_k64-Float8BlockScaling] SKIPPED (Fus...)
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_fused_bias_activation_matches_masked_linear_backward[3d_m32_k64-NVFP4BlockScaling] PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_autocast_respects_quantize_forward_flag PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_quantize_op_respects_recipe_overrides PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_is_invalid_for_delayed_scaling PASSED
tests/pytorch/test_keep_backward_unquantized.py::test_keep_backward_unquantized_not_implemented_for_layernorm_mlp PASSED

===================================================================================== warnings summary =====================================================================================
../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480
../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480: DeprecationWarning: `torch.jit.script` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

../../usr/local/lib/python3.12/dist-packages/torch/library.py:357
  /usr/local/lib/python3.12/dist-packages/torch/library.py:357: UserWarning: Warning only once for all operators,  other operators may also be overridden.
    Overriding a previously registered kernel for the same operator and the same dispatch key
    operator: flash_attn::_flash_attn_backward(Tensor dout, Tensor q, Tensor k, Tensor v, Tensor out, Tensor softmax_lse, Tensor(a6!)? dq, Tensor(a7!)? dk, Tensor(a8!)? dv, float dropout_p, float softmax_scale, bool causal, SymInt window_size_left, SymInt window_size_right, float softcap, Tensor? alibi_slopes, bool deterministic, Tensor? rng_state=None) -> Tensor
      registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:926
    dispatch key: ADInplaceOrView
    previous kernel: no debug info
         new kernel: registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:926 (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:208.)
    self.m.impl(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================== 98 passed, 14 skipped, 3 warnings in 7.21s ========================================================================
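For readers skimming the log, here is a minimal end-to-end sketch of the workflow these tests exercise. The modules and recipe class are existing TE APIs; the assumption specific to this PR is that NVTE_KEEP_BACKWARD_UNQUANTIZED is read when the recipe is constructed, so it must be set before that point.

import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"  # set before building any recipe

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

layer = te.Linear(128, 128, bias=True).cuda()
x = torch.randn(32, 128, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=Float8CurrentScaling()):
    y = layer(x)  # forward GEMM runs quantized
y.sum().backward()  # wgrad/dgrad GEMMs stay in high precision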

@zianglih
Author

Hi @timmoon10, @ptrendx,

the quantize_forward option seems unnecessary

This design came from @timmoon10's comment here: #2644 (comment)

This option name is specific to this workflow and doesn't generalize well. How about we break this up into two options: quantize_forward and quantize_backward. We have the following cases:

  • quantize_forward=True, quantize_backward=True: Equivalent to quantized case. In the future we might be able to replace FP8GlobalStateManager.FP8_ENABLED with FP8GlobalStateManager.QUANTIZE_FORWARD or FP8GlobalStateManager.QUANTIZE_BACKWARD.
  • quantize_forward=False, quantize_backward=False: Equivalent to unquantized case.
  • quantize_forward=True, quantize_backward=False: Your desired workflow.
  • quantize_forward=False, quantize_backward=True: We can error out in this case, but who knows if someone in the future might want this.

Which way do we prefer? Thanks!
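To pin down the four cases, here is a rough sketch of the validation they imply; this is illustrative pseudologic, not the PR's actual code:

def resolve_quantization_mode(quantize_forward: bool, quantize_backward: bool) -> str:
    # True/True   -> fully quantized (current FP8 behavior)
    # False/False -> fully unquantized
    # True/False  -> quantized fprop, high-precision wgrad/dgrad (this PR)
    # False/True  -> rejected, at least for now
    if not quantize_forward and quantize_backward:
        raise ValueError("quantize_backward=True requires quantize_forward=True")
    if quantize_forward and quantize_backward:
        return "quantized"
    if quantize_forward:
        return "quantized_forward_only"
    return "unquantized"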

Signed-off-by: Ziang Li <ziangli@umich.edu>
Contributor

@greptile-apps greptile-apps bot left a comment


16 files reviewed, 3 comments


assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
Contributor


Blocks NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with the DelayedScaling recipe.

When the env var is set, quantize_backward becomes False, but this assertion requires it to be True, so the feature cannot work with this recipe type at all.

Suggested change
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
# Note: DelayedScaling does not support quantize_backward=False yet
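
For reference, a minimal repro of the incompatibility, assuming the assertion sits in DelayedScaling.__post_init__ as the hunk above shows:

import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"  # defaults quantize_backward to False

from transformer_engine.common.recipe import DelayedScaling

# Raises AssertionError: "Delayed scaling does not support quantize_backward=False."
recipe = DelayedScaling()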

Comment on lines +238 to +240
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
Contributor


Hard crash when NVTE_KEEP_BACKWARD_UNQUANTIZED=1.

Setting the env var makes LayerNormMLP completely unusable: it crashes immediately on first use.

Suggested change
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
if keep_backward_unquantized:
    raise NotImplementedError(
        "NVTE_KEEP_BACKWARD_UNQUANTIZED is not yet implemented in LayerNormMLP"
    )
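
One more argument for the raise over the assert: assertions are stripped when Python runs with -O, so under optimized bytecode the guard would vanish and the unimplemented path would proceed silently.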

Comment on lines +62 to +68
# Recipe quantize overrides
if FP8GlobalStateManager.get_fp8_recipe() is not None:
    quantize_forward = (
        quantize_forward and FP8GlobalStateManager.get_fp8_recipe().quantize_forward
    )
    quantize_backward = (
        quantize_backward and FP8GlobalStateManager.get_fp8_recipe().quantize_backward
Contributor


get_fp8_recipe() returns None when FP8 is enabled but no recipe has been set.

Calling .quantize_backward on None would crash with an AttributeError.

Suggested change
# Recipe quantize overrides
if FP8GlobalStateManager.get_fp8_recipe() is not None:
    quantize_forward = (
        quantize_forward and FP8GlobalStateManager.get_fp8_recipe().quantize_forward
    )
    quantize_backward = (
        quantize_backward and FP8GlobalStateManager.get_fp8_recipe().quantize_backward
# Recipe quantize overrides
recipe = FP8GlobalStateManager.get_fp8_recipe()
if recipe is not None:
    quantize_forward = quantize_forward and recipe.quantize_forward
    quantize_backward = quantize_backward and recipe.quantize_backward
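
Caching the lookup in a local has a second benefit: both flags are read from the same recipe object, instead of calling get_fp8_recipe() three separate times against mutable global state.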

@zianglih
Author

Full unit test results, with the newly added test_keep_backward_unquantized:
te-2644.log

ziang-and pushed a commit to zianglih/TransformerEngine that referenced this pull request Feb 11, 2026