cp: `docs: Adding dtensor TP debugging summary (1767)` into `r0.5.0` by chtruong814 · Pull Request #1777 · NVIDIA-NeMo/RL

chtruong814 · 2026-01-15T01:38:13Z

beep boop [🤖]: Hi @joyang-nv 👋,

we've cherry picked #1767 into  for you! 🚀

Please review and approve this cherry pick at your convenience!

Summary by CodeRabbit

Documentation
- Added a comprehensive guide on tensor parallelism accuracy for reinforcement learning training, including analysis of training-inference misalignments, performance degradation mechanisms, root cause investigation of cross-device reductions and kernel batching effects, strategy comparisons, practical mitigation recommendations with code examples, and supporting performance metrics.


coderabbitai · 2026-01-15T01:41:55Z

📝 Walkthrough


This PR adds a new comprehensive documentation guide detailing accuracy challenges encountered with DTensor tensor parallelism in RL training, covering root causes such as batch-variant kernels and cross-device reductions. The guide includes mitigation strategies and concrete code examples. The documentation index is updated to include the new guide.
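As a rough illustration of why cross-device reductions can change numerical results, the sketch below simulates a sharded sum with plain Python floats. It is not taken from the guide: `sequential_sum`, `sharded_sum`, and the sample values are hypothetical, and the values are deliberately extreme so that the effect of floating-point non-associativity is visible. Real TP runs on GPUs typically show far smaller differences.

```python
# Illustrative only: simulates a sharded reduction with plain Python floats.
# The function names and sample values are hypothetical, not from the guide.

def sequential_sum(values):
    """Accumulate left to right, as a single device (TP=1) might."""
    total = 0.0
    for v in values:
        total += v
    return total


def sharded_sum(values, tp_size):
    """Split values into tp_size contiguous shards, reduce each shard
    locally, then add the partial sums (a stand-in for an all-reduce)."""
    shard_len = len(values) // tp_size
    partials = [
        sequential_sum(values[i * shard_len:(i + 1) * shard_len])
        for i in range(tp_size)
    ]
    return sequential_sum(partials)


# Large and small magnitudes mixed together make rounding order matter.
values = [1e16, 1.0, -1e16, 1.0]
tp1 = sequential_sum(values)           # one accumulation order
tp2 = sharded_sum(values, tp_size=2)   # a different accumulation order
print(tp1, tp2)  # the two results differ
```

The same inputs produce different sums only because the additions are grouped differently, which is the kind of discrepancy a change in TP size can introduce.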

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Addition `docs/guides/dtensor-tp-accuracy.md` | New comprehensive guide documenting numerical accuracy issues with DTensor tensor parallelism, including analysis of root causes (batch-variant kernels, row-wise sharding), observed phenomena (`token_mult_prob_error` spikes, reward-model discrepancies), and concrete mitigation recommendations |
| Documentation Index `docs/index.md` | Updated Guides toctree to include the new dtensor-tp-accuracy guide |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

documentation, CI:docs, r0.5.0

Suggested reviewers

RayenTian
yuki-97

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly indicates the main change: adding a dtensor TP debugging/accuracy summary document to the r0.5.0 branch via cherry-pick. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | This is a documentation-only PR with no code changes affecting numerics, convergence, or performance. The custom check targets major code changes that could introduce regressions, but this PR merely adds a comprehensive guide documenting observed accuracy issues with DTensor tensor parallelism. |



coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@docs/guides/dtensor-tp-accuracy.md`:
- Line 67: The <img> tag referencing
"../assets/dtensor-tp-accuracy/validation_accuracy.png" is missing alt text; add
a descriptive alt attribute to the image element (e.g., alt="Validation accuracy
over training steps for DTensor tensor-parallel experiment") so screen readers
and accessibility tools can convey the image content; update the <img
src="../assets/dtensor-tp-accuracy/validation_accuracy.png" ... /> element to
include this alt attribute.
- Line 95: The image markdown line for the figure
("![](../assets/dtensor-tp-accuracy/logprobs_unequal_1.png)") is missing alt
text; update that markdown to include a concise descriptive alt string (for
example: alt="Plot of log-probabilities showing unequal values across tensor
parallel partitions" or similar) so screen readers can convey the image content
and purpose.
- Around line 47-51: The markdown table block starting with the line beginning
"|               | TP=1   | TP=2..." and ending with the "<p
align="center"><em>Table 1: The validation loss of reward model
training</em></p>" line needs a blank line inserted immediately before the table
and another blank line immediately after the caption paragraph so the table and
its caption are surrounded by empty lines for correct Markdown rendering.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0cced7 and 8efd68d.

⛔ Files ignored due to path filters (6)

docs/assets/dtensor-tp-accuracy/image-20260111142255534.png is excluded by !**/*.png
docs/assets/dtensor-tp-accuracy/image-20260111160656891-1768118824549-2.png is excluded by !**/*.png
docs/assets/dtensor-tp-accuracy/kl_hf_prev.png is excluded by !**/*.png
docs/assets/dtensor-tp-accuracy/logprobs_unequal_1.png is excluded by !**/*.png
docs/assets/dtensor-tp-accuracy/token_mult_prob_error_qwen3_4B.png is excluded by !**/*.png
docs/assets/dtensor-tp-accuracy/validation_accuracy.png is excluded by !**/*.png

📒 Files selected for processing (2)

docs/guides/dtensor-tp-accuracy.md
docs/index.md

🧰 Additional context used

📓 Path-based instructions (2)

docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

docs/index.md
docs/guides/dtensor-tp-accuracy.md

!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

docs/index.md
docs/guides/dtensor-tp-accuracy.md

🪛 LanguageTool

docs/guides/dtensor-tp-accuracy.md

[style] ~13-~13: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...under different TP configurations. 3. For overall model training performance: U...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~179-~179: Consider using a different verb to strengthen your wording.
Context: ...\right]$. ### Root Cause Our analysis shows that the **row-wise (colwise) tensor pa...

(SHOW_INDICATE)

🪛 markdownlint-cli2 (0.18.1)

docs/guides/dtensor-tp-accuracy.md

50-50: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

67-67: Images should have alternate text (alt text)

(MD045, no-alt-text)

95-95: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: sphinx-build / Build docs
GitHub Check: Lint check
GitHub Check: sphinx-build / Build docs
GitHub Check: build-container / main
GitHub Check: Lint check
GitHub Check: build-container / main
GitHub Check: sphinx-build / Build docs
GitHub Check: Lint check
GitHub Check: Post automodel integration comment / Comment on PR
GitHub Check: Post submodule check comment / Comment on PR

🔇 Additional comments (2)

docs/guides/dtensor-tp-accuracy.md (1)

1-242: Excellent comprehensive documentation on DTensor TP accuracy issues.

This guide provides thorough coverage of tensor parallelism accuracy challenges in RL training, including:

- Well-documented observed issues with concrete examples and metrics
- Clear mathematical explanations of row-wise vs column-wise parallelism
- Actionable mitigation strategies with code examples
- Proper context with references to external research

The technical content is accurate, the structure is logical, and the explanations are detailed enough for practitioners to understand and apply the solutions.

docs/index.md (1)

219-219: Documentation index correctly updated.

The new DTensor TP accuracy guide has been properly added to the Guides section of the documentation index, satisfying the coding guideline requirement to update docs/index.md when adding new markdown documentation.


coderabbitai · 2026-01-15T01:41:58Z

+|               | TP=1   | TP=2   | TP=4   | TP=8   |
+| ------------- | ------ | ------ | ------ | ------ |
+| With autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
+| W/O autocast  | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
+<p align="center"><em>Table 1: The validation loss of reward model training</em></p> 


⚠️ Potential issue | 🟡 Minor

Add blank lines around the table for proper markdown formatting.

Table 1 should be surrounded by blank lines according to markdown style guidelines. This ensures consistent rendering across different markdown parsers.

📝 Proposed fix

the discrepancy. Instead, these results imply that the primary source of inconsistency lies in how different TP plans partition and aggregate computations across devices, rather than in precision loss from autocast.

| | TP=1 | TP=2 | TP=4 | TP=8 |
| ------------- | ------ | ------ | ------ | ------ |
| With autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
| W/O autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
+
Table 1: The validation loss of reward model training

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| | TP=1 | TP=2 | TP=4 | TP=8 |
| ------------- | ------ | ------ | ------ | ------ |
| With autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
| W/O autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |

Table 1: The validation loss of reward model training

| | TP=1 | TP=2 | TP=4 | TP=8 |
| ------------- | ------ | ------ | ------ | ------ |
| With autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |
| W/O autocast | 0.6035 | 0.6010 | 0.5864 | 0.6021 |

Table 1: The validation loss of reward model training

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

50-50: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


coderabbitai · 2026-01-15T01:41:58Z

+
+<p align="center"><em>Fig 2: The reward of Qwen3-4B</em></p>
+
+<img src="../assets/dtensor-tp-accuracy/validation_accuracy.png" style="zoom:33%;" />


⚠️ Potential issue | 🟡 Minor

Add alt text to the image for accessibility.

The image reference is missing alt text, which is important for accessibility and screen readers.

♿ Proposed fix

-<img src="../assets/dtensor-tp-accuracy/validation_accuracy.png" style="zoom:33%;" />
+<img src="../assets/dtensor-tp-accuracy/validation_accuracy.png" alt="Validation accuracy comparison for Qwen3-4B under different TP configurations" style="zoom:33%;" />

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<img src="../assets/dtensor-tp-accuracy/validation_accuracy.png" style="zoom:33%;" />

<img src="../assets/dtensor-tp-accuracy/validation_accuracy.png" alt="Validation accuracy comparison for Qwen3-4B under different TP configurations" style="zoom:33%;" />

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

67-67: Images should have alternate text (alt text)

(MD045, no-alt-text)


coderabbitai · 2026-01-15T01:41:58Z

+
+This ratio is the standard importance ratio used in off-policy RL to reweight returns when the data are collected under an older behavior policy. In on-policy training, this ratio should be exactly 1. However, in our experiments, we observed cases where the ratio deviates from 1, indicating a mismatch between the intended on-policy setting and the actual behavior of the system. Figure 4 and Figure 5 illustrate this phenomenon by showing the mismatch between `prev_logprobs` and `current_logprobs` under TP=4, as well as the reward curves under TP=4 and TP=1 for the `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` model.
+
+![](../assets/dtensor-tp-accuracy/logprobs_unequal_1.png)


⚠️ Potential issue | 🟡 Minor

Add alt text to the image for accessibility.

The image reference is missing alt text, which is important for accessibility and screen readers.

♿ Proposed fix

-![](../assets/dtensor-tp-accuracy/logprobs_unequal_1.png)
+![Mismatch between prev_logprobs and current_logprobs under TP=4](../assets/dtensor-tp-accuracy/logprobs_unequal_1.png)

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

95-95: Images should have alternate text (alt text)

(MD045, no-alt-text)
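For readers of this thread, the importance ratio discussed in the quoted guide excerpt can be sketched as follows. This is a minimal example under stated assumptions: the ratio is taken to be `exp(current_logprobs - prev_logprobs)` per token, which is the usual convention but is not shown in the excerpt itself, and the helper names and sample log-probabilities are hypothetical.

```python
import math

# Assumption: ratio = exp(current - prev) per token. The excerpt quoted in
# this review does not show the guide's exact formula.


def importance_ratios(prev_logprobs, current_logprobs):
    """Per-token importance ratio between the current and previous policy."""
    return [math.exp(c - p) for p, c in zip(prev_logprobs, current_logprobs)]


def max_ratio_deviation(prev_logprobs, current_logprobs):
    """Largest absolute deviation of the ratio from 1 across tokens."""
    return max(
        abs(r - 1.0) for r in importance_ratios(prev_logprobs, current_logprobs)
    )


# Hypothetical per-token log-probabilities, for illustration only.
prev = [-0.5, -1.25, -2.0]
on_policy = importance_ratios(prev, prev)                    # identical logprobs
mismatched = importance_ratios(prev, [-0.5, -1.20, -2.0])    # one token differs

print(on_policy)
print(max_ratio_deviation(prev, [-0.5, -1.20, -2.0]))
```

When the two sets of log-probabilities are identical, every ratio is exactly 1; any nonzero deviation signals the kind of training-inference mismatch the guide describes.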


Signed-off-by: Jonas Yang <joyang@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Rayen <ruit@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: ruit <ruit@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

…VIDIA-NeMo#1777)
Signed-off-by: Jonas Yang <joyang@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Rayen <ruit@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
Co-authored-by: ruit <ruit@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>

chtruong814 requested a review from a team as a code owner January 15, 2026 01:38

chtruong814 requested a review from joyang-nv January 15, 2026 01:38

chtruong814 added cherry-pick Run CICD labels Jan 15, 2026

github-actions Bot added the Documentation Improvements or additions to documentation label Jan 15, 2026

chtruong814 temporarily deployed to nemo-ci January 15, 2026 01:38 — with GitHub Actions Inactive

yuki-97 added the CI:docs Run doctest label Jan 15, 2026

yuki-97 temporarily deployed to nemo-ci January 15, 2026 01:41 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jan 15, 2026

View reviewed changes

yuki-97 enabled auto-merge (squash) January 15, 2026 01:42

yuki-97 approved these changes Jan 15, 2026

View reviewed changes

chtruong814 temporarily deployed to nemo-ci January 15, 2026 02:10 — with GitHub Actions Inactive

yuki-97 temporarily deployed to nemo-ci January 15, 2026 02:13 — with GitHub Actions Inactive

chtruong814 temporarily deployed to nemo-ci January 15, 2026 02:20 — with GitHub Actions Inactive

yuki-97 added CI:docs Run doctest and removed CI:docs Run doctest labels Jan 15, 2026

yuki-97 temporarily deployed to nemo-ci January 15, 2026 03:25 — with GitHub Actions Inactive

yuki-97 temporarily deployed to nemo-ci January 15, 2026 03:30 — with GitHub Actions Inactive

yuki-97 force-pushed the cherry-pick-1767-r0.5.0 branch from 8efd68d to d6112f3 Compare January 15, 2026 04:21

yuki-97 temporarily deployed to nemo-ci January 15, 2026 04:21 — with GitHub Actions Inactive

yuki-97 added CI:docs Run doctest and removed CI:docs Run doctest labels Jan 15, 2026

yuki-97 temporarily deployed to nemo-ci January 15, 2026 04:22 — with GitHub Actions Inactive

yuki-97 temporarily deployed to nemo-ci January 15, 2026 04:57 — with GitHub Actions Inactive

yuki-97 temporarily deployed to nemo-ci January 15, 2026 05:01 — with GitHub Actions Inactive

yuki-97 merged commit 174e5cb into r0.5.0 Jan 15, 2026
40 checks passed

yuki-97 deleted the cherry-pick-1767-r0.5.0 branch January 15, 2026 05:08
