
fix contiguity computation #5489

Merged
zasdfgbnm merged 6 commits into eval-segment from esfix on Nov 12, 2025

Conversation

@zasdfgbnm
Collaborator

No description provided.

@github-actions

github-actions bot commented Nov 6, 2025

Review updated until commit 5874b2b

Description

  • Fix contiguity computation for non-reduction dimensions

  • Improve allocation domain validation in tensor metadata

  • Correct argument manager update flag in kernel runtime

Changes walkthrough

Relevant files: Bug fix

fusion_cache_utils.cpp (csrc/runtime/fusion_cache_utils.cpp): Revise contiguity computation logic (+44/-11)
  • Rewrite contiguity inference to handle reduction and broadcast dims correctly
  • Use a single right-to-left pass with dynamic index tracking
  • Set contiguity based on stride relationships between consecutive non-skipped dimensions
  • Add bounds checks for sizes/strides consumption

fusion_kernel_runtime.cpp (csrc/runtime/fusion_kernel_runtime.cpp): Fix segment outputs update flag (+1/-1)
  • Change the updateWithSegmentOutputs flag from true to false

tensor_metadata.cpp (csrc/tensor_metadata.cpp): Validate allocation domain completeness (+11/-13)
  • Replace dummy values for missing IDs with an error check
  • Ensure all allocation domain IDs are present in active_ids
  • Simplify stride and size assignment logic
  • Remove unnecessary conditional branches
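The single right-to-left pass with dual-index tracking described above can be sketched as follows. This is a simplified standalone model, not the actual nvFuser code: the `Dim` struct and this `computeContiguity` helper are hypothetical stand-ins for `IterDomain` and the PR's inline logic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an allocation-domain IterDomain.
struct Dim {
  bool is_reduction = false;  // reduction dims have no sizes/strides entry
  bool is_broadcast = false;  // broadcast dims consume an entry, contiguity stays nullopt
};

// Sketch of the right-to-left dual-index contiguity inference:
// alloc_idx walks the allocation domain, sizes_idx walks sizes/strides.
std::vector<std::optional<bool>> computeContiguity(
    const std::vector<Dim>& alloc_domain,
    const std::vector<int64_t>& sizes,
    const std::vector<int64_t>& strides) {
  std::vector<std::optional<bool>> contiguity(alloc_domain.size(), std::nullopt);
  int64_t sizes_idx = static_cast<int64_t>(sizes.size()) - 1;
  // Stride the next non-skipped dimension must have to count as contiguous.
  int64_t expected_stride = 1;
  for (int64_t alloc_idx = static_cast<int64_t>(alloc_domain.size()) - 1;
       alloc_idx >= 0;
       alloc_idx--) {
    const Dim& id = alloc_domain[alloc_idx];
    if (id.is_reduction) {
      continue;  // no entry in sizes/strides; contiguity stays nullopt
    }
    assert(sizes_idx >= 0);  // bounds check on sizes/strides consumption
    if (id.is_broadcast) {
      sizes_idx--;  // consume the entry but leave contiguity as nullopt
      continue;
    }
    // A dim is contiguous if its stride equals the product of the extents of
    // the inner non-skipped dimensions consumed so far.
    contiguity[alloc_idx] = (strides[sizes_idx] == expected_stride);
    expected_stride = strides[sizes_idx] * sizes[sizes_idx];
    sizes_idx--;
  }
  return contiguity;
}
```

Note how a broadcast dimension consumes a sizes/strides entry without updating `expected_stride`, so an arbitrary broadcast stride does not break the contiguity of the surrounding dimensions, while a reduction dimension consumes nothing at all.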

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Possible Issue

The contiguity computation may mishandle broadcast dimensions: it unconditionally decrements sizes_idx for a broadcast, but does not verify that the stride matches the expected contiguous pattern when resuming after skipped dimensions.

    if (id->isBroadcast()) {
      sizes_idx--;  // Move to next dimension in sizes/strides
      continue;
    }
    Correctness Concern

    The removal of fallback logic for IDs not found in active_ids may lead to errors in cases like grouped matmul where some IDs are expected to be missing; previously dummy values were used, but now a strict error is thrown.

    auto it = active_ids.find(id);
    NVF_ERROR(
        it != active_ids.end(),
        "Allocation domain of tensor ",
        tv->toString(),
        " is not complete. Missing ID: ",
        id->toString());
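The fail-fast behavior flagged above can be modeled in isolation as follows. This is a minimal sketch, not the nvFuser implementation: the string-keyed map and `lookupSizeStride` helper are hypothetical simplifications of the `active_ids` lookup, which in the real code maps IterDomains to inferred size/stride values.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical simplified model of the active_ids lookup: instead of
// substituting dummy sizes/strides when an allocation-domain ID is missing
// (the old fallback), the new code fails fast with a descriptive error.
std::pair<int64_t, int64_t> lookupSizeStride(
    const std::unordered_map<std::string, std::pair<int64_t, int64_t>>& active_ids,
    const std::string& id) {
  auto it = active_ids.find(id);
  if (it == active_ids.end()) {
    throw std::runtime_error(
        "Allocation domain is not complete. Missing ID: " + id);
  }
  return it->second;  // {size, stride} for this allocation-domain ID
}
```

Under the old fallback, a missing ID silently produced dummy values that could later surface as a wrong stride; the strict check moves the failure to the point where the inconsistency is introduced.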
    Behavior Change

    The update to updateWithSegmentOutputs changes the last argument from true to false, which may affect how outputs are managed or synchronized; the rationale for this change is not explained.

    group_to_run->outputs(), group_runtime_outputs, run_order_id, false);
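The effect of this flag, as depicted in the sequence diagram later in this review, can be sketched with a toy model. This is an illustrative assumption about the control flow, not the real ArgumentManager: the struct below only mimics the branch in which a false flag stores the segment outputs while skipping the per-TensorView contiguity reset.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical simplified model of ArgumentManager::updateWithSegmentOutputs.
struct ArgumentManager {
  std::unordered_map<std::string, int> tensor_map_;
  int reset_calls = 0;  // counts resetAllocationDomainAndContiguity invocations

  void updateWithSegmentOutputs(
      const std::vector<std::string>& outputs, bool update_contiguity) {
    for (const auto& name : outputs) {
      tensor_map_[name] = 1;  // store the segment output for later segments
      if (update_contiguity) {
        // In the real code this would reset the output TensorView's
        // allocation domain and contiguity from the produced tensor.
        reset_calls++;
      }
    }
  }
};
```

With the flag now false, segment outputs are stored as-is and their contiguity, computed by the fixed logic in fusion_cache_utils.cpp, is left untouched.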

    @zasdfgbnm
    Collaborator Author

    !test

    @zasdfgbnm
    Collaborator Author

    !test

    @zasdfgbnm
    Collaborator Author

    !test

    @zasdfgbnm zasdfgbnm marked this pull request as ready for review November 12, 2025 21:42
    @zasdfgbnm zasdfgbnm merged commit 25c3fee into eval-segment Nov 12, 2025
    56 of 60 checks passed
    @zasdfgbnm zasdfgbnm deleted the esfix branch November 12, 2025 21:42
    @greptile-apps
    Contributor

    greptile-apps bot commented Nov 12, 2025

Greptile Summary

    This PR fixes contiguity computation for tensor allocation domains by replacing a two-phase approach with a single-pass dual-index algorithm that correctly handles reduction and broadcast dimensions.

    Key changes:

    • csrc/runtime/fusion_cache_utils.cpp: Replaced computeContiguity() call with custom inline algorithm that iterates through allocation domain right-to-left, using two indices (alloc_idx for allocation domain, sizes_idx for sizes/strides array). This correctly handles the fact that reduction dimensions have no entry in sizes/strides while broadcast dimensions do.
    • csrc/tensor_metadata.cpp: Strengthened validation by removing fallback to dummy values when allocation domain IDs are missing from active_ids map. Now fails fast with clear error message.
    • csrc/runtime/fusion_kernel_runtime.cpp: Disabled contiguity updates in segment output processing by changing flag from true to false.

    Confidence Score: 4/5

    • Safe to merge with minor concerns about test coverage for edge cases
    • The core algorithm rewrite in fusion_cache_utils.cpp is well-commented and logically sound, with proper bounds checking. The removal of dummy value fallbacks in tensor_metadata.cpp makes behavior more predictable. However, the change from true to false in fusion_kernel_runtime.cpp appears unrelated to the contiguity fix and lacks explanation in the PR description.
    • csrc/runtime/fusion_kernel_runtime.cpp - verify the update_contiguity flag change is intentional and correct

Important Files Changed

File                                     Score  Overview
csrc/runtime/fusion_cache_utils.cpp      4/5    Rewrote the contiguity computation to correctly handle reduction and broadcast dimensions in the allocation domain via dual-index traversal
csrc/runtime/fusion_kernel_runtime.cpp   5/5    Changed the update_contiguity flag from true to false in the updateWithSegmentOutputs call
csrc/tensor_metadata.cpp                 5/5    Strengthened validation to require all allocation domain IDs in the active_ids map, removing the fallback to dummy values

    Sequence Diagram

    sequenceDiagram
        participant FKR as FusionKernelRuntime
        participant AM as ArgumentManager
        participant RAD as resetAllocationDomainAndContiguity
        participant IAS as inferAllocationSizesAndStrides
        participant TV as TensorView
    
        FKR->>AM: updateWithSegmentOutputs(outputs, runtime_outputs, id, false)
        Note over FKR,AM: Changed from true to false
        AM->>AM: Store outputs in tensor_map_
        
        alt update_contiguity == true
            loop For each TensorView output
                AM->>RAD: resetAllocationDomainAndContiguity(tv, tensor)
                RAD->>IAS: inferAllocationSizesAndStrides(tensor, tv)
                IAS->>IAS: Traverse logical to allocation domain
                IAS->>IAS: Validate all IDs present in active_ids
                Note over IAS: Now enforces strict validation
                IAS-->>RAD: Return (sizes, strides) without reductions
                RAD->>RAD: Iterate allocation domain right-to-left
                RAD->>RAD: Dual-index traversal (alloc_idx, sizes_idx)
                Note over RAD: Skip reductions (no sizes/strides entry)<br/>Skip broadcasts (nullopt contiguity)<br/>Compute contiguity for normal dims
                RAD->>TV: setContiguity(contiguity)
            end
        end
        
        AM->>AM: Clean up unused tensor_map_ entries
    

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment


  args_manager.updateWithSegmentOutputs(
-     group_to_run->outputs(), group_runtime_outputs, run_order_id, true);
+     group_to_run->outputs(), group_runtime_outputs, run_order_id, false);
    style: Check whether changing this flag from true to false is intentional and related to the contiguity computation fix, as it's not explained in the PR description.
