Ensure Consistency among Data Accesses in the same Thread Block#551
Merged
caiomcbr merged 14 commits intofeature/dslfrom Jul 3, 2025
Merged
Ensure Consistency among Data Accesses in the same Thread Block#551caiomcbr merged 14 commits intofeature/dslfrom
caiomcbr merged 14 commits intofeature/dslfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR implements a conflict-detection pass to insert synchronization operations between thread‐block operations that read/write the same buffer intervals. It adds data‐access tracking (DataAccess), a buffering‐access analyzer (BuffersAccess), and hooks into the existing pipeline to resolve dependencies after instruction fusion.
- Introduced
DataAccessTypeandDataAccessindsl_typesand builtbuffer_access.pyto detect interval conflicts - Extended every operation with unique IDs and
local_data_accessmethods to annotate reads/writes - Added
resolve_data_dependencythroughThreadBlock,Gpu, andMSCCLPPProgram.post_process_operations
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/mscclpp/language/internal/dsl_types.py | Added DataAccessType, DataAccess, reordered SyncType |
| python/mscclpp/language/internal/buffer_access.py | New BuffersAccess class to scan and sync operations |
| python/mscclpp/language/internal/operations.py | Gave operations UUIDs and implemented local_data_access |
| python/mscclpp/language/internal/threadblock.py | Imported buffer_access and added resolve_data_dependency |
| python/mscclpp/language/internal/gpu.py | Propagated resolve_data_dependency to all threadblocks |
| python/mscclpp/language/program.py | Called resolve_data_dependency after adding_data_sync |
| python/mscclpp/language/internal/optmizer.py | Included relaxed_signal/relaxed_wait in sync filter |
| python/mscclpp/language/tests/*.py | Updated tests for new sync/DSL types and revised logic |
| python/mscclpp/language/collectives.py & channel.py | Migrated imports to dsl_types |
Comments suppressed due to low confidence (1)
python/mscclpp/language/tests/allreduce.py:13
- [nitpick] The variable
num_tbis ambiguous—does it represent the chunk factor, number of thread blocks, or something else? Consider renaming to a more descriptive identifier likechunk_factorornum_thread_blocks.
num_tb = 8
Binyang2014
reviewed
Jun 20, 2025
Binyang2014
approved these changes
Jul 1, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The PR ensures that if two or more operations impact the same memory location, it will insert a mechanism to synchronize the thread block between theses operations, preventing any errors. For instance:
copy chunk 0 -> chunk 1
put chunk 1 -> chunk 2
Here we can see the copy operation and put operation have chunk 1 in common for writing and reading, respectively. In this case, I need to make sure the copy operation ends before starting the put operation. Therefore, we should have a synchronization mechanism between them to avoid any conflicts.
In this PR, we will insert synchronization mechanisms between the operations under the following conditions:
Operation using Write Data Access followed by Operation using Write Data Access to the same memory location
Operation using Read Data Access followed by Operation using Write Data Access to the same memory location
Operation using Write Data Access followed by Operation using Read Data Access to the same memory location