FIFO improvements #557

Merged
chhwang merged 5 commits into main from chhwang/fifo-fix on Jun 24, 2025

chhwang (Contributor) commented Jun 21, 2025

  • Revert MSCCLPP_FIFO_USE_TAIL_REPLICA=1 back to the default.
  • Optimize FifoDeviceHandle.
  • Stop using cudaHostAllocWriteCombined, which increases latency.
  • Pin host memory for Host2DeviceSemaphore::outboundSemaphore_.
  • Fix proxy NUMA binding issues.
  • Prevent graph capture inside proxy threads.
  • CudaIpcConnection now skips stream sync when unnecessary.
  • Every type of connection now holds a shared pointer to the context for memory safety.
  • A context must now always be managed by a shared pointer for memory safety.
  • Minor docs & interface improvements.
  • Minor fix in the mscclpp-test correctness test.
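
The two ownership items in the list above (connections hold a shared pointer to the context, and a context is always managed by a shared pointer) can be illustrated with a minimal sketch. This is not the MSCCL++ code: `Context`, `Connection`, `create`, and the helper functions are hypothetical stand-ins that use only the C++ standard library, assuming the intent is that a context can never be destroyed while a connection still uses it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in, NOT the MSCCL++ class.
class Context {
 public:
  // The constructor is private, so this factory is the only way to build a
  // Context; every instance is therefore owned by a std::shared_ptr.
  static std::shared_ptr<Context> create(std::string name) {
    return std::shared_ptr<Context>(new Context(std::move(name)));
  }
  const std::string& name() const { return name_; }

 private:
  explicit Context(std::string name) : name_(std::move(name)) {}
  std::string name_;
};

// Hypothetical stand-in: each connection co-owns the context it was made from.
class Connection {
 public:
  explicit Connection(std::shared_ptr<Context> ctx) : ctx_(std::move(ctx)) {}
  const std::string& contextName() const { return ctx_->name(); }

 private:
  std::shared_ptr<Context> ctx_;
};

// True if the connection can still use its context after the caller has
// dropped its own handle.
bool connectionKeepsContextAlive() {
  auto ctx = Context::create("ctx0");
  std::weak_ptr<Context> observer = ctx;
  Connection conn(ctx);
  ctx.reset();  // caller releases its handle; conn still holds one
  return !observer.expired() && conn.contextName() == "ctx0";
}

// True if the context is released once the last owner (handle or connection)
// has gone away.
bool contextReleasedWithLastOwner() {
  std::weak_ptr<Context> observer;
  {
    auto ctx = Context::create("ctx1");
    observer = ctx;
    Connection conn(ctx);
  }  // both ctx and conn are destroyed here
  return observer.expired();
}
```

Calling `connectionKeepsContextAlive()` and `contextReleasedWithLastOwner()` from a small `main` exercises both directions: the context outlives the caller's handle while a connection exists, and it is freed once nothing refers to it.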

chhwang requested a review from Binyang2014 on June 21, 2025 04:05
chhwang (Contributor, Author) commented Jun 21, 2025

/azp run

azure-pipelines commented

Azure Pipelines successfully started running 3 pipeline(s).

chhwang (Contributor, Author) commented Jun 21, 2025

/azp run mscclpp-test

azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

Binyang2014 (Contributor) commented

/azp run

azure-pipelines commented

Azure Pipelines successfully started running 3 pipeline(s).

chhwang merged commit b4dde38 into main on Jun 24, 2025
16 of 29 checks passed
chhwang deleted the chhwang/fifo-fix branch on June 24, 2025 16:50
chhwang added a commit that referenced this pull request Jun 27, 2025
chhwang added a commit that referenced this pull request Jun 30, 2025
* Page-locking `Host2DeviceSemaphore::outboundSemaphore_` caused
unexpected performance issues, so that change has been reverted. We may
revisit this later.
* Removed the context's references to connections, since connections now
refer to the context.
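
The commit message does not say why the context's references to connections were removed. One plausible reading is that strong references in both directions would form a cycle that `std::shared_ptr` cannot free. The sketch below illustrates that general C++ behavior only; `CyclicCtx`, `CyclicConn`, `OneWayCtx`, `OneWayConn`, and the helper functions are hypothetical names, not MSCCL++ types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct CyclicConn;

// Hypothetical layout with strong references in BOTH directions.
struct CyclicCtx {
  std::vector<std::shared_ptr<CyclicConn>> conns;  // context -> connection
};
struct CyclicConn {
  std::shared_ptr<CyclicCtx> ctx;  // connection -> context
};

// Hypothetical layout with a strong reference in ONE direction only.
struct OneWayCtx {};
struct OneWayConn {
  std::shared_ptr<OneWayCtx> ctx;  // connection -> context
};

// True if the two-way layout keeps the context alive after every external
// handle is gone, i.e. the objects would leak.
bool cyclicLayoutLeaks() {
  std::weak_ptr<CyclicCtx> observer;
  {
    auto ctx = std::make_shared<CyclicCtx>();
    auto conn = std::make_shared<CyclicConn>();
    conn->ctx = ctx;
    ctx->conns.push_back(conn);
    observer = ctx;
  }  // external handles dropped; the cycle still holds both objects
  bool leaked = !observer.expired();
  // Break the cycle by hand so this sketch does not itself leak memory.
  if (auto ctx = observer.lock()) ctx->conns.clear();
  return leaked;
}

// True if the one-way layout frees the context once the connection and the
// caller's handle are gone.
bool oneWayLayoutReleases() {
  std::weak_ptr<OneWayCtx> observer;
  {
    auto ctx = std::make_shared<OneWayCtx>();
    OneWayConn conn{ctx};
    observer = ctx;
  }
  return observer.expired();
}
```

If a context ever needed to enumerate its connections again, holding them through `std::weak_ptr` would be one way to do so without reintroducing the cycle; whether MSCCL++ needs that is not stated in this thread.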

2 participants