Peer API RMA changes + SHM clean up#2
Conversation
prov/efa/src/rdm/rxr_rma.c
Outdated
|
|
||
| if (peer->is_local && rxr_ep->use_shm_for_tx) { | ||
| use_lower_ep_read = true; | ||
| // msg_clone->addr = peer->shm_fiaddr; |
There was a problem hiding this comment.
The fi_rma_bw test passes when I use msg and msg_clone. I don't know how that's possible..
I have to use msg_clone to set the address because the input parameter msg is const. Using msg_clone generates a compiler warning.
There was a problem hiding this comment.
fabtests only has 1 peer that that the shm_fiaddr could be the same with efa addr. We should use shm_fiaddr for any fi_* call for shm
There was a problem hiding this comment.
I see OK
What's the best way to do this? The msg_clone idea which I have here generates a compiler warning. I could do a memcpy but that would be more expensive?
There was a problem hiding this comment.
There are rxr_msg_construct and rxr_tmsg_construct in rxr_msg.h. You can do similar things to have a function to construct a new fi_rma_msg
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 #10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 #11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <sjina@amazon.com>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <sjina@amazon.com>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <sjina@amazon.com> (cherry picked from commit c4937ed)
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <sjina@amazon.com> (cherry picked from commit c4937ed)
…I_EFA_ENABLE_SHM_TRANSFER=0 This patch fixes two related bugs involving uninitialized descriptor arrays that cause segmentation faults when FI_EFA_ENABLE_SHM_TRANSFER=0 is set, but only in release builds with -DNDEBUG. Bug #1: Inconsistent memset in efa_rdm_txe_construct() When msg->desc is NULL, the code zeroed the entire desc array with: memset(txe->desc, 0, sizeof(txe->desc)); This zeros all EFA_RDM_IOV_LIMIT elements, but the memcpy branch only copies iov_count elements. The fix uses sizeof(*txe->desc) * msg->iov_count to match the memcpy() branch and the parallel iov[] and mr[] arrays. Bug #2: Dangling pointer in efa_rdm_atomic_ex The compare_desc field was declared as void **compare_desc (pointer), which was set to point to the caller's stack memory. When the caller returned, this became a dangling pointer. Later access to compare_desc[i] would read garbage from reused stack memory. The fix changes compare_desc to an array void *compare_desc[EFA_RDM_IOV_LIMIT] with its own persistent storage, and copies the data instead of storing a pointer. Why debug builds didn't crash: Debug builds zero-initialize memory, so uninitialized desc[i] values were NULL. The if (desc) check in efa_copy_from_hmem() would fail, avoiding the dereference. Release builds leave memory uninitialized, so desc[i] contained garbage pointers like 0x73e54aab7848. The if (desc) check would pass, then dereference the garbage pointer, causing a segmentation fault. Tested on c7i.48xlarge clusters with 96-192 ranks using IMB-EXT Accumulate benchmark in both debug (-g -O0) and release (-DNDEBUG -O2) builds. Signed-off-by: Yin Li <yinliq@amazon.com>
…lt with FI_EFA_ENABLE_SHM_TRANSFER=0 This patch fixes two related bugs involving uninitialized descriptor arrays that cause segmentation faults when FI_EFA_ENABLE_SHM_TRANSFER=0 is set, but only in release builds with -DNDEBUG. Bug #1: Inconsistent memset in efa_rdm_txe_construct() When msg->desc is NULL, the code zeroed the entire desc array with: memset(txe->desc, 0, sizeof(txe->desc)); This zeros all EFA_RDM_IOV_LIMIT elements, but the memcpy branch only copies iov_count elements. The fix uses sizeof(*txe->desc) * msg->iov_count to match the memcpy() branch and the parallel iov[] and mr[] arrays. Bug #2: Dangling pointer in efa_rdm_atomic_ex The compare_desc field was declared as void **compare_desc (pointer), which was set to point to the caller's stack memory. When the caller returned, this became a dangling pointer. Later access to compare_desc[i] would read garbage from reused stack memory. The fix changes compare_desc to an array void *compare_desc[EFA_RDM_IOV_LIMIT] with its own persistent storage, and copies the data instead of storing a pointer. Why debug builds didn't crash: Debug builds zero-initialize memory, so uninitialized desc[i] values were NULL. The if (desc) check in efa_copy_from_hmem() would fail, avoiding the dereference. Release builds leave memory uninitialized, so desc[i] contained garbage pointers like 0x73e54aab7848. The if (desc) check would pass, then dereference the garbage pointer, causing a segmentation fault. Tested on c7i.48xlarge clusters with 96-192 ranks using IMB-EXT Accumulate benchmark in both debug (-g -O0) and release (-DNDEBUG -O2) builds. Signed-off-by: Yin Li <yinliq@amazon.com> (cherry picked from commit 6797933) Signed-off-by: Shi Jin <sjina@amazon.com>
…lt with FI_EFA_ENABLE_SHM_TRANSFER=0 This patch fixes two related bugs involving uninitialized descriptor arrays that cause segmentation faults when FI_EFA_ENABLE_SHM_TRANSFER=0 is set, but only in release builds with -DNDEBUG. Bug #1: Inconsistent memset in efa_rdm_txe_construct() When msg->desc is NULL, the code zeroed the entire desc array with: memset(txe->desc, 0, sizeof(txe->desc)); This zeros all EFA_RDM_IOV_LIMIT elements, but the memcpy branch only copies iov_count elements. The fix uses sizeof(*txe->desc) * msg->iov_count to match the memcpy() branch and the parallel iov[] and mr[] arrays. Bug #2: Dangling pointer in efa_rdm_atomic_ex The compare_desc field was declared as void **compare_desc (pointer), which was set to point to the caller's stack memory. When the caller returned, this became a dangling pointer. Later access to compare_desc[i] would read garbage from reused stack memory. The fix changes compare_desc to an array void *compare_desc[EFA_RDM_IOV_LIMIT] with its own persistent storage, and copies the data instead of storing a pointer. Why debug builds didn't crash: Debug builds zero-initialize memory, so uninitialized desc[i] values were NULL. The if (desc) check in efa_copy_from_hmem() would fail, avoiding the dereference. Release builds leave memory uninitialized, so desc[i] contained garbage pointers like 0x73e54aab7848. The if (desc) check would pass, then dereference the garbage pointer, causing a segmentation fault. Tested on c7i.48xlarge clusters with 96-192 ranks using IMB-EXT Accumulate benchmark in both debug (-g -O0) and release (-DNDEBUG -O2) builds. Signed-off-by: Yin Li <yinliq@amazon.com> (cherry picked from commit 6797933) Signed-off-by: Shi Jin <sjina@amazon.com>
No description provided.