Skip to content

Conversation

@fferrandis
Copy link

@fferrandis fferrandis commented Dec 15, 2025

This pull request introduces significant improvements and refactoring to the buffer matrix management for erasure coding, focusing on more efficient handling of data fragments, especially for cases where the last subgroup (stripe) is not a full block. The changes include a new buffer format, more flexible offset calculations, improved test coverage, and a new C header for backend interoperability.

Key changes:

Buffer Matrix Refactoring and New Format Support

  • Introduced a new BufferInfo struct to encapsulate buffer metadata, separated from the main BufferMatrix data structure, and refactored related logic for clarity and maintainability. Added methods to support both old and new buffer layouts, with explicit UseNewFormat and UseOldFormat setters and corresponding offset calculation logic. (buffer.go [1] [2] [3] [4] [5] [6]

  • Implemented logic to handle the last subgroup differently when the data size is not a multiple of the block size, ensuring correct data alignment and minimizing wasted space in the last fragments. (buffer.go [1] [2] [3]

Expanded and Improved Testing

  • Removed outdated or redundant tests and benchmarks, and added comprehensive new tests to cover the new buffer format, offset calculations, subgroup size computations, and block mapping. This ensures correctness of the new logic and robust coverage for edge cases. (buffer_test.go [1] [2] [3]

Backend Interoperability

  • Added a new C header file backend.h defining the encode_chunk_context structure and a suite of utility functions to facilitate interaction between Go and C code for erasure coding backends. (backend.h backend.hR1-R70)

Minor Improvements

  • Added a // nolint:gosec comment to suppress a linter warning for pointer arithmetic in memalign.go. (memalign.go memalign.goR9)

These changes collectively modernize the buffer management for erasure coding, improve performance for non-aligned data sizes, and lay groundwork for further backend integration.

References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

@bert-e
Copy link

bert-e commented Dec 15, 2025

Hello fferrandis,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link

bert-e commented Dec 15, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch 3 times, most recently from 159884f to 6c8187c Compare December 15, 2025 09:14
- separate C implementations in dedicated files
- improve encoding and decoding functionality
- add new tests
- improvements to the encoding and decoding processes
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 6c8187c to 181e135 Compare December 15, 2025 09:59
- enable new buffer style
- rework all tests to only use *Matrix
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 041a7c1 to 12234f8 Compare December 26, 2025 12:52
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch 2 times, most recently from 4bbbda1 to 44ad647 Compare January 1, 2026 12:06
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 44ad647 to fd5d424 Compare January 7, 2026 12:56
Store in EncodeData structure the realsize
With old format, it contains the FragLen itself
With new format, this is sticked to the real data.

In DecodeMatrix, we also manage the last stripe size to ensure we read the
header.ChunkSize value instead of reading piecesize bytes
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from fd5d424 to 5decc2c Compare January 7, 2026 12:59
@@ -0,0 +1,408 @@
#include <liberasurecode/erasurecode.h>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only a move from code that was directly inline in GO file to a proper dedicated C file.

max: params.MaxBlockSize,
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we dont use jerasure, this is removed. (this new feature can't work with jerasure anyway)

wg.Add(int(ctx.number_of_subgroup))

for i := 0; i < int(ctx.number_of_subgroup); i++ {
go func(nth int) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main diff of the encode part. We make a distinction when it comes to the last lines/stripes. All the stripes are supposed to be "chunksize" long, but not the last one because of the new implementation

C.my_liberasurecode_encode_buffermatrix_cleanup(
backend.libecDesc, C.size_t(ctx.frags_len), ctx.datas, ctx.codings)
}},
}, totLen},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we store the real len of the slice. We could also reslice it to this len, but it was easier from the debug point of view to keep the slice with the same slice regardless of the usage of the new format, and store the real len alongside.

}}, nil
}

// EncodeMatrix encodes data in small subpart of chunkSize bytes
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EncodeMatrix isn't used in hdcontroller. So all the tests using it has been updated to replace the usage of EncodeMatrix by EncodeMatrixWithBufferMatrix


var totLen int64
for i := 0; i < chunkInfo.NrChunk; i++ {
vect := make([][]byte, len(frags))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when decoding stripes, we can't assume that all stripes are "chunksized long". When using the new format, the last stripe of a fragment is often smaller (that is the prupose of this PR and this new format). The nextbound is computed accordingly.

* .. [-[*]- -]
* p4 [-[*]- -]
*
*/
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mainly variable renaming to help to understand what we are manipulating in the matrix. That why we have lines, celles and columns.

totalLines := (maxDataSize + lineSize - 1) / lineSize
isLastStripe := (lineEnd == totalLines-1)

/* When wrapping around, we read the full groups. */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change of GetRangeMatrix : we are going to read a full group (ie a full line of stripes) in case of two condition :

  • first one already existed
  • the second one is new and is true when data is in the last stripe line.
    As we cannot guess how the data is stored (using the old format or the new one) we chose to read all the stripes and then, based on the stripes header, get properly the data

Copy link
Author

@fferrandis fferrandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments to help the review

@fferrandis fferrandis requested a review from benoit-a January 9, 2026 15:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the buffer matrix management for erasure coding to handle non-aligned data more efficiently, particularly when the last subgroup (stripe) is not a full block. The changes introduce a new buffer format alongside the existing format to minimize wasted space.

Key Changes:

  • Extracted BufferInfo struct from BufferMatrix for better separation of concerns
  • Added UseNewFormat/UseOldFormat methods to toggle between buffer layouts
  • Implemented variable-sized last subgroup handling in encoding/decoding paths
  • Moved C code from inline to separate backend.c/backend.h files for maintainability
  • Enhanced test coverage with new format-specific tests

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
buffer.go Core refactoring: new BufferInfo struct, format toggle methods, offset calculation logic for variable-sized last blocks
buffer_test.go Removed old comparison tests, added comprehensive tests for new format, offset calculations, and block mapping
backend.go Updated EncodeData/DecodeData structures with RealDataSize, refactored GetRangeMatrix variable naming, removed EncodeMatrix method
backend.c New file containing extracted C code with updated encode_chunk_buffermatrix to accept fraglen parameter
backend.h New C header defining encode_chunk_context structure and function declarations
backend_test.go Extensive test refactoring to use BufferMatrix instead of direct encoding, improved error handling with require
memalign.go Added nolint:gosec comment for pointer arithmetic

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

r := C.encode_chunk_buffermatrix(backend.libecDesc, pData, pDataLen, nbFrags, &ctx, C.int(nth))
fragLen := C.size_t(chunkSize)
// last subgroup has a different size
if i == int(ctx.number_of_subgroup)-1 {
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop variable i is being captured by the goroutine closure, but it's also being used as the loop variable. This is a common concurrency bug where all goroutines might see the same value of i (specifically, the final value after the loop completes). The code correctly passes nth as a parameter, but then uses i inside the closure on line 285. This should use nth instead of i to ensure the correct value is used.

Suggested change
if i == int(ctx.number_of_subgroup)-1 {
if nth == int(ctx.number_of_subgroup)-1 {

Copilot uses AI. Check for mistakes.
Comment on lines +317 to +341

func (b *BufferMatrix) IsBlockInLastSubGroup(block int) bool {
cur := block / b.k
return cur == b.SubGroups()-1
}

func (b *BufferMatrix) ComputeSizeOfLastSubGroup() int {
// total of size already in previous subgroups
lastSubGroup := b.SubGroups() - 1
totalSizeInPreviousSubGroups := lastSubGroup * b.k * (b.bufSize)
leftSize := b.len - totalSizeInPreviousSubGroups
return leftSize
}

func (b *BufferMatrix) FragLenLastSubGroup() int {
if !b.newStyle {
return b.bufSize
}

r := b.ComputeSizeOfLastSubGroup() / b.k
if b.ComputeSizeOfLastSubGroup()%b.k != 0 {
r++
}
return r
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods (IsBlockInLastSubGroup, ComputeSizeOfLastSubGroup, and FragLenLastSubGroup) are duplicated between BufferInfo and BufferMatrix. Since BufferMatrix embeds BufferInfo, the methods on BufferMatrix will shadow the ones on BufferInfo. This duplication is unnecessary and can lead to maintenance issues. Consider removing the duplicate methods from BufferMatrix (lines 318-341) and relying on the embedded BufferInfo methods.

Suggested change
func (b *BufferMatrix) IsBlockInLastSubGroup(block int) bool {
cur := block / b.k
return cur == b.SubGroups()-1
}
func (b *BufferMatrix) ComputeSizeOfLastSubGroup() int {
// total of size already in previous subgroups
lastSubGroup := b.SubGroups() - 1
totalSizeInPreviousSubGroups := lastSubGroup * b.k * (b.bufSize)
leftSize := b.len - totalSizeInPreviousSubGroups
return leftSize
}
func (b *BufferMatrix) FragLenLastSubGroup() int {
if !b.newStyle {
return b.bufSize
}
r := b.ComputeSizeOfLastSubGroup() / b.k
if b.ComputeSizeOfLastSubGroup()%b.k != 0 {
r++
}
return r
}

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +108

while (offset < frag_len) {
if (is_invalid_fragment_header((fragment_header_t *)&frag[offset])) {
return false;
}
offset += piecesize + getHeaderSize();
}

Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_matrix_fragment function previously validated that the fragment length was properly aligned to the cell size (piecesize + header). This alignment check has been removed, which could allow incorrectly-sized fragments to pass validation. This is particularly important for the new format where the last subgroup can have a different size. Consider adding logic to handle variable-sized last blocks or documenting why this check is no longer needed.

Suggested change
while (offset < frag_len) {
if (is_invalid_fragment_header((fragment_header_t *)&frag[offset])) {
return false;
}
offset += piecesize + getHeaderSize();
}
size_t header_size = (size_t)getHeaderSize();
size_t frag_size = (size_t)frag_len;
size_t cell_size = (size_t)piecesize + header_size;
/* Fragment must be at least large enough to hold a single header */
if (frag_size < header_size) {
return false;
}
/*
* Validate all full cells of size (piecesize + header_size). For the new
* format where the last subgroup can have a different size, any trailing
* bytes smaller than a full cell are handled separately below.
*/
while (offset + cell_size <= frag_size) {
if (is_invalid_fragment_header(
(fragment_header_t *)&frag[offset])) {
return false;
}
offset += cell_size;
}
/* Handle any remaining bytes that do not make up a full cell */
{
size_t remaining = frag_size - offset;
/* Exact multiple of full cells */
if (remaining == 0) {
return true;
}
/* Not enough bytes for a complete header: invalid/truncated fragment */
if (remaining < header_size) {
return false;
}
/*
* Variable-sized last block: ensure at least its header is valid.
* We do not assume its payload size here, only that the header is
* fully present and passes validation.
*/
if (is_invalid_fragment_header(
(fragment_header_t *)&frag[offset])) {
return false;
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +56
func (b BufferInfo) FragLenLastSubGroup() int {
if !b.newStyle {
return b.bufSize
}
r := b.ComputeSizeOfLastSubGroup() / b.k
if b.ComputeSizeOfLastSubGroup()%b.k != 0 {
r++
}
return r
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In FragLenLastSubGroup, ComputeSizeOfLastSubGroup() is called twice (lines 51-52 and 336-337). This could be optimized by storing the result in a local variable to avoid redundant computation. While not a major issue, it's a simple optimization that improves code clarity and efficiency.

Copilot uses AI. Check for mistakes.
// Computes actual data in the fragment
// If we are at the end of the data, we may have a smaller
// fragment than the others. fraglen will take of that.
int size = datalen - ((nth * ctx->k * ctx->chunk_size) + (i * fraglen));
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calculation on line 343 appears incorrect for the new format. The formula datalen - ((nth * ctx->k * ctx->chunk_size) + (i * fraglen)) doesn't properly account for variable-sized last blocks. When fraglen differs from ctx->chunk_size (as it does for the last subgroup), this calculation will produce incorrect offsets. This should be reviewed and tested carefully with the new buffer format, especially for non-aligned data sizes.

Suggested change
int size = datalen - ((nth * ctx->k * ctx->chunk_size) + (i * fraglen));
int size = datalen - ((nth * ctx->k * ctx->chunk_size) + (i * ctx->chunk_size));

Copilot uses AI. Check for mistakes.
if index < 0 || index >= len(e.Data) {
return nil
}
return e.Data[index][:e.RealDataSize]
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GetFragment method on line 224 slices e.Data[index][:e.RealDataSize], but RealDataSize appears to be the total size across all fragments, not the size of a single fragment. This will likely cause a panic or incorrect data when the fragment size is less than RealDataSize. This method needs to be corrected to return the appropriate fragment size, not the total data size.

Suggested change
return e.Data[index][:e.RealDataSize]
return e.Data[index]

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +70
func (b *BufferInfo) init(bufSize int, length int, k int) {
b.newStyle = true

hdrSize := fragmentHeaderSize()
b.hdrSize = hdrSize
b.bufSize = bufSize
b.len = length
b.k = k
b.leftInBlock = -1
b.curBlock = 0

b.sizeOfLastSubGroup = b.FragLenLastSubGroup()
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init method sets newStyle to true by default (line 59), which means all new BufferMatrix instances will use the new format unless explicitly changed. However, this could break backward compatibility for existing code that doesn't explicitly call UseOldFormat(). Consider defaulting to false for backward compatibility or documenting this breaking change prominently.

Copilot uses AI. Check for mistakes.
Comment on lines +115 to +123
// UseNewFormat sets the buffer to use the new format.
// The new format is more efficient for the last stripe/subgroup.
// Note: will panic if called after any Write() or ReadFrom()
func (b *BufferMatrix) UseNewFormat() {
if b.curBlock != 0 || b.leftInBlock != -1 || b.finished {
panic("UseNewFormat must be called before any Write")
}
b.newStyle = true
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic condition checks if b.leftInBlock != -1, but immediately after initialization in the init method, leftInBlock is set to -1 (line 66). This means the check will work for the first call, but the condition might be confusing. Additionally, checking b.finished might not be necessary since once finished is true, writes are no longer possible. Consider simplifying the condition or adding a comment explaining the state machine.

Copilot uses AI. Check for mistakes.
inFragRangeStartIncl := groupStartIncl * chunkSize
inFragRangeEndExcl := (groupEndIncl + 1) * chunkSize
inFragRangeStartIncl := lineStart * cellSize
inFragRangeEndExcl := (lineEnd + 1) * cellSize
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the GetRangeMatrix function, when isLastStripe is true (line 673), the function forces reading all columns. However, this logic doesn't account for the new buffer format where the last stripe may have different-sized cells. The cellSize used for calculating ranges assumes all cells are uniform, but with the new format, the last stripe's cells could be smaller. This may lead to incorrect range calculations for the new format.

Suggested change
inFragRangeEndExcl := (lineEnd + 1) * cellSize
inFragRangeEndExcl := (lineEnd + 1) * cellSize
if isLastStripe {
/*
* With the new buffer format, the last stripe may contain less
* than a full line of data, so the last cell can be smaller than
* cellSize. Adjust the end of the range to match the actual size
* of the last cell instead of assuming a uniform cellSize.
*/
headerSize := cellSize - lineSize
remainderDataSize := maxDataSize - lineEnd*lineSize
if remainderDataSize < 0 {
remainderDataSize = 0
}
lastCellSize := headerSize + remainderDataSize
inFragRangeEndExcl = lineEnd*cellSize + lastCellSize
}

Copilot uses AI. Check for mistakes.
Comment on lines +185 to 201
info := GetFragmentInfo(piece)
if info.Index != index {
t.Errorf("Expected frag %v to have index %v; got %v", index, index, info.Index)
}
if info.Size != len(piece)-80 { // 80 == sizeof (struct fragment_header_s)
t.Errorf("Expected frag %v to have size %v; got %v", index, len(piece)-80, info.Size)
}
if info.BackendName != params.Name {
t.Errorf("Expected frag %v to have backend %v; got %v", index, params.Name, info.BackendName)
}
if info.ErasureCodeVersion != expectedVersion {
t.Errorf("Expected frag %v to have EC version %v; got %v", index, expectedVersion, info.ErasureCodeVersion)
}
if !info.IsValid {
t.Errorf("Expected frag %v to be valid", index)
}
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test previously validated that info.OrigDataSize matched len(pattern), but this check was removed in the refactored test. This validation is important to ensure that the encoded fragments correctly store the original data size metadata. The check should be restored.

Copilot uses AI. Check for mistakes.
@fferrandis
Copy link
Author

/approve

@bert-e
Copy link

bert-e commented Jan 9, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: approve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants