Fixes #365: Apply CLI override_parameters into metadata.json parameters #370
crossmeta wants to merge 1 commit.
Fixes mlcommons#365 (Checkpointing Benchmark: Invalid Submission Due to Incorrect Operation Count)

The submission_checker reads num_checkpoints_write/read from metadata['parameters'] (the YAML defaults), but split-phase submissions need the CLI overrides reflected there. Previously the overrides landed only in metadata['override_parameters'], which the checker ignores. As a result, each phase was counted as 10W+10R, the phases aggregated to 20W+20R, and the submission was marked INVALID.

Fix: at metadata serialization time, merge override_parameters (dotted keys) into the nested parameters dict, so that parameters reflects the effective config. override_parameters is still emitted unchanged for a full audit trail.

Signed-off-by: sam sammandam <suprasam@zettalane.com>
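A minimal Python sketch of the serialization-time merge described in this commit follows. The function names apply_dotted_overrides and serialize_metadata, and the key path checkpoint.num_checkpoints_read in the example, are hypothetical illustrations rather than the code in this PR; the sketch also assumes override values have already been parsed to their final types.

```python
import copy
import json


def apply_dotted_overrides(parameters: dict, overrides: dict) -> dict:
    """Return a deep copy of `parameters` with dotted-key overrides applied.

    A key such as "a.b.c" sets result["a"]["b"]["c"]. Missing intermediate
    dicts are created. Neither input dict is modified.
    """
    effective = copy.deepcopy(parameters)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = effective
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Sketch choice: create the missing level (or replace a scalar
                # with a dict); a stricter version could raise here instead.
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return effective


def serialize_metadata(parameters: dict, override_parameters: dict) -> str:
    """Build metadata JSON holding both the effective config and raw overrides."""
    metadata = {
        # Effective configuration: YAML defaults with CLI overrides applied.
        "parameters": apply_dotted_overrides(parameters, override_parameters),
        # Raw dotted-key overrides, emitted unchanged for audit.
        "override_parameters": copy.deepcopy(override_parameters),
    }
    return json.dumps(metadata, indent=2)


if __name__ == "__main__":
    # Hypothetical key layout; the real YAML structure may differ.
    defaults = {"checkpoint": {"num_checkpoints_write": 10, "num_checkpoints_read": 10}}
    overrides = {"checkpoint.num_checkpoints_read": 0}
    print(serialize_metadata(defaults, overrides))
```

Merging into a deep copy leaves both input dicts untouched, so override_parameters is written out exactly as it was passed while parameters carries the effective values the checker reads.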
idevasena left a comment:
Changes look good. Thank you!
Devasena,
Great work investigating, fixing and closing issues.
—Russ
On May 12, 2026, at 6:57 AM, Devasena I wrote:
@idevasena approved this pull request.
Changes look good. Thank you!
Did you provide your GitHub ID (crossmeta) to the "join" workflow when you joined the MLPerf Storage WG? If not, could you please go through the "join" workflow again (providing all the same inputs) and add crossmeta as your GitHub ID? One of the things the "join" workflow does is ask you to sign a "contributor license agreement" that grants MLCommons a license to use any code contributions you make. Once that is complete and on file, we add your GitHub ID to the whitelist of IDs that are permitted to check code into our GitHub repo. Without that, we cannot accept your code change, and we'd really like to accept it, so...
@FileSystemGuy Yes, the crossmeta handle was provided when I joined the MLPerf Storage WG.
Fixes #365 (Checkpointing Benchmark: Invalid Submission Due to Incorrect Operation Count)
Two-phase checkpoint runs (separate write-only and read-only mlpstorage invocations) are flagged INVALID because the submission_checker counts each phase as 10W+10R and aggregates to 20W+20R.
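The arithmetic can be reproduced with a short Python sketch. It assumes, per this PR's description, that the checker sums num_checkpoints_write and num_checkpoints_read from each phase's metadata['parameters'] and ignores override_parameters; it also assumes the read-only phase passes --num-checkpoints-write=0 --num-checkpoints-read=10. The tally and phase helpers and the checkpoint.* key layout are hypothetical, not the submission_checker's actual code.

```python
def tally(phases: list) -> tuple:
    """Sum checkpoint write/read counts across phases from `parameters` only.

    `override_parameters` is deliberately ignored, matching the behavior
    described in #365.
    """
    writes = reads = 0
    for metadata in phases:
        ckpt = metadata["parameters"]["checkpoint"]
        writes += ckpt["num_checkpoints_write"]
        reads += ckpt["num_checkpoints_read"]
    return writes, reads


def phase(write: int, read: int, overrides: dict) -> dict:
    """Build a hypothetical per-phase metadata record."""
    return {
        "parameters": {
            "checkpoint": {"num_checkpoints_write": write, "num_checkpoints_read": read}
        },
        "override_parameters": overrides,
    }


write_overrides = {"checkpoint.num_checkpoints_write": 10, "checkpoint.num_checkpoints_read": 0}
read_overrides = {"checkpoint.num_checkpoints_write": 0, "checkpoint.num_checkpoints_read": 10}

# Pre-patch: each phase's `parameters` still holds the YAML defaults (10/10).
pre_patch = [phase(10, 10, write_overrides), phase(10, 10, read_overrides)]
# Post-patch: `parameters` reflects each phase's effective configuration.
post_patch = [phase(10, 0, write_overrides), phase(0, 10, read_overrides)]

print(tally(pre_patch))   # (20, 20): the doubled count that was flagged INVALID
print(tally(post_patch))  # (10, 10)
```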
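Root cause: the submission_checker reads num_checkpoints_write/read from metadata['parameters'] (the YAML defaults), while the CLI overrides land only in metadata['override_parameters'], which the checker ignores.
So even though --num-checkpoints-write=10 --num-checkpoints-read=0 drove the actual run correctly, the parameters block in metadata.json recorded 10/10, and the checker tallied 20W+20R.

Fix: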
At metadata serialization time, merge override_parameters (dotted keys) into the nested parameters dict, so that parameters reflects the effective run configuration. override_parameters is still emitted unchanged for audit.

Verification:
Pre-patch on the test script from #365:
Post-patch: