fix: guard against IndexError in decompress_list on empty string#3018
fix: guard against IndexError in decompress_list on empty string#3018prabindersinghh wants to merge 1 commit intoNetflix:masterfrom
Conversation
compress_list([]) legitimately returns '' (joining an empty list).
However, decompress_list('') immediately raised IndexError on lststr[0]
because there was no guard for an empty input string, making the
round-trip decompress_list(compress_list([])) crash.
Fix: add an early return for empty input before the index access.
Reproducer:
from metaflow.util import compress_list, decompress_list
decompress_list(compress_list([])) # IndexError before fix
Greptile SummaryThis PR fixes a one-line Key changes:
Notable design consideration: Confidence Score: 5/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["decompress_list(lststr)"] --> B{"lststr is empty?"}
B -->|"yes — NEW guard"| C["return []"]
B -->|"no"| D{"first char == zlibmarker?"}
D -->|"yes"| E["base64-decode\nthen zlib decompress twice\n→ decoded string"]
D -->|"no"| F["decoded = lststr"]
E --> G{"rangedelim in decoded?"}
F --> G
G -->|"yes"| H["split on rangedelim\nreturn prefix+suffix for each suffix"]
G -->|"no"| I["return decoded.split(separator)"]
Last reviewed commit: 4bea37a |
| def test_compress_empty_list_returns_empty_string(): | ||
| """compress_list([]) must produce '' (join of zero elements).""" | ||
| assert compress_list([]) == "" |
There was a problem hiding this comment.
Missing edge-case test: [""] round-trip collision
compress_list([""]) produces "" (joining a list of one empty string), which is the same output as compress_list([]). After this fix, decompress_list("") returns [], meaning decompress_list(compress_list([""])) returns [] instead of [""].
This is a pre-existing design-level ambiguity (both an empty list and a list containing one empty string compress to the same token), but the fix silently resolves it in favor of []. Adding a test documenting the current behavior would make this intent explicit and prevent future surprises:
def test_roundtrip_single_empty_string():
# compress_list([""]) == "" == compress_list([]),
# so decompress cannot distinguish the two; [] is the chosen result.
assert decompress_list(compress_list([""])) == []If [""] inputs are genuinely expected in the codebase, the collision should be addressed at the compress_list level (e.g., encoding the count) rather than relying on callers to avoid empty-string elements.
Summary
Fixes #3017
compress_list([])legitimately returns""(joining an empty list).However,
decompress_list("")immediately crashed withIndexErroron
lststr[0]because there was no guard for empty input, making theround-trip
decompress_list(compress_list([]))crash instead ofreturning
[].Fix: add a two-line early-return guard before the index access.
Changes
metaflow/util.py— 2 lines added (early return guard)test/unit/test_util_compress.py— new file, 6 unit testsReproducer
Testing
6 new unit tests in
test/unit/test_util_compress.py:test_compress_empty_list_returns_empty_stringtest_decompress_empty_string_returns_empty_listtest_roundtrip_empty_listtest_roundtrip_single_itemtest_roundtrip_multiple_items_with_common_prefixtest_roundtrip_items_without_common_prefixAll 6 pass. Zero regressions.
AI Disclosure
the test file. I have reviewed every line, understand the root
cause, and can explain the fix and failure modes independently.