fix: resolve IndexError in decompress_list for empty input #3031

Open
Vansh0204 wants to merge 7 commits into Netflix:master from Vansh0204:fix/3017-decompress-list-indexerror

@Vansh0204
Contributor

Fix: resolve IndexError in decompress_list for empty input

Fixes #3017

Problem

The decompress_list function did not guard against empty string input. Since compress_list([]) legitimately returns an empty string "", calling decompress_list(compress_list([])) crashed immediately with:
IndexError: string index out of range at line 387 (if lststr[0] == zlibmarker:).
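
The failure can be reproduced without Metaflow installed. This is a minimal sketch, assuming only that the unguarded code indexes the first character of its input as in the line quoted above; first_char_is_marker and crashes_on_empty are hypothetical stand-ins, not functions from metaflow/util.py.

```python
def first_char_is_marker(lststr, zlibmarker="!"):
    # Mirrors the quoted check `lststr[0] == zlibmarker` (hypothetical helper).
    return lststr[0] == zlibmarker


def crashes_on_empty():
    # Indexing position 0 of an empty string raises IndexError.
    try:
        first_char_is_marker("")
    except IndexError:
        return True
    return False
```

Any non-empty string passes through the check without error, which is why the crash only appears for the empty-list round trip.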

Fix

Added an early-return guard (if not lststr: return []) in decompress_list.

Change

  • File: metaflow/util.py
  • Added a check that returns [] if the input string is empty.
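
As a rough illustration of where the guard sits, here is a simplified sketch using the if not lststr form quoted in the description. It is not the code in metaflow/util.py: the name decompress_list_sketch, the base64-wrapped zlib payload, and the omission of the prefix-encoded mode are all assumptions made to keep the example short.

```python
import base64
import zlib


def decompress_list_sketch(lststr, separator=",", zlibmarker="!"):
    # Early return: an empty string has no first character to inspect.
    if not lststr:
        return []
    if lststr[0] == zlibmarker:
        # Assumed layout: marker followed by base64-encoded zlib data.
        decoded = zlib.decompress(base64.b64decode(lststr[1:])).decode("utf-8")
    else:
        decoded = lststr
    # Plain CSV mode only; the prefix-encoded mode is left out of this sketch.
    return decoded.split(separator)
```

The guard has to come before the marker check, since that check is the first place the string is indexed.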

Verification

Verified with a round-trip test:
decompress_list(compress_list([])) now correctly returns [] instead of raising an IndexError.

compress_list([]) legitimately returns an empty string "".
However, decompress_list("") was attempting to access lststr[0]
without a guard, causing an IndexError.

This fix adds an early return for empty strings, ensuring that
decompress_list(compress_list([])) correctly returns [].

Fixes Netflix#3017
@greptile-apps
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR fixes a crash in decompress_list when called with an empty string — the natural output of compress_list([]) — by adding a targeted early-return guard. It also ships a new test/unit/test_util.py covering all three encoding modes and explicitly documenting the pre-existing compress_list([""]) ambiguity.

  • Fix is precise: The guard uses lststr == "" (strict equality) rather than if not lststr, so None still raises a TypeError as expected, rather than silently returning [].
  • Known limitation is documented: Both the inline comment in util.py and test_compress_empty_string_element_ambiguity make clear that compress_list([]) and compress_list([""]) both produce "", and that decompress_list("") will always return [] — meaning a single empty-string element is unrecoverably lost in a round-trip.
  • Test coverage is solid: The three compression modes (plain CSV, prefix-encoded, zlib) are each exercised with round-trip assertions. A local ZLIB_MARKER constant is used instead of a bare "!" literal, tying it to the known default.
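
The first two points can be illustrated with a small self-contained sketch. It assumes a plain CSV mode only; compress_plain, the two decompress_* variants, and raises_type_error are hypothetical names introduced here, not part of Metaflow.

```python
def compress_plain(lst, separator=","):
    # Simplified stand-in for the plain CSV mode only (assumption).
    return separator.join(lst)


def decompress_truthy_guard(lststr, separator=","):
    # Variant A: any falsy input, including None, becomes [].
    if not lststr:
        return []
    return lststr.split(separator)


def decompress_strict_guard(lststr, separator=","):
    # Variant B: only the exact empty string short-circuits.
    if lststr == "":
        return []
    # None falls through to this index and raises TypeError.
    _ = lststr[0]
    return lststr.split(separator)


def raises_type_error(func, value):
    # Helper for the demonstration: reports whether func(value) raises TypeError.
    try:
        func(value)
    except TypeError:
        return True
    return False
```

Under these assumptions, compress_plain([]) and compress_plain([""]) both yield "", so decompress_strict_guard cannot tell them apart and returns [] for either. The strict variant still fails loudly on None, while the truthiness variant would quietly return [].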

Confidence Score: 5/5

  • Safe to merge — the one-line guard is minimal, precise, and fully covered by tests.
  • The change is a single, narrowly-scoped guard that addresses a well-defined crash path without altering any other behaviour. All previous review concerns (None-safety, ambiguity documentation, test breadth, fragile assertions) have been addressed in this iteration. No new issues were introduced.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| metaflow/util.py | Added a precise if lststr == "": guard in decompress_list to fix IndexError on empty input. The fix correctly uses strict equality (rather than not lststr) to avoid silently swallowing None, and includes a comment documenting the known compress_list([""]) ambiguity. |
| test/unit/test_util.py | New test file covering all three compression modes (plain CSV, prefix-encoded, zlib), the empty-list edge case, and explicitly documenting the compress_list([""]) → "" ambiguity as a known limitation. |

Last reviewed commit: f399208

- Update decompress_list to check specifically for "" as a guard.
- Add test/unit/test_util.py with regression tests for empty input.
- Add comment in decompress_list explaining the [] vs [""] ambiguity.
- Expand test/unit/test_util.py with comprehensive round-trip tests for
  single-element, plain CSV, prefix-encoded, and zlib-compressed lists.
- Add test documenting the known ambiguity between [] and [""].
- Remove fragile internal assertion
- Explicitly assert the known data loss issue when round-tripping
- Update comment to clarify that the prefix encoding test is checking Mode 2 indirectly
- Replace hardcoded '!' with ZLIB_MARKER in the zlib compression test to make the coupling visible

Development

Successfully merging this pull request may close these issues.

bug: decompress_list raises IndexError on empty string input