Skip to content

fix: guard against IndexError in decompress_list on empty string#3018

Open
prabindersinghh wants to merge 1 commit intoNetflix:masterfrom
prabindersinghh:fix/decompress-list-empty-string
Open

fix: guard against IndexError in decompress_list on empty string#3018
prabindersinghh wants to merge 1 commit intoNetflix:masterfrom
prabindersinghh:fix/decompress-list-empty-string

Conversation

@prabindersinghh
Copy link

@prabindersinghh prabindersinghh commented Mar 12, 2026

Summary

Fixes #3017

compress_list([]) legitimately returns "" (joining an empty list).
However, decompress_list("") immediately crashed with IndexError
on lststr[0] because there was no guard for empty input, making the
round-trip decompress_list(compress_list([])) crash instead of
returning [].

Fix: add a two-line early-return guard before the index access.

Changes

  • metaflow/util.py — 2 lines added (early return guard)
  • test/unit/test_util_compress.py — new file, 6 unit tests

Reproducer

from metaflow.util import compress_list, decompress_list

compress_list([])            # returns ""
decompress_list(compress_list([]))  # IndexError before fix

Testing

6 new unit tests in test/unit/test_util_compress.py:

Test What it verifies
test_compress_empty_list_returns_empty_string compress_list([]) returns ""
test_decompress_empty_string_returns_empty_list Core regression: no crash on ""
test_roundtrip_empty_list Full round-trip returns []
test_roundtrip_single_item Non-empty round-trip unaffected
test_roundtrip_multiple_items_with_common_prefix Prefix-encoding path unaffected
test_roundtrip_items_without_common_prefix Plain separator path unaffected

All 6 pass. Zero regressions.

AI Disclosure

  • I used AI tools (Claude) to help identify the bug and draft
    the test file. I have reviewed every line, understand the root
    cause, and can explain the fix and failure modes independently.

compress_list([]) legitimately returns '' (joining an empty list).
However, decompress_list('') immediately raised IndexError on lststr[0]
because there was no guard for an empty input string, making the
round-trip decompress_list(compress_list([])) crash.

Fix: add an early return for empty input before the index access.

Reproducer:
    from metaflow.util import compress_list, decompress_list
    decompress_list(compress_list([]))  # IndexError before fix
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes a one-line IndexError in decompress_list when called with an empty string — the output produced by compress_list([]) — by adding a two-line early-return guard before the lststr[0] index access. The fix is minimal, correct, and has no effect on any non-empty input path.

Key changes:

  • metaflow/util.py: if not lststr: return [] added at the top of decompress_list, resolving the crash on the empty-list round-trip.
  • test/unit/test_util_compress.py: 6 new unit tests covering the regression case and the three encoding paths (plain, prefix-encoded, zlib).

Notable design consideration:
Both compress_list([]) and compress_list([""]) produce "" (a pre-existing collision in the encoding scheme). The fix resolves this ambiguity in favour of [], but no test documents this behaviour, which could surprise callers who pass lists containing empty strings.

Confidence Score: 5/5

  • This PR is safe to merge; the change is a minimal guard that fixes a crash with no impact on existing non-empty code paths.
  • The fix is two lines, targets only the empty-string edge case, and is fully covered by new regression tests. No existing behaviour is altered for non-empty inputs.
  • No files require special attention beyond the minor test-coverage gap noted for the [""] edge case.

Important Files Changed

Filename Overview
metaflow/util.py Adds a 2-line early-return guard (if not lststr: return []) to decompress_list to prevent the IndexError raised when an empty string is passed; the fix is minimal, correct, and does not affect non-empty inputs.
test/unit/test_util_compress.py New test file with 6 unit tests covering the empty-list round-trip regression and common non-empty paths; missing a test that documents the known [""]/[] collision in the compress/decompress encoding scheme.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["decompress_list(lststr)"] --> B{"lststr is empty?"}
    B -->|"yes — NEW guard"| C["return []"]
    B -->|"no"| D{"first char == zlibmarker?"}
    D -->|"yes"| E["base64-decode\nthen zlib decompress twice\n→ decoded string"]
    D -->|"no"| F["decoded = lststr"]
    E --> G{"rangedelim in decoded?"}
    F --> G
    G -->|"yes"| H["split on rangedelim\nreturn prefix+suffix for each suffix"]
    G -->|"no"| I["return decoded.split(separator)"]
Loading

Last reviewed commit: 4bea37a

Comment on lines +29 to +31
def test_compress_empty_list_returns_empty_string():
"""compress_list([]) must produce '' (join of zero elements)."""
assert compress_list([]) == ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing edge-case test: [""] round-trip collision

compress_list([""]) produces "" (joining a list of one empty string), which is the same output as compress_list([]). After this fix, decompress_list("") returns [], meaning decompress_list(compress_list([""])) returns [] instead of [""].

This is a pre-existing design-level ambiguity (both an empty list and a list containing one empty string compress to the same token), but the fix silently resolves it in favor of []. Adding a test documenting the current behavior would make this intent explicit and prevent future surprises:

def test_roundtrip_single_empty_string():
    # compress_list([""]) == "" == compress_list([]),
    # so decompress cannot distinguish the two; [] is the chosen result.
    assert decompress_list(compress_list([""])) == []

If [""] inputs are genuinely expected in the codebase, the collision should be addressed at the compress_list level (e.g., encoding the count) rather than relying on callers to avoid empty-string elements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: decompress_list raises IndexError on empty string input

1 participant