Improve Rank robustness for malformed text/id rows #60

Merged
hemanth-asirvatham merged 1 commit into main from add-input-validation-to-rank-function
Feb 13, 2026

@hemanth-asirvatham
Collaborator

Motivation

  • Ranking failed when non-string or missing values were hashed (calling .encode() on floats/NA), causing crashes during identifier derivation and downstream indexing.
  • Ensure ranking and related flows tolerate malformed rows by dropping or skipping them safely and informing users when rows are removed.
  • Keep existing behavior for valid text while avoiding silent coercions that can produce subtle bugs.
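The failure mode in the first bullet can be reproduced in isolation. The snippet below is a minimal sketch, not the project's code: naive_identifier is a hypothetical stand-in for the pre-fix hashing pattern, which assumes every value is a string.

```python
import hashlib


def naive_identifier(value):
    # Hypothetical pre-fix pattern: assumes `value` is always a str.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def try_hash(value):
    """Return the sha1 hex digest, or the name of the exception raised."""
    try:
        return naive_identifier(value)
    except AttributeError as exc:
        # float and None have no .encode() method
        return type(exc).__name__


# Valid text hashes normally; floats, NaN, and None raise AttributeError.
results = {repr(v): try_hash(v) for v in ["hello", 3.14, float("nan"), None]}
```

Only the string input yields a digest; every other value fails at the .encode() call, which is the crash this PR guards against.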

Description

  • Added helpers _is_missing_scalar and _hash_text_identifier to safely detect missing scalars and produce stable sha1 identifiers only for valid text-like inputs.
  • Hardened seeding logic in _seed_ratings_from_rate to use safe hashing and drop invalid keys instead of raising.
  • Modified both the non-recursive (run) and recursive (_run_recursive) ranking entry paths to filter out malformed rows before processing, print a summary (count and percentage) when rows are dropped, and call reset_index after filtering to avoid indexing issues.
  • Added safe handling for the case where all rows are filtered out by returning and saving an appropriately shaped empty result (with attribute columns and _raw/_se columns present).
  • Adjusted resume/checkpoint logic to use the safe hashing helper when comparing against previously saved identifiers.
  • Added regression tests to tests/test_basic.py verifying malformed-row dropping behavior in standard and recursive modes.
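The helper and filtering behavior described above could look roughly like the following. This is an illustrative sketch under stated assumptions, not the merged implementation: the functions are stand-ins named after the PR's helpers (without the leading underscore), missing values are detected with the standard library only (None and float NaN), whitespace-only strings are treated as invalid, and rows are a list of dicts rather than a DataFrame, so there is no reset_index step. The text_key parameter and the message wording are also hypothetical.

```python
import hashlib
import math
from typing import Any, Dict, List, Optional


def is_missing_scalar(value: Any) -> bool:
    """True for None and float NaN (stdlib-only stand-in for NA detection)."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def hash_text_identifier(value: Any) -> Optional[str]:
    """Return a sha1 hex digest for valid text, or None instead of raising."""
    if is_missing_scalar(value) or not isinstance(value, str):
        return None
    if not value.strip():
        # Assumption: whitespace-only text counts as malformed.
        return None
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def drop_malformed_rows(rows: List[Dict[str, Any]], text_key: str = "text") -> List[Dict[str, Any]]:
    """Keep rows whose text can be hashed; report how many were dropped."""
    kept = [row for row in rows if hash_text_identifier(row.get(text_key)) is not None]
    dropped = len(rows) - len(kept)
    if dropped:
        pct = 100.0 * dropped / len(rows)
        print(f"Dropped {dropped} of {len(rows)} rows ({pct:.1f}%) with missing or non-text values.")
    return kept


rows = [{"text": "a"}, {"text": 3.0}, {"text": None}, {"text": float("nan")}]
clean = drop_malformed_rows(rows)
```

Returning None rather than coercing with str() matches the stated goal of avoiding silent coercions: a float such as 3.0 is dropped and reported instead of being hashed as the string "3.0".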

Testing

  • Ran targeted tests for the new behavior, all of which passed: pytest -q tests/test_basic.py -k "rank_drops_malformed_rows or recursive_rank_drops_malformed_rows or rank_outputs_standard_errors"
  • Ran the broader rank-related subset, which also passed: pytest -q tests/test_basic.py -k "rank and not screenshot"
  • All automated tests run during development of the rank changes pass after the fixes; no failures remain in the modified areas.

Codex Task

@github-actions

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@hemanth-asirvatham hemanth-asirvatham merged commit 3ab9de5 into main Feb 13, 2026
1 check failed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 13, 2026