Improve Rank robustness for malformed text/id rows by hemanth-asirvatham · Pull Request #60 · openai/GABRIEL

hemanth-asirvatham · 2026-02-13T03:10:48Z

Motivation

Ranking crashed when non-string or missing values were hashed: calling .encode() on a float or NA raises AttributeError, which broke identifier derivation and downstream indexing.
Ensure ranking and related flows tolerate malformed rows by dropping or skipping them safely and informing users when rows are removed.
Keep existing behavior for valid text while avoiding silent coercions (for example, str(None) turning a missing value into the literal text "None") that can produce subtle bugs.
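Both problems can be reproduced with a short standard-library snippet. The helper names `naive_id` and `coerced_id` are hypothetical, and `None`/`float('nan')` stand in for pandas missing values. Hashing calls `.encode()`, which only `str` provides, and coercing with `str()` avoids the crash but invents text.

```python
import hashlib


def naive_id(value):
    # Hypothetical: assumes every value is a str; floats and None have no .encode().
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def coerced_id(value):
    # Hypothetical: avoids the crash by coercing with str(), but silently invents text.
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


for bad in (float("nan"), 3.0, None):
    try:
        naive_id(bad)
    except AttributeError as exc:
        print(f"{bad!r} -> AttributeError: {exc}")

# A missing value and a genuine row whose text is "None" now share an identifier.
print(coerced_id(None) == coerced_id("None"))  # True
```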

Description

Added helpers _is_missing_scalar and _hash_text_identifier to safely detect missing scalars and produce stable sha1 identifiers only for valid text-like inputs.
Hardened seeding logic in _seed_ratings_from_rate to use safe hashing and drop invalid keys instead of raising.
Modified both ranking entry paths, non-recursive (run) and recursive (_run_recursive), to filter out malformed rows before processing, print a summary (count and percentage) when rows are dropped, and reset_index after filtering so the remaining rows get a contiguous index and positional lookups stay aligned.
Handled the case where every row is filtered out by returning and saving an appropriately shaped empty result (attribute columns plus their _raw/_se columns).
Adjusted resume/checkpoint logic to use the safe hashing helper when comparing against previously saved identifiers.
Added regression tests to tests/test_basic.py verifying malformed-row dropping behavior in standard and recursive modes.

Testing

Ran targeted tests for the new behavior: pytest -q tests/test_basic.py -k "rank_drops_malformed_rows or recursive_rank_drops_malformed_rows or rank_outputs_standard_errors", which passed.
Ran the broader rank-related subset: pytest -q tests/test_basic.py -k "rank and not screenshot", which passed.
All automated tests exercised for the rank changes passed after fixes; no failing tests remain in the modified areas.

Codex Task

github-actions · 2026-02-13T03:10:57Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. (Posted by the CLA Assistant Lite bot.)

Commit 85a759a: Harden rank text hashing against malformed rows

hemanth-asirvatham added the codex label on Feb 13, 2026 (via ChatGPT Codex Connector)

hemanth-asirvatham merged commit 3ab9de5 into main Feb 13, 2026
1 check failed

github-actions bot locked and limited conversation to collaborators Feb 13, 2026

hemanth-asirvatham merged 1 commit into main from add-input-validation-to-rank-function

hemanth-asirvatham commented Feb 13, 2026

Uh oh!

github-actions bot commented Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hemanth-asirvatham commented Feb 13, 2026

Motivation

Description

Testing

Uh oh!

github-actions bot commented Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant