Commit 53b0ffb
fix: validate inter-file ordering in eq_properties() (#20329)
## Summary
Discovered this bug while working on #19724.
TL;DR: each file being sorted internally does not mean the partition
stream produced by reading those files back to back is sorted.
- **`eq_properties()` in `FileScanConfig` blindly trusted
`output_ordering`** (set from Parquet `sorting_columns` metadata)
without verifying that files within a group are in the correct
inter-file order
- `EnforceSorting` then removed `SortExec` based on this unvalidated
ordering, producing **wrong results** when filesystem order didn't match
data order
- Added `validated_output_ordering()` that filters orderings using
`MinMaxStatistics::new_from_files()` + `is_sorted()` to verify
inter-file sort order before reporting them to the optimizer
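
To make the failure mode concrete, here is a minimal sketch of an inter-file order check built on per-file min/max statistics. It is not the `MinMaxStatistics` code from this commit: `FileRange` and `group_is_ascending` are hypothetical names, the sort key is assumed to be a single `i64` column, and the use of `<=` at file boundaries is a choice of this sketch that may differ from the real `is_sorted()`.

```rust
/// Hypothetical per-file statistics for a single sort column.
/// (Illustrative only; not a DataFusion type.)
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: i64,
    max: i64,
}

/// Returns true when reading `files` back to back, in the listed order,
/// yields an ascending stream, assuming each file is sorted internally.
///
/// Every file must end at or before the point where the next one starts.
/// A reversed listing or overlapping ranges both fail this check.
fn group_is_ascending(files: &[FileRange]) -> bool {
    files.windows(2).all(|pair| pair[0].max <= pair[1].min)
}
```

With this sketch, files covering `[1, 3]` then `[4, 6]` pass, while the same two files listed in the opposite order, or files covering `[1, 5]` then `[3, 8]`, are rejected, so a sort would have to stay in the plan.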
## Changes
### `datafusion/datasource/src/file_scan_config.rs`
- Added `validated_output_ordering()` method on `FileScanConfig` that
validates each output ordering against actual file group statistics
- Changed `eq_properties()` to call `self.validated_output_ordering()`
instead of `self.output_ordering.clone()`
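
The filtering step described above could look roughly like the following. This is a simplified sketch under stated assumptions, not the method added in this commit: `CandidateOrdering`, `FileStats`, `group_respects`, and `validated_orderings` are hypothetical names, each ordering is reduced to one column index plus a direction, and statistics are plain `(min, max)` pairs of `i64`.

```rust
/// Hypothetical stand-in for one declared ordering: a column index plus a direction.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CandidateOrdering {
    column: usize,
    descending: bool,
}

/// Hypothetical per-file statistics: one (min, max) pair per column.
type FileStats = Vec<(i64, i64)>;

/// True when the files of one group, read in listed order, respect `ordering`.
/// A group with zero or one file trivially passes.
fn group_respects(ordering: CandidateOrdering, group: &[FileStats]) -> bool {
    group.windows(2).all(|pair| {
        let (prev_min, prev_max) = pair[0][ordering.column];
        let (next_min, next_max) = pair[1][ordering.column];
        if ordering.descending {
            // Descending: the earlier file must not go below the later file's max.
            prev_min >= next_max
        } else {
            // Ascending: the earlier file must end before the later one starts.
            prev_max <= next_min
        }
    })
}

/// Keeps only the orderings that hold for every file group.
fn validated_orderings(
    orderings: &[CandidateOrdering],
    groups: &[Vec<FileStats>],
) -> Vec<CandidateOrdering> {
    orderings
        .iter()
        .copied()
        .filter(|o| groups.iter().all(|g| group_respects(*o, g)))
        .collect()
}
```

In this sketch an ordering that fails for any one group is dropped entirely, which errs on the side of keeping a sort in the plan. Groups holding a single file always pass, matching the multi-partition scenario in Test 11.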
### `datafusion/sqllogictest/test_files/sort_pushdown.slt`
Added 8 new regression tests (Tests 4-11):
| Test | Scenario | Key assertion |
|------|----------|---------------|
| **4** | Reversed filesystem order (inferred ordering) | SortExec retained: wrong inter-file order detected |
| **5** | Overlapping file ranges (inferred ordering) | SortExec retained: overlapping ranges detected |
| **6** | `WITH ORDER` + reversed filesystem order | SortExec retained despite explicit ordering |
| **7** | Correctly ordered multi-file group (positive) | SortExec eliminated: validation passes |
| **8** | DESC ordering with wrong inter-file DESC order | SortExec retained for DESC direction |
| **9** | Multi-column sort key (overlapping vs non-overlapping) | Conservative rejection with overlapping stats; passes with clean boundaries |
| **10** | Correctly ordered + `WITH ORDER` (positive) | SortExec eliminated: both ordering and stats agree |
| **11** | Multiple partitions (one file per group) | `SortPreservingMergeExec` merges; no per-partition sort needed |
## Test plan
- [x] `cargo test --test sqllogictests -- sort_pushdown` — all new +
existing tests pass
- [x] `cargo test -p datafusion-datasource` — 97 unit tests + 6 doc
tests pass
- [x] Existing Test 1 (single-file sort pushdown with `WITH ORDER`)
still eliminates SortExec (no regression)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 98cc753 commit 53b0ffb
File tree: 5 files changed, +660 -47 lines changed
- datafusion
  - core/tests/physical_optimizer
  - datasource/src
  - sqllogictest/test_files