@LiaCastaneda
Contributor

@LiaCastaneda LiaCastaneda commented Jan 10, 2026

Which issue does this PR close?

Rationale for this change

The is_used() API incorrectly returned false for custom DataSource implementations that didn't call reassign_expr_columns() -> with_new_children(). This caused HashJoinExec to skip computing dynamic filters even when they were actually being used.

What changes are included in this PR?

Updated is_used() to check both the outer and inner Arc strong counts.
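To illustrate why counting only the inner Arc can miss a consumer, here is a minimal sketch. `DynamicFilter`, `rewritten`, and `is_used_inner_only` are hypothetical stand-ins, not the actual DataFusion types or methods; the sketch assumes the filter is an outer `Arc` around a struct that holds shared state in an inner `Arc`, and that a `with_new_children()`-style rewrite builds a new outer struct sharing that inner state.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a dynamic filter expression: an outer struct
/// that shares its mutable state through an inner `Arc`.
struct DynamicFilter {
    inner: Arc<RwLock<u64>>,
}

impl DynamicFilter {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            inner: Arc::new(RwLock::new(0)),
        })
    }

    /// Assumed shape of a `with_new_children()`-style rewrite:
    /// a new outer struct that shares the same inner state.
    fn rewritten(&self) -> Arc<Self> {
        Arc::new(Self {
            inner: Arc::clone(&self.inner),
        })
    }

    /// Assumed earlier check: looks only at the inner count, so a consumer
    /// that merely clones the outer `Arc` goes unnoticed.
    fn is_used_inner_only(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Sketch of the updated check: either count above one means
    /// someone other than the producer holds the filter.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
    }
}
```

In this sketch, a consumer that only calls `Arc::clone` on the outer handle is missed by `is_used_inner_only()` but detected by `is_used()`, while a consumer obtained through `rewritten()` is detected by both.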

Are these changes tested?

Functionality is covered by the existing test test_hashjoin_dynamic_filter_pushdown_is_used. I was not sure whether to add a repro, since it would require adding a custom DataSource; the current tests in datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs use FileScanConfig.

Are there any user-facing changes?

no

@github-actions github-actions bot added the physical-expr (Changes to the physical-expr crates) and physical-plan (Changes to the physical-plan crate) labels Jan 10, 2026
@LiaCastaneda LiaCastaneda marked this pull request as ready for review January 10, 2026 18:13
@tobixdev
Contributor

Thanks for the quick fix!

To me this approach seems great. Fixes the problem and does not make it more complex for users.

@LiaCastaneda
Contributor Author

Thank you for reporting the issue :)

@adriangb adriangb added this pull request to the merge queue Jan 12, 2026
Merged via the queue into apache:main with commit 278950a Jan 12, 2026
35 checks passed
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jan 21, 2026
## Which issue does this PR close?

- Closes apache#19715.

## Rationale for this change

The `is_used()` API incorrectly returned false for custom `DataSource`
implementations that didn't call `reassign_expr_columns()` ->
`with_new_children()`. This caused `HashJoinExec` to skip computing
dynamic filters even when they were actually being used.

## What changes are included in this PR?

Updated `is_used()` to check both the outer and inner `Arc` strong counts.

## Are these changes tested?

Functionality is covered by the existing test
`test_hashjoin_dynamic_filter_pushdown_is_used`. I was not sure whether
to add a repro, since it would require adding a custom `DataSource`; the
current tests in
`datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs` use
`FileScanConfig`.

## Are there any user-facing changes?

no

(cherry picked from commit 278950a)
github-merge-queue bot pushed a commit that referenced this pull request Jan 27, 2026
## Which issue does this PR close?


## Rationale for this change

The current v52 signature `pub async fn wait_complete(self: &Arc<Self>)`
(introduced in #19546) is a bit unergonomic. The method requires
`&Arc<DynamicFilterPhysicalExpr>`, but when working with `Arc<dyn
PhysicalExpr>`, downcasting only gives you `&DynamicFilterPhysicalExpr`.
Since you can't convert `&DynamicFilterPhysicalExpr` to
`Arc<DynamicFilterPhysicalExpr>`, the method becomes impossible to call.


The `&Arc<Self>` param was used to check `is_used()` via the `Arc` strong
count, but this was overly defensive.
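The receiver problem described above can be reproduced with the standard library alone. The sketch below uses hypothetical stand-ins (`Expr` for the expression trait, `DynamicFilter` for the filter type, `status` and `status_via_arc` for the two receiver styles); it is not the DataFusion API, only an illustration of why a caller holding a trait object ends up with a plain reference after downcasting.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for an expression trait with an `as_any` hook.
trait Expr: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical stand-in for the dynamic filter type.
struct DynamicFilter {
    complete: bool,
}

/// Some other expression type, to show the downcast can fail.
struct Other;

impl Expr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Expr for Other {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DynamicFilter {
    /// Old-style receiver: only callable through an `Arc<DynamicFilter>`.
    fn status_via_arc(self: &Arc<Self>) -> bool {
        self.complete
    }

    /// New-style receiver: callable through any shared reference.
    fn status(&self) -> bool {
        self.complete
    }
}

/// A caller that only holds the trait object.
fn status_of(expr: &Arc<dyn Expr>) -> Option<bool> {
    // `downcast_ref` yields `&DynamicFilter`, not `&Arc<DynamicFilter>`.
    let filter = expr.as_any().downcast_ref::<DynamicFilter>()?;
    // `filter.status_via_arc()` would not compile here: the receiver
    // type `&Arc<DynamicFilter>` is not available from a plain reference.
    Some(filter.status())
}
```

With a `&self` receiver, `status_of` works for any `Arc<dyn Expr>` that wraps the filter type and returns `None` for anything else.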

## What changes are included in this PR?

- Changed `DynamicFilterPhysicalExpr::wait_complete` signature from `pub
async fn wait_complete(self: &Arc<Self>)` to `pub async fn
wait_complete(&self)`.

- Removed the `is_used()` check from `wait_complete()`. This method,
like `wait_update()`, should only be called on filters that have
consumers. If the caller doesn't know whether the filter has consumers,
they should call `is_used()` first to avoid waiting indefinitely. This
approach avoids complex signatures and dependencies between the API's
methods.
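The "check `is_used()` before waiting" pattern from the list above can be sketched as follows. This is a synchronous analogue built on `Mutex` and `Condvar`, since the real `wait_complete` is async and needs a runtime; `Filter`, `mark_complete`, and `wait_if_used` are hypothetical names, and `is_used` here looks only at the outer `Arc` count for brevity.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical filter with a completion flag that waiters can block on.
struct Filter {
    complete: Mutex<bool>,
    cv: Condvar,
}

impl Filter {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            complete: Mutex::new(false),
            cv: Condvar::new(),
        })
    }

    /// Simplified usage check: someone else holds a handle to this filter.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1
    }

    /// Marks the filter complete and wakes every waiter.
    fn mark_complete(&self) {
        *self.complete.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Blocks until the filter is complete. Does not check usage itself,
    /// so it never returns if nobody ever calls `mark_complete`.
    fn wait_complete(&self) {
        let mut done = self.complete.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Caller-side guard: only wait when the filter has other holders.
/// Returns `false` without blocking when it does not.
fn wait_if_used(filter: &Arc<Filter>) -> bool {
    if !filter.is_used() {
        return false;
    }
    filter.wait_complete();
    true
}
```

Keeping the usage check in the caller leaves `wait_complete` with a plain `&self` receiver, at the cost of making the caller responsible for not waiting on a filter that will never complete.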

## Are these changes tested?

Yes, existing tests cover this functionality. I removed the "mock"
consumer from `test_hash_join_marks_filter_complete_empty_build_side`
and `test_hash_join_marks_filter_complete`, since the fix in
#19734 makes `is_used()` check the
outer struct's `strong_count` as well.


## Are there any user-facing changes?

The signature of `wait_complete` changed.
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jan 29, 2026
(cherry picked from commit bef1368)
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jan 29, 2026
(cherry picked from commit 278950a)
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jan 29, 2026
(cherry picked from commit bef1368)
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jan 30, 2026
* Fix dynamic filter is_used function (apache#19734)

(cherry picked from commit 278950a)

* Simplify wait_complete function (apache#19937)

(cherry picked from commit bef1368)
Development

Successfully merging this pull request may close these issues.

Dynamic Filter marked as not used

3 participants