Skip to content

Allow non-Real eltype with quantile#981

Merged
nalimilan merged 5 commits intomasterfrom
nl/quantile
Dec 17, 2025
Merged

Allow non-Real eltype with quantile#981
nalimilan merged 5 commits intomasterfrom
nl/quantile

Conversation

@nalimilan
Copy link
Member

This is consistent with Statistics.quantile and avoids breakage due to #977. It's particularly useful for Union{T, Missing}, e.g. a view of nonmissing entries in a vector. This also allows supporting some types such as Date, though currently this only works for some values (would need to implement type=1).

This is consistent with `Statistics.quantile` and avoids breakage due to #977.
It's particularly useful for `Union{T, Missing}`, e.g. a view of nonmissing entries
in a vector. This also allows supporting some types such as `Date`, though currently
this only works for some values (would need to implement `type=1`).
# prepare out vector
out = Vector{typeof(zero(V)/1)}(undef, length(p))
v1 = vw[1][1]
out = Vector{typeof(v1 + zero(eltype(p))*zero(W)*zero(v1))}(undef, length(p))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I wanted to use map(p) and return the result directly to avoid having to compute the type manually, but I couldn't find a way for the return type to be inferred. The problem is that the loop does not just depend on the quantile, we also have to update the sum of weights, current and previous values, which get boxed. Not sure there's a clean solution.

test/jet.jl Outdated
# Default error analysis for common problem fails since JET errors on interface definitions
# The (deprecated) `model_response(::StatisticalModel)` calls the interface
# function `response(::StatisticalModel)` for which no method exists yet
# JET is also confused by any(ismissing, ::AbstractVector)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure is this:

quantile(v::AbstractVector{…}, w::StatsBase.AbstractWeights{…} where T<:Real, p::AbstractVector{…}) where {} @ StatsBase D:\a\StatsBase.jl\StatsBase.jl\src\weights.jl:653
│ non-boolean `Nothing`, `Some{Bool}` found in boolean context (2/3 union split): goto %108 if not StatsBase.any(StatsBase.ismissing, v::AbstractVector)::Union{Nothing, Some{Bool}, Bool}

I don't understand where this comes from, as the type seems perfectly inferred as Bool:

julia> f(x) = any(ismissing, x)
f (generic function with 1 method)

julia> @code_warntype f(Any[1])
MethodInstance for f(::Vector{Any})
  from f(x) @ Main REPL[3]:1
Arguments
  x::Vector{Any}
Body::Bool
1%1 = Main.any(Main.ismissing, x)::Bool
└──      return %1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, JET is usually right - there might be some definition of any in Base or some dependency that does not ensure that a Boolean is returned.

If the problem persists even with the additional type check suggested above, then I think you could try to fix the JET error by adding a type assertion to any, ie use any(ismissing, v)::Bool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked, that error doesn't happen when I enable JET on Julia 1.12. I'm tempted to enable JET only there, but there are two new errors which are wrong (variables are always defined):

═════ 3 possible errors found ═════
┌ check_vectors(x::Any, y::Any, skipmissing::Symbol) @ StatsBase /home/milan/.julia/packages/StatsBase/5D4Dh/src/pairwise.jl:43local variable `indsx` may be undefined: indsx::Any
└────────────────────
┌ check_vectors(x::Any, y::Any, skipmissing::Symbol) @ StatsBase /home/milan/.julia/packages/StatsBase/5D4Dh/src/pairwise.jl:43local variable `indsy` may be undefined: indsy::Any
└────────────────────
┌ model_response(obj::StatisticalModel) @ StatsBase ./deprecated.jl:215
│ no matching method found `response(::StatisticalModel)`: response(obj::StatisticalModel)
└────────────────────

What do you think? Better use JET on 1.12 anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors on 1.12 are real, JET's analysis was just previously not capable of capturing them. The compiler can't reason about these if conditions and hence can't infer that the variables indsx and indsy in

indsx == indsy ||
are defined.

Enabling JET on 1.12 probably would require to update the compat entry as the AssertionError apparently was only solved in a more recent release of JET.

That being said, the errors on Julia < 1.12 are most likely real as well. Did you check whether the type assertion would be sufficient to fix them?

So I think enabling JET on 1.12, now that the JET-internal AssertionErrors are fixed, would be a good idea in general but belongs to a different PR, even more so since it requires additional fixes in StatsBase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding ::Bool does the trick so let's do that.

src/weights.jl Outdated
Comment on lines +653 to +654
any(ismissing, v) &&
throw(ArgumentError("quantiles are undefined in presence of missing values"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should be moved to the beginning of the function body since it operates on the input and eg doesn't require the sorting (similar to the check for empty v).

Moreover, I think it would be good to ensure that the check is skipped if it can be deduced from the type that no missings are present, eg

Suggested change
any(ismissing, v) &&
throw(ArgumentError("quantiles are undefined in presence of missing values"))
if eltype(v) >: Missing && any(ismissing, v)
throw(ArgumentError("quantiles are undefined in presence of missing values"))
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll move it up. But the eltype check isn't needed, the compiler optimizes the call out automatically:

julia> f(x) = any(ismissing, x)
f (generic function with 1 method)

julia> @code_llvm f([1])
; Function Signature: f(Array{Int64, 1})
;  @ REPL[6]:1 within `f`
define i8 @julia_f_3813(ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
; ┌ @ reducedim.jl:989 within `any`
   ret i8 0
; └
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally a bit hesitant to rely on such undocumented compiler optimizations - it's not clear immediately for which types it would be applied and whether it can be expected to be stable across Julia versions. If we want to be certain that it is optimized, I think we would have to add the check in StatsBase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't actually use eltype(v) >: Missing if we want to be generic as types other than Missing may implement it (e.g. https://github.com/nalimilan/TypedMissings.jl). Anyway this is a relatively simple optimization so I think we can rely on it. It's used all over the place, including in Base itself.

test/jet.jl Outdated
# Default error analysis for common problem fails since JET errors on interface definitions
# The (deprecated) `model_response(::StatisticalModel)` calls the interface
# function `response(::StatisticalModel)` for which no method exists yet
# JET is also confused by any(ismissing, ::AbstractVector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, JET is usually right - there might be some definition of any in Base or some dependency that does not ensure that a Boolean is returned.

If the problem persists even with the additional type check suggested above, then I think you could try to fix the JET error by adding a type assertion to any, ie use any(ismissing, v)::Bool.

@nalimilan nalimilan merged commit 7388a8e into master Dec 17, 2025
13 checks passed
@nalimilan nalimilan deleted the nl/quantile branch December 17, 2025 08:34
@nalimilan
Copy link
Member Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants