Conversation
This is consistent with `Statistics.quantile` and avoids breakage due to #977. It's particularly useful for `Union{T, Missing}`, e.g. a view of nonmissing entries in a vector. This also allows supporting some types such as `Date`, though currently this only works for some values (would need to implement `type=1`).
f7df294 to
cde5c6a
Compare
| # prepare out vector | ||
| out = Vector{typeof(zero(V)/1)}(undef, length(p)) | ||
| v1 = vw[1][1] | ||
| out = Vector{typeof(v1 + zero(eltype(p))*zero(W)*zero(v1))}(undef, length(p)) |
There was a problem hiding this comment.
Ideally I wanted to use map(p) and return the result directly to avoid having to compute the type manually, but I couldn't find a way for the return type to be inferred. The problem is that the loop does not just depend on the quantile, we also have to update the sum of weights, current and previous values, which get boxed. Not sure there's a clean solution.
test/jet.jl
Outdated
| # Default error analysis for common problem fails since JET errors on interface definitions | ||
| # The (deprecated) `model_response(::StatisticalModel)` calls the interface | ||
| # function `response(::StatisticalModel)` for which no method exists yet | ||
| # JET is also confused by any(ismissing, ::AbstractVector) |
There was a problem hiding this comment.
The failure is this:
┌ quantile(v::AbstractVector{…}, w::StatsBase.AbstractWeights{…} where T<:Real, p::AbstractVector{…}) where {…} @ StatsBase D:\a\StatsBase.jl\StatsBase.jl\src\weights.jl:653
│ non-boolean `Nothing`, `Some{Bool}` found in boolean context (2/3 union split): goto %108 if not StatsBase.any(StatsBase.ismissing, v::AbstractVector)::Union{Nothing, Some{Bool}, Bool}I don't understand where this comes from, as the type seems perfectly inferred as Bool:
julia> f(x) = any(ismissing, x)
f (generic function with 1 method)
julia> @code_warntype f(Any[1])
MethodInstance for f(::Vector{Any})
from f(x) @ Main REPL[3]:1
Arguments
x::Vector{Any}
Body::Bool
1 ─ %1 = Main.any(Main.ismissing, x)::Bool
└── return %1There was a problem hiding this comment.
In my experience, JET is usually right - there might be some definition of any in Base or some dependency that does not ensure that a Boolean is returned.
If the problem persists even with the additional type check suggested above, then I think you could try to fix the JET error by adding a type assertion to any, ie use any(ismissing, v)::Bool.
There was a problem hiding this comment.
I've just checked, that error doesn't happen when I enable JET on Julia 1.12. I'm tempted to enable JET only there, but there are two new errors which are wrong (variables are always defined):
═════ 3 possible errors found ═════
┌ check_vectors(x::Any, y::Any, skipmissing::Symbol) @ StatsBase /home/milan/.julia/packages/StatsBase/5D4Dh/src/pairwise.jl:43
│ local variable `indsx` may be undefined: indsx::Any
└────────────────────
┌ check_vectors(x::Any, y::Any, skipmissing::Symbol) @ StatsBase /home/milan/.julia/packages/StatsBase/5D4Dh/src/pairwise.jl:43
│ local variable `indsy` may be undefined: indsy::Any
└────────────────────
┌ model_response(obj::StatisticalModel) @ StatsBase ./deprecated.jl:215
│ no matching method found `response(::StatisticalModel)`: response(obj::StatisticalModel)
└────────────────────What do you think? Better use JET on 1.12 anyway?
There was a problem hiding this comment.
These errors on 1.12 are real, JET's analysis was just previously not capable of capturing them. The compiler can't reason about these if conditions and hence can't infer that the variables indsx and indsy in
Line 43 in d70c4a2
Enabling JET on 1.12 probably would require to update the compat entry as the AssertionError apparently was only solved in a more recent release of JET.
That being said, the errors on Julia < 1.12 are most likely real as well. Did you check whether the type assertion would be sufficient to fix them?
So I think enabling JET on 1.12, now that the JET-internal AssertionErrors are fixed, would be a good idea in general but belongs to a different PR, even more so since it requires additional fixes in StatsBase.
There was a problem hiding this comment.
Adding ::Bool does the trick so let's do that.
src/weights.jl
Outdated
| any(ismissing, v) && | ||
| throw(ArgumentError("quantiles are undefined in presence of missing values")) |
There was a problem hiding this comment.
I think this check should be moved to the beginning of the function body since it operates on the input and eg doesn't require the sorting (similar to the check for empty v).
Moreover, I think it would be good to ensure that the check is skipped if it can be deduced from the type that no missings are present, eg
| any(ismissing, v) && | |
| throw(ArgumentError("quantiles are undefined in presence of missing values")) | |
| if eltype(v) >: Missing && any(ismissing, v) | |
| throw(ArgumentError("quantiles are undefined in presence of missing values")) | |
| end |
There was a problem hiding this comment.
Good point, I'll move it up. But the eltype check isn't needed, the compiler optimizes the call out automatically:
julia> f(x) = any(ismissing, x)
f (generic function with 1 method)
julia> @code_llvm f([1])
; Function Signature: f(Array{Int64, 1})
; @ REPL[6]:1 within `f`
define i8 @julia_f_3813(ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
; ┌ @ reducedim.jl:989 within `any`
ret i8 0
; └
}There was a problem hiding this comment.
I'm generally a bit hesitant to rely on such undocumented compiler optimizations - it's not clear immediately for which types it would be applied and whether it can be expected to be stable across Julia versions. If we want to be certain that it is optimized, I think we would have to add the check in StatsBase.
There was a problem hiding this comment.
We can't actually use eltype(v) >: Missing if we want to be generic as types other than Missing may implement it (e.g. https://github.com/nalimilan/TypedMissings.jl). Anyway this is a relatively simple optimization so I think we can rely on it. It's used all over the place, including in Base itself.
test/jet.jl
Outdated
| # Default error analysis for common problem fails since JET errors on interface definitions | ||
| # The (deprecated) `model_response(::StatisticalModel)` calls the interface | ||
| # function `response(::StatisticalModel)` for which no method exists yet | ||
| # JET is also confused by any(ismissing, ::AbstractVector) |
There was a problem hiding this comment.
In my experience, JET is usually right - there might be some definition of any in Base or some dependency that does not ensure that a Boolean is returned.
If the problem persists even with the additional type check suggested above, then I think you could try to fix the JET error by adding a type assertion to any, ie use any(ismissing, v)::Bool.
|
Thanks! |
This is consistent with
Statistics.quantileand avoids breakage due to #977. It's particularly useful forUnion{T, Missing}, e.g. a view of nonmissing entries in a vector. This also allows supporting some types such asDate, though currently this only works for some values (would need to implementtype=1).