@gcsafe_ccall breaks inlining of ccall wrappers #2347
Closed
Labels: performance, regression
Description
#2262 introduced a regression: using `@gcsafe_ccall` instead of plain `@ccall` apparently prevents our ccall wrappers from being fully inlined, so simple `Ref` boxes (which we need a lot of with the CUDA C APIs) start allocating.
MWE:
```julia
function ccall_macro_lower(func, rettype, types, args, nreq)
    # instead of re-using ccall or Expr(:foreigncall) to perform argument conversion,
    # we need to do so ourselves in order to insert a jl_gc_safe_enter|leave
    # just around the inner ccall

    cconvert_exprs = []
    cconvert_args = []
    for (typ, arg) in zip(types, args)
        var = gensym("$(func)_cconvert")
        push!(cconvert_args, var)
        push!(cconvert_exprs, :($var = Base.cconvert($(esc(typ)), $(esc(arg)))))
    end

    unsafe_convert_exprs = []
    unsafe_convert_args = []
    for (typ, arg) in zip(types, cconvert_args)
        var = gensym("$(func)_unsafe_convert")
        push!(unsafe_convert_args, var)
        push!(unsafe_convert_exprs, :($var = Base.unsafe_convert($(esc(typ)), $arg)))
    end

    call = quote
        $(unsafe_convert_exprs...)

        gc_state = @ccall(jl_gc_safe_enter()::Int8)
        ret = ccall($(esc(func)), $(esc(rettype)), $(Expr(:tuple, map(esc, types)...)),
                    $(unsafe_convert_args...))
        @ccall(jl_gc_safe_leave(gc_state::Int8)::Cvoid)
        ret
    end

    quote
        @inline
        $(cconvert_exprs...)
        GC.@preserve $(cconvert_args...) $(call)
    end
end

macro gcsafe_ccall(expr)
    ccall_macro_lower(Base.ccall_macro_parse(expr)...)
end

@inline function check(f)
    res = f()
    if res < 0
        throw(res)
    end
    return
end

function cuCtxGetCurrent(ptr)
    check() do
        @gcsafe_ccall time(C_NULL::Ptr{Cvoid}, ptr::Ptr{Int})::Cint
    end
end

function current_context()
    handle_ref = Ref{Int}()
    cuCtxGetCurrent(handle_ref)
    handle_ref[]
end

current_context()
@show @allocated current_context()
```

Ways to 'make' this inline again (and get 0 allocated bytes):
- replace `@gcsafe_ccall` with plain `@ccall`;
- within the `@gcsafe_ccall` expansion, remove the call to `jl_gc_safe_leave`;
- within `current_context`, call-site inline the call to `cuCtxGetCurrent` (i.e. `@inline cuCtxGetCurrent(handle_ref)`).
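A self-contained sketch of the third workaround, using no `ccall` at all: `fill_handle!` is a hypothetical stand-in for a wrapper the compiler declines to inline (forced here with `@noinline` rather than by exceeding the inlining cost threshold), and `context_default` / `context_callsite_inline` are hypothetical variants of `current_context`. The Julia documentation states that a call-site `@inline` (Julia 1.8 and later) takes precedence over the callee's definition-site annotation, so the second variant should let the compiler eliminate the `Ref` box.

```julia
# Hypothetical stand-in for a ccall wrapper that writes its result through a Ref.
# @noinline simulates a wrapper that ends up above the inliner's cost threshold.
@noinline function fill_handle!(r::Ref{Int})
    r[] = 7
    return nothing
end

# Like `current_context` in the MWE: the Ref escapes into a real (non-inlined) call,
# so it is expected to be heap-allocated.
function context_default()
    handle_ref = Ref{Int}()
    fill_handle!(handle_ref)
    return handle_ref[]
end

# Call-site @inline (Julia >= 1.8) overrides the definition-site @noinline; once the
# body is inlined, the compiler can replace the Ref box with a plain local value.
function context_callsite_inline()
    handle_ref = Ref{Int}()
    @inline fill_handle!(handle_ref)
    return handle_ref[]
end

# Warm up first so that compilation is not counted by @allocated.
context_default()
context_callsite_inline()
@show @allocated context_default()
@show @allocated context_callsite_inline()
```

The first number is expected to be a small nonzero allocation and the second to be zero, mirroring the behaviour described in this issue; the exact byte count may differ between Julia versions.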
I can understand how the added calls in the @gcsafe_ccall body push the function over the inlining limit, but it's very surprising to me that I can only force inlining by using a call-site @inline, and not by annotating the generated code or function with @inline. @aviatesk, is that expected behavior?
cc @vchuravy