@gcsafe_ccall breaks inlining of ccall wrappers #2347

@maleadt

#2262 introduced a regression: using @gcsafe_ccall instead of plain @ccall apparently prevents our ccall wrappers from being fully inlined, so that simple Ref boxes (which we need a ton of with the CUDA C APIs) start allocating.

MWE:

function ccall_macro_lower(func, rettype, types, args, nreq)
    # instead of re-using ccall or Expr(:foreigncall) to perform argument conversion,
    # we need to do so ourselves in order to insert a jl_gc_safe_enter|leave
    # just around the inner ccall

    cconvert_exprs = []
    cconvert_args = []
    for (typ, arg) in zip(types, args)
        var = gensym("$(func)_cconvert")
        push!(cconvert_args, var)
        push!(cconvert_exprs, :($var = Base.cconvert($(esc(typ)), $(esc(arg)))))
    end

    unsafe_convert_exprs = []
    unsafe_convert_args = []
    for (typ, arg) in zip(types, cconvert_args)
        var = gensym("$(func)_unsafe_convert")
        push!(unsafe_convert_args, var)
        push!(unsafe_convert_exprs, :($var = Base.unsafe_convert($(esc(typ)), $arg)))
    end

    call = quote
        $(unsafe_convert_exprs...)

        gc_state = @ccall(jl_gc_safe_enter()::Int8)
        ret = ccall($(esc(func)), $(esc(rettype)), $(Expr(:tuple, map(esc, types)...)),
                    $(unsafe_convert_args...))
        @ccall(jl_gc_safe_leave(gc_state::Int8)::Cvoid)
        ret
    end

    quote
        @inline
        $(cconvert_exprs...)
        GC.@preserve $(cconvert_args...) $(call)
    end
end

macro gcsafe_ccall(expr)
    ccall_macro_lower(Base.ccall_macro_parse(expr)...)
end

@inline function check(f)
    res = f()
    if res < 0
        throw(res)
    end

    return
end

function cuCtxGetCurrent(ptr)
    check() do
        @gcsafe_ccall time(C_NULL::Ptr{Cvoid}, ptr::Ptr{Int})::Cint
    end
end

function current_context()
    handle_ref = Ref{Int}()
    cuCtxGetCurrent(handle_ref)
    handle_ref[]
end

current_context()
@show @allocated current_context()

Ways to 'make' this inline again (and get 0 allocated bytes):

  • replace @gcsafe_ccall with plain @ccall
  • within the @gcsafe_ccall, remove the call to jl_gc_safe_leave
  • within current_context, call-site inline the call to cuCtxGetCurrent
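
For reference, the third workaround uses the call-site form of @inline (available since Julia 1.8). The sketch below only illustrates the syntax: get_current! is a hypothetical stand-in for the cuCtxGetCurrent wrapper from the MWE and does not use @gcsafe_ccall, so it does not reproduce the allocation itself.

```julia
# Hypothetical stand-in for the cuCtxGetCurrent wrapper in the MWE;
# the real wrapper goes through `check` and `@gcsafe_ccall`.
function get_current!(ref::Ref{Int})
    ref[] = 42
    return
end

function current_context_callsite()
    handle_ref = Ref{Int}()
    # Call-site annotation (Julia 1.8+): asks the compiler to inline this
    # particular call, independent of any annotation on the callee.
    @inline get_current!(handle_ref)
    handle_ref[]
end

current_context_callsite()
```

In the MWE, the equivalent change is writing `@inline cuCtxGetCurrent(handle_ref)` inside current_context.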

I can understand how the added calls in the @gcsafe_ccall body push the function over the inlining limit, but it's very surprising to me that I can only force inlining by using a call-site @inline, and not by annotating the generated code or function with @inline. @aviatesk, is that expected behavior?
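
To check inlining directly rather than inferring it from @allocated, one option is to look for leftover :invoke statements in the optimized IR. This is a minimal sketch: has_invoke, callee_kept, callee_gone and the two outer functions are hypothetical names made up for illustration, and the check only catches statically resolved calls that were not inlined.

```julia
# Hypothetical helper: returns true if the optimized IR of `f` for the given
# argument types still contains an `:invoke` statement, i.e. a statically
# resolved call that was not inlined.
function has_invoke(f, argtypes)
    ci = first(only(Base.code_typed(f, argtypes; optimize=true)))
    return any(stmt -> stmt isa Expr && stmt.head === :invoke, ci.code)
end

# Two toy callees, one forced out of line and one forced inline.
@noinline callee_kept(x) = x + 1
@inline callee_gone(x) = x + 1

outer_kept(x) = callee_kept(x)
outer_gone(x) = callee_gone(x)

@show has_invoke(outer_kept, (Int,))
@show has_invoke(outer_gone, (Int,))
```

Applied to the MWE, `has_invoke(current_context, ())` should report whether the call to cuCtxGetCurrent survived optimization.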

cc @vchuravy

Labels: performance (How fast can we go?), regression (Something that used to work, doesn't anymore.)