Local names linking by xal-0 · Pull Request #60031 · JuliaLang/julia

xal-0 · 2025-11-04T00:19:03Z

Overview

This PR overhauls the way linking works in Julia, both in the JIT and AOT. The point is to enable us to generate LLVM IR that depends only on the source IR, eliminating both nondeterminism and statefulness. This serves two purposes. First, if the IR is predictable, we can cache compiled objects using the bitcode hash as a key, like how the ThinLTO cache works. #58592 was an early experiment along these lines. Second, we can reuse work that was done in a previous session, like pkgimages, but for the JIT.

We accomplish this by generating names that are unique only within the current LLVM module, removing most uses of the globalUniqueGeneratedNames counter. The replacement for jl_codegen_params_t, jl_codegen_output_t, represents a Julia "translation unit", and tracks the information we'll need to link the compiled module into the running session. When linking, we manipulate the JITLink LinkGraph (after compilation) instead of renaming functions in the LLVM IR (before).
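To make the first purpose concrete, here is a minimal sketch of a cache keyed by a hash of the bitcode. It is not code from this PR: `ObjectCache`, `get_or_compile` and the `misses` counter are hypothetical names, bitcode and object code are modelled as plain strings, and `std::hash` stands in for the collision-resistant hash a real cache would need.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical content-addressed cache: maps a hash of the bitcode to the
// object code that was produced for it.
class ObjectCache {
public:
    using Compiler = std::function<std::string(const std::string &)>;

    // Return the cached object for `bitcode`, compiling it only on a miss.
    const std::string &get_or_compile(const std::string &bitcode, const Compiler &compile)
    {
        // Assumption: std::hash is only a placeholder; a real cache would use
        // a strong hash so that distinct modules never share a key.
        std::size_t key = std::hash<std::string>{}(bitcode);
        auto it = objects_.find(key);
        if (it == objects_.end()) {
            ++misses;
            it = objects_.emplace(key, compile(bitcode)).first;
        }
        return it->second;
    }

    unsigned misses = 0; // number of times the compiler was actually invoked

private:
    std::unordered_map<std::size_t, std::string> objects_;
};
```

Such a cache only hits if identical source IR yields byte-identical bitcode. A session-wide counter baked into symbol names (as in `julia_baz_772`) would change the key from one session to the next, which is why module-local names matter here.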

Example

julia> @noinline foo(x) = x + 2.0
       baz(x) = foo(foo(x))

       code_llvm(baz, (Int64,); dump_module=true, optimize=false)

Nightly:

[...]
@"+Core.Float64#774" = private unnamed_addr constant ptr @"+Core.Float64#774.jit"
@"+Core.Float64#774.jit" = private alias ptr, inttoptr (i64 4797624416 to ptr)

; Function Signature: baz(Int64)
;  @ REPL[1]:2 within `baz`
define double @julia_baz_772(i64 signext %"x::Int64") #0 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %0 = call double @j_foo_775(i64 signext %"x::Int64")
  %1 = call double @j_foo_776(double %0)
  ret double %1
}

; Function Attrs: noinline optnone
define nonnull ptr @jfptr_baz_773(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %0 = getelementptr inbounds i8, ptr %"args::Any[]", i32 0
  %1 = load ptr, ptr %0, align 8
  %.unbox = load i64, ptr %1, align 8
  %2 = call double @julia_baz_772(i64 signext %.unbox)
  %"+Core.Float64#774" = load ptr, ptr @"+Core.Float64#774", align 8
  %Float64 = ptrtoint ptr %"+Core.Float64#774" to i64
  %3 = inttoptr i64 %Float64 to ptr
  %current_task = getelementptr inbounds i8, ptr %pgcstack, i32 -152
  %"box::Float64" = call noalias nonnull align 8 dereferenceable(8) ptr @julia.gc_alloc_obj(ptr %current_task, i64 8, ptr %3) #5
  store double %2, ptr %"box::Float64", align 8
  ret ptr %"box::Float64"
}
[...]

Diff after this PR. Notice how each symbol gets the lowest possible integer suffix that will make it unique to the module, and how the two specializations for foo get different names:

@@ -4,18 +4,18 @@
 target triple = "arm64-apple-darwin24.6.0"
 
-@"+Core.Float64#774" = external global ptr
+@"+Core.Float64#_0" = external global ptr
 
 ; Function Signature: baz(Int64)
 ;  @ REPL[1]:2 within `baz`
-define double @julia_baz_772(i64 signext %"x::Int64") #0 {
+define double @julia_baz_0(i64 signext %"x::Int64") #0 {
 top:
   %pgcstack = call ptr @julia.get_pgcstack()
-  %0 = call double @j_foo_775(i64 signext %"x::Int64")
-  %1 = call double @j_foo_776(double %0)
+  %0 = call double @j_foo_0(i64 signext %"x::Int64")
+  %1 = call double @j_foo_1(double %0)
   ret double %1
 }
 
 ; Function Attrs: noinline optnone
-define nonnull ptr @jfptr_baz_773(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
+define nonnull ptr @jfptr_baz_0(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
 top:
   %pgcstack = call ptr @julia.get_pgcstack()
@@ -23,7 +23,7 @@
   %1 = load ptr, ptr %0, align 8
   %.unbox = load i64, ptr %1, align 8
-  %2 = call double @julia_baz_772(i64 signext %.unbox)
-  %"+Core.Float64#774" = load ptr, ptr @"+Core.Float64#774", align 8
-  %Float64 = ptrtoint ptr %"+Core.Float64#774" to i64
+  %2 = call double @julia_baz_0(i64 signext %.unbox)
+  %"+Core.Float64#_0" = load ptr, ptr @"+Core.Float64#_0", align 8
+  %Float64 = ptrtoint ptr %"+Core.Float64#_0" to i64
   %3 = inttoptr i64 %Float64 to ptr
   %current_task = getelementptr inbounds i8, ptr %pgcstack, i32 -152
@@ -39,8 +39,8 @@
 
 ; Function Signature: foo(Int64)
-declare double @j_foo_775(i64 signext) #3
+declare double @j_foo_0(i64 signext) #3
 
 ; Function Signature: foo(Float64)
-declare double @j_foo_776(double) #4
+declare double @j_foo_1(double) #4
 
 attributes #0 = { "frame-pointer"="all" "julia.fsig"="baz(Int64)" "probe-stack"="inline-asm" }
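The naming behaviour visible in the diff can be approximated with one counter per name prefix that is owned by the module rather than by the process. The sketch below is an illustration under that assumption, not the PR's implementation; `LocalNamer` and `fresh` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-module name generator. Each module owns its own instance,
// so suffixes restart from 0 for every module instead of growing with a
// session-wide counter.
class LocalNamer {
public:
    // Return `prefix` followed by the lowest suffix not yet handed out for
    // that prefix in this module.
    std::string fresh(const std::string &prefix)
    {
        unsigned n = next_[prefix]++; // value-initialized to 0 on first use
        return prefix + std::to_string(n);
    }

private:
    std::unordered_map<std::string, unsigned> next_;
};
```

Two calls with the prefix `j_foo_` on the same instance yield `j_foo_0` and `j_foo_1`, matching the two `foo` specializations above, and a fresh instance for another module starts again at `j_foo_0`.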

List of changes

- Many sources of statefulness and nondeterminism in the emitted LLVM IR have been eliminated, namely:
  - Function symbols defined for CodeInstances
  - Global symbols referring to data on the Julia heap
  - Undefined function symbols referring to invoked external CodeInstances
- jl_codegen_params_t has become jl_codegen_output_t. It now represents one Julia "translation unit". More than one CodeInstance can be emitted to the same jl_codegen_output_t, if desired, though in the JIT every CI gets its own right now. One motivation behind this is to allow us to emit code on multiple threads and avoid the bitcode serialize/deserialize step we currently do, if that proves worthwhile.

  When we are done emitting to a jl_codegen_output_t, we call .finish(), which discards the intermediate state and returns only the LLVM module and the info needed for linking (jl_linker_info_t).
- The new JLMaterializationUnit wraps emitting Julia LLVM modules and the associated jl_linker_info_t. It informs ORC that we can materialize symbols for the CIs defined by that output, and picks globally unique names for them. When it is materialized, it resolves all the call targets and generates trampolines for CodeInstances that are invoked but have the wrong calling convention, or are not yet compiled.
- We now postpone linking decisions to after codegen whenever possible. For example, emit_invoke no longer tries to find a compiled version of the CodeInstance, and it no longer generates trampolines to adapt calling conventions. jl_analyze_workqueue's job has been absorbed into JuliaOJIT::linkOutput.
- Some image_codegen differences have been removed:
  - Codegen no longer cares if a compiled CodeInstance came from an image. During ahead-of-time linking, we generate thunk functions that load the address from the fvars table.
- In jl_emit_native_impl, we emit every CodeInstance into one jl_codegen_output_t. We now defer the creation of the llvm::Linker for llvmcalls, which has a construction cost that grows with the size of the destination module, until the very end.
- RTDyld is removed completely, since we cannot control linking like we can with JITLink. Since #60105 (Add JLJITLinkMemoryManager, which ports the memory manager to JITLink), platforms that previously used the optimized memory manager now use the new one.
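As a rough illustration of postponing linking decisions, the sketch below decides at link time whether a call can go straight to an already compiled target or needs a trampoline. Everything in it is a hypothetical stand-in: `CallConv`, `CompiledCI`, `Resolution` and `resolve_call_target` are not the PR's types, and CodeInstances are modelled as integer ids.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for the calling conventions a caller may expect.
enum class CallConv { SpecSig, Args };

// Hypothetical record of a CodeInstance that is already compiled in the session.
struct CompiledCI {
    CallConv conv;
    std::uint64_t addr;
};

// Outcome of resolving one call target at link time.
struct Resolution {
    bool needs_trampoline;
    std::uint64_t addr; // only meaningful when needs_trampoline is false
};

// Codegen merely records "call CodeInstance `ci` with convention `wanted`".
// The choice between a direct call and a trampoline is made here, at link time.
Resolution resolve_call_target(const std::unordered_map<int, CompiledCI> &session,
                               int ci, CallConv wanted)
{
    auto it = session.find(ci);
    if (it != session.end() && it->second.conv == wanted)
        return {false, it->second.addr}; // compiled with a matching convention
    return {true, 0}; // wrong convention or not yet compiled
}
```

Because the emitted IR no longer encodes which targets happened to be compiled at emission time, the same IR can be linked into sessions in different states.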

General refactoring

- Adapt the jl_callingconv_t enum from staticdata.c into jl_invoke_api_t and use it in more places. There is one enumerator for each special jl_callptr_t function that can go in a CodeInstance's invoke field, as well as one that indicates an invoke wrapper should be there. There is a convenience function for reading an invoke pointer and getting the API type, and vice versa.
- Avoid using magic string values, and try to directly pass pointers to LLVM Function * or ORC string pool entries when possible.

Future work

- DLSymOptimizer should be mostly removed, in favour of emitting raw ccalls and redirecting them to the appropriate target during linking.
- We should support ahead-of-time linking multiple jl_codegen_output_ts together, in order to parallelize LLVM IR emission when compiling a system image.
- We still pass strings to emit_call_specfun_other, even though the prototype for the function is now created by jl_codegen_output_t::get_call_target. We should hold on to the calling convention info so it doesn't have to be recomputed.

- Use JITLink everywhere
- Rename jlcall_type, add jl_funcs_invoke_ptr
- Move JLLinkingLayer into JuliaOJIT
- Use jl_invoke_api_t elsewhere
- Rename JL_INVOKE_JFPTR -> JL_INVOKE_SPECSIG
- Put all special symbol names in one place
- Add helper for specsig -> tojlinvoke (fptr1) and use it
- Fix invariants for code_outputs
- Document JIT invariants better; remove invalid assertions
- Replace workqueue, partially support OpaqueClosure
- Add JIT tests
- Stop using strings so much
- Don't create an LLVM::Linker unless necessary
- Generate trampolines in aot_link_output
- GCChecker annotations, misc changes
- Re-add emit_always_inline
- Get JLDebuginfoPlugin and eh_frame working again
- Re-add OpaqueClosure MethodInstance global root
- Fix GCChecker annotations
- Clean up TODOs
- Read dump compile
- Use multiple threads in the JIT
- Add PLT/GOT for external fns
- Name Julia PLT GOT entries
- Do emit_llvmcall_modules at the end
- Suppress clang-tidy, static analyzer warnings
- Keep temporary_roots alive during emit_always_inline
- Mark pkg PLT thunks noinline
- Don't attempt to emit inline codeinsts when IR is too large or missing
- Improve thunk generation on x86
- Fix infinite loop in emit_always_inline if inlining not possible
- Use local names for global targets
- Fix jl_get_llvmf_defn_impl cfunction hacks

vchuravy · 2025-11-04T19:23:49Z

src/jitlayers.cpp

+class JLMaterializationUnit : public orc::MaterializationUnit {
+public:
+    static JLMaterializationUnit Create(JuliaOJIT &JIT, ObjectLinkingLayer &OL,
+                                        std::unique_ptr<jl_linker_info_t> Info,
+                                        std::unique_ptr<MemoryBuffer> Obj) JL_NOTSAFEPOINT
+    {


Nice! I have been wanting this for a long time!

Would it make sense to have a C-API for creating these? So that LLVM.jl could create them?

Possibly, though I would not want to expose it in a way that would lock in some of the design choices, like how JLMaterializationUnit owns the object buffer.

I'm undecided on how much work should be deferred to materialization. Right now jl_compile_codeinst_now blocks all threads waiting on compilation until everything is compiled to object files, like on master. I'd like to leave the door open to letting ORC decide when to compile.

Yeah, I have been wanting to try an ORC-based setup for GPUCompiler.

Still no C API, but fwiw I have switched this most recent version over to doing compilation in JLMaterializationUnit::materialize.

Ports our RTDyLD memory manager to JITLink in order to avoid memory use regressions after switching to JITLink everywhere (JuliaLang#60031). This is essentially a direct port: finalization must happen all at once, because it invalidates all allocation `wr_ptr`s. I decided it wasn't worth it to associate `OnFinalizedFunction` callbacks with each block, since they are large enough to make it extremely likely that all in-flight allocations land in the same block; everything must be relocated before finalization can happen. I plan to add support for DualMapAllocator on ARM64 macOS, as well as an alternative for executable memory to come later. For now, we fall back to the old MapperJITLinkMemoryManager.

Release JLJITLinkMemoryManager lock when calling FinalizedCallbacks

adienes · 2025-11-14T01:49:29Z

eliminating both nondeterminism and the effect of redefining methods in the same session

there are several open issues observing inference changes when methods are redefined; does this PR affect those?

xal-0 · 2025-11-14T18:00:23Z

No, this PR only changes code generation.

Unfortunately the "portable" LLVM way of generating thunks doesn't generate the code we want. Instead, on platforms where it makes sense, we'll steal the LLD PLT thunk code, but in disassembled form. At some point this should be moved to after linking, where it can be in assembled form again. Amusingly it will be more portable in assembled form, because the assembler syntax for relocations differs between object formats.

xal-0 · 2025-11-18T23:50:59Z

This new commit fixes some horrible code generation in emit_pkg_plt_thunk by just emitting inline assembly, using PLT thunks stolen from LLD. This will be less hacky when it happens after linking. Since that requires the renaming of symbols post-compilation, it is out of scope for this PR.

Ports our RTDyLD memory manager to JITLink in order to avoid memory use regressions after switching to JITLink everywhere (#60031). This is a direct port: finalization must happen all at once, because it invalidates all allocation `wr_ptr`s. I decided it wasn't worth it to associate `OnFinalizedFunction` callbacks with each block, since they are large enough to make it extremely likely that all in-flight allocations land in the same block; everything must be relocated before finalization can happen. (cherry picked from commit 6fa0e75)

Replace all uses of `ptrdiff_t slide` and `int64_t slide` with `uint64_t`. If a JITted object is ever assigned an address in the upper half of the address space, which is quite common on 32-bit Linux, the expression `SectionAddr - SectionLoadAddr` has undefined behaviour. This resulted in some very confusing bugs that manifested far from the source. It is easier to use unsigned integers everywhere we need a difference, since we know they have two's complement representation. Cherry-picked from JuliaLang#60031. [1] https://buildkite.com/julialang/julia-master/builds/52196/steps/canvas?sid=019a9d6f-14a6-4ffc-be19-f2f835d1e719

Replace all uses of `ptrdiff_t slide` and `int64_t slide` with `uint64_t`.

If a JITted object is ever assigned an address in the upper half of the address space on a platform with `sizeof(char *) = 4`, which is quite common on 32-bit Linux, the following can happen:

In JITDebugInfoRegistry::registerJITObject, `SectionAddr - SectionLoadAddr` is computed in uint64_t (ok), then cast to ptrdiff_t (two's complement of the uint64_t version mod 2^32). This is apparently implementation-defined behaviour rather than undefined.

Say SectionAddr = 0x1000UL, SectionLoadAddr = 0xe93b2000UL and size_t pointer = 0xe93b20abU.

```
(ptrdiff_t)(SectionAddr - SectionLoadAddr) == (ptrdiff_t)0xffffffff16c4f000
                                           == 382005248
```

jl_DI_for_fptr implicitly converts the ptrdiff_t to int64_t:

```
(int64_t)382005248 == 382005248L
```

lookup_pointer adds `size_t pointer` to `int64_t slide`. Both are converted to int64_t because it can represent every size_t:

```
(int64_t)0xe93b20abU + 382005248L == 3912966315L + 382005248L
                                  == 4294971563L
```

This is converted back to uint64_t by makeAddress, resulting in an address other than the 0x10ab we expected:

```
(uint64_t)4294971563L == 0x1000010abUL
```

It is easier to use unsigned integers everywhere we need a difference, since they avoid the problem of losing upper bits after sign extension and avoid weird UB from signed overflow.

Cherry-picked from JuliaLang#60031.

[1] https://buildkite.com/julialang/julia-master/builds/52196/steps/canvas?sid=019a9d6f-14a6-4ffc-be19-f2f835d1e719
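The arithmetic in this commit message can be replayed on a 64-bit host. The sketch below is only a model: `int32_t` and `uint32_t` stand in for the 32-bit platform's `ptrdiff_t` and `size_t`, and the two function names are hypothetical, not the functions named in the message.

```cpp
#include <cassert>
#include <cstdint>

// Model of the old behaviour on a platform with 32-bit pointers.
std::uint64_t lookup_with_signed_slide(std::uint64_t section_addr,
                                       std::uint64_t section_load_addr,
                                       std::uint32_t pointer)
{
    // The 64-bit difference is truncated to 32 bits, as the cast to a 32-bit
    // ptrdiff_t does in the scenario described above.
    std::int32_t slide32 = static_cast<std::int32_t>(section_addr - section_load_addr);
    // Widening to int64_t sign-extends; the upper 32 bits of the original
    // difference are already gone.
    std::int64_t slide = slide32;
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(pointer) + slide);
}

// Model of the fixed behaviour: the slide stays a uint64_t, and the addition
// wraps modulo 2^64, which is well defined for unsigned types.
std::uint64_t lookup_with_unsigned_slide(std::uint64_t section_addr,
                                         std::uint64_t section_load_addr,
                                         std::uint32_t pointer)
{
    std::uint64_t slide = section_addr - section_load_addr;
    return static_cast<std::uint64_t>(pointer) + slide;
}
```

With SectionAddr = 0x1000, SectionLoadAddr = 0xe93b2000 and pointer = 0xe93b20ab, the signed model reproduces the bogus address 0x1000010ab, while the unsigned model yields the expected 0x10ab.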

@topolarity

…61015) When JL_NDEBUG is undefined, we should use these as assertions. Suggested by @topolarity in #60031 (comment). --------- Co-authored-by: Cody Tapscott <84105208+topolarity@users.noreply.github.com>

topolarity

This is definitely the right direction for the JIT, and as a bonus the changes seem to be a net improvement in terms of clarity + simplicity (thanks for that @xal-0).

No major issues stand out to me. An approval from @vtjnash would be more meaningful than mine, since I do not have the "big picture" of our JIT pipeline. Nonetheless, I've done my best to do a "local" review of the JIT logic. This is already moderately well-tested by bootstrap / PkgEval, so I say we land this now and find any remaining bugs / gaps in the wild.

topolarity · 2026-02-17T14:24:41Z

src/jitlayers.cpp

-    params.temporary_roots_set.clear();
+
+    // contains safepoints
+    jl_promote_method_roots(out, mi, out.get_module());


It's a little strange to me the way that we "consume" (promote) the temporary_roots here but don't de-initialize / reset them until a few lines later.

IIUC that's to allow them to continue to be used as a scratchspace by emit_always_inline but it might be worth a comment.

topolarity · 2026-02-17T14:34:46Z

src/jitlayers.cpp

+        // Tell ORC about all the other definition in this module.  When
+        // linker_info contains enough information to produce the full
+        // Interface, remove this.


Just out of curiosity, what other function definitions are we leaving out right now?

Currently, only tojlinvoke trampolines need to be externally-visible ORC symbols, but we have a few places in codegen where we emit symbols with external linkage. When I change all those to use local names, I'll be able to delete it.

topolarity · 2026-02-17T14:42:45Z

src/jitlayers.h

 void fixupTM(TargetMachine &TM) JL_NOTSAFEPOINT;

-void optimizeDLSyms(Module &M) JL_NOTSAFEPOINT_LEAVE JL_NOTSAFEPOINT_ENTER;
+void optimizeDLSyms(Module &M);


Why are these annotations dropped?

We now call optimizeDLSyms from JLMaterializationUnit::materialize, which is allowed to safepoint.

topolarity · 2026-02-17T14:46:05Z

src/jitlayers.h

+void optimizeDLSyms(Module &M);
+
+static inline const char *jl_symbol_prefix(jl_symbol_prefix_t type,
+                                           jl_invoke_api_t api) JL_NOTSAFEPOINT


Thank you for adding some structure to these conventions.

topolarity · 2026-02-17T15:07:29Z

src/jitlayers.cpp

+        for (auto [CI, _] : Out.linker_info->ci_funcs) {
+            JL_GC_PROMISE_ROOTED(CI);
+            jl_do_dump_compile(CI, end_time - start_time);
+        }


Random / unrelated, but should we try to align timing_print_module_names with jl_do_dump_compile ?

I'm already confused by these very-similar-but-technically-different instrumentations

Agreed. I'd like to improve Tracy output in general; since stackless inference it is no longer very useful for determining why something takes so long to compile.

xal-0 · 2026-02-19T19:02:39Z

Thanks to @vtjnash for the MWE that crashed on the old jl_jit_unregister_ci-after-invoke strategy. There's now a test in test/jit.jl that reliably triggers the assertion. We now unregister the CodeInstance immediately before calling invoke, which is done by a special jl_invoke_oneshot helper, called from jl_eval_thunk.

The merging of #60031 revealed a few remaining multithreading issues with local names linking (https://buildkite.com/julialang/julia-master/builds/55081/steps/canvas?jid=019c9584-a2fa-4ddc-bc7e-95ee729211a0&tab=output).

This PR has a series of commits addressing these issues and making us a little more eager to crash with a useful message in situations that would otherwise result in a deadlock in `JuliaTaskDispatcher`:
- We return as soon as possible from `JLMaterializationUnit::materialize` after calling `MaterializationResponsibility::failMaterialization`.
- When an ORC lookup fails in `publishCIs`, call `abort()` instead of potentially deadlocking.

Two concurrency issues are fixed. The first is that there was a window of time during which a CodeInstance added to the JIT via `jl_emit_codeinst_to_jit` had `invoke == jl_fptr_wait_for_compiled_addr`, but did not have ORC symbols set up in `JuliaOJIT::CISymbols`. We solve this by taking the lock before setting up the ORC symbols, skipping any CodeInstances where another thread beat us to the punch in setting `invoke`.

I suspect the second issue is the one that was causing rare CI failures. We had a data race on the `InFlight` counter for `JLJITLinkMemoryManager`, which, if decremented below zero, would cause the `FinalizedCallbacks` to never fire. This manifests as deadlocks in `JuliaTaskDispatcher`, since those symbols will be stuck in the `SymbolState::Resolved` state forever.
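The second issue concerns a counter of in-flight allocations that gates a batch of callbacks. A minimal sketch of that pattern with the race removed might look as follows; `InFlightTracker`, `begin` and `finish` are hypothetical names, and this is not the actual `JLJITLinkMemoryManager` code.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical tracker: callbacks queued by finish() fire together once the
// number of in-flight allocations drops back to zero.
class InFlightTracker {
public:
    // Register one more in-flight allocation.
    void begin()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++in_flight_;
    }

    // Mark one allocation as done and queue its callback.
    void finish(std::function<void()> on_finalized)
    {
        std::vector<std::function<void()>> ready;
        {
            // The counter and the queue are only touched under the lock, so
            // the decrement cannot race with another thread's update.
            std::lock_guard<std::mutex> lock(mutex_);
            assert(in_flight_ > 0 && "finish() without matching begin()");
            callbacks_.push_back(std::move(on_finalized));
            if (--in_flight_ == 0)
                ready.swap(callbacks_);
        }
        // Callbacks run after the lock is released, so they may safely call
        // back into the tracker.
        for (auto &callback : ready)
            callback();
    }

private:
    std::mutex mutex_;
    unsigned in_flight_ = 0;
    std::vector<std::function<void()>> callbacks_;
};
```

If the counter were updated without synchronization, a lost update could leave it away from zero forever, which corresponds to the "callbacks never fire" symptom described above.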

xal-0 added 3 commits November 3, 2025 15:28

Add jl_invoke_api_t enum and use it in staticdata.c

70db9ab

Renumber jl_invoke_api_t

Set JL_CI_FLAGS_SPECPTR_SPECIALIZED only on specsig in jl_update_all_fptrs

341bf40

xal-0 added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM labels Nov 4, 2025

xal-0 added 2 commits November 4, 2025 10:27

Add CodegenParams.unique_names so they can be enabled in llvmpasses tests

da849ec

Fix timing_print_module_names not capturing JL_TIMING_DEFAULT_BLOCK

f8ab8e9

vchuravy reviewed Nov 4, 2025

xal-0 mentioned this pull request Nov 11, 2025

Add JLJITLinkMemoryManager (ports memory manager to JITLink) #60105

Merged

Merge remote-tracking branch 'upstream/master' into local-names-linking

c88e5eb

xal-0 mentioned this pull request Nov 14, 2025

Support DualMapAllocator on aarch64 macOS, and add MapJITAllocator #60117

Closed

xal-0 added 2 commits November 18, 2025 15:33

Merge remote-tracking branch 'upstream/master' into local-names-linking

879d0c0

Use fallback PLT for i386

5f4a808

xal-0 added 2 commits November 19, 2025 10:18

Don't add Attribute::Naked unless using inline assembly PLT thunk

7cb4632

Use TCK_MustTail in fallback PLT thunk

f81c77b

xal-0 mentioned this pull request Nov 20, 2025

Use unsigned integers for debuginfo address differences/slide #60179

Merged

Mark jl_jit_unregister_mi as JL_NOTSAFEPOINT

081bb57

topolarity approved these changes Feb 17, 2026

xal-0 added 2 commits February 17, 2026 18:05

Update some out-of-date comments

0b6e2b1

Merge remote-tracking branch 'upstream/master' into local-names-linking

6bde4ea

xal-0 force-pushed the local-names-linking branch from c02bbc6 to 6bde4ea Compare February 18, 2026 02:20

xal-0 added 5 commits February 18, 2026 12:08

Don't store references to call_targets in queue, since we may rehash

603724c

Check return value of JIT.compileModule for failure in JLMaterializationUnit

5dffc89

Check invoke_api in CISymbols before returning from linkCallTarget

b3851a7

In jl_codegen_output_t, don't keep reference to owned TSM

b2b42af

Refer to the module through get_TSM() always, so that we don't move a reference to the owned_TSM of the old jl_codegen_output_t in the move constructor.

Add assertion to aot_link_output for documentation reasons

4ad8884

xal-0 force-pushed the local-names-linking branch from ec3aaeb to 4ad8884 Compare February 18, 2026 20:10

Unregister toplevel CodeInstances before invoking, add tests

175e0eb

xal-0 added 7 commits February 19, 2026 15:29

Fix broken BuiltinInvokeTag check in get_item_for_reloc

70a5f23

Call jl_do_dump_compile in GC-unsafe region

03dd273

Remove some unused variables

816e912

Remove commented-out jl_using_gdb_jitevents in JL_DEBUG_BUILD

b6734b1

In emit_always_inline, erase old decl when emitting inline version

42277e9

Fix GCChecker eror in JLMaterializationUnit::materialize (again)

3d0a308

Save and restore errno, last_error in jl_invoke_oneshot

d3a9be5

xal-0 force-pushed the local-names-linking branch from 7437067 to d3a9be5 Compare February 21, 2026 01:06

vtjnash approved these changes Feb 21, 2026

xal-0 added 2 commits February 23, 2026 12:19

Merge branch 'master' into local-names-linking

e88a612

Sleep after test_gc_codeinst()

a1ab491

xal-0 merged commit e46125f into JuliaLang:master Feb 24, 2026
8 checks passed

xal-0 mentioned this pull request Feb 26, 2026

Local names linking fixes #61165

Merged

maleadt mentioned this pull request Mar 4, 2026

Test updates for 1.14. maleadt/CompilerCaching.jl#6

Merged

9 participants
