Allow runtime functions to call into themselves. #11

Merged
maleadt merged 1 commit into master from tb/runtime_recurse
Apr 23, 2020

@maleadt maleadt commented Apr 23, 2020

No description provided.

@maleadt maleadt merged commit 94d8c12 into master Apr 23, 2020
@maleadt maleadt deleted the tb/runtime_recurse branch March 9, 2022 10:09
simeonschaub added a commit that referenced this pull request Nov 4, 2025
The issue appears to be that codegen hard-codes `MAX_ALIGN` based on
the host platform ABI and assumes that if the host supports `i128`
allocas, the target will support them as well. For now, just handle this
by converting `i128` allocas to `<2 x i64>` allocas. Discovered while
working on JuliaGPU/OpenCL.jl#379.

To reproduce the issue:

```julia-repl
julia> using OpenCL, SIMD

julia> OpenCL.code_llvm(NTuple{2, Vec{8, Float32}}) do x...
           @noinline +(x...)
       end
;  @ REPL[7]:2 within `#11`
define void @julia__11_16515(ptr noalias nocapture noundef nonnull sret([1 x <8 x float>]) align 16 dereferenceable(32) %sret_return, ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[1]::Vec", ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[2]::Vec") local_unnamed_addr {
top:
  %"new::Tuple" = alloca [2 x [1 x <8 x float>]], align 16
  %sret_box = alloca [2 x i128], align 16
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %"new::Tuple", ptr noundef nonnull align 16 dereferenceable(32) %"x[1]::Vec", i64 32, i1 false)
  %0 = getelementptr inbounds i8, ptr %"new::Tuple", i64 32
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %0, ptr noundef nonnull align 16 dereferenceable(32) %"x[2]::Vec", i64 32, i1 false)
  call fastcc void @julia___16519(ptr noalias nocapture noundef sret([1 x <8 x float>]) %sret_box, ptr nocapture readonly %"new::Tuple", ptr nocapture readonly %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %sret_return, ptr noundef nonnull align 16 dereferenceable(32) %sret_box, i64 32, i1 false)
  ret void
}
```

A similar workaround might be needed for Metal, but I don't have a Mac
to test it on.
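
For illustration only, the workaround described in the commit message above (turning `i128` allocas into `<2 x i64>` allocas) can be sketched on textual IR. This is a string-level approximation, not the actual compiler pass, which would operate on LLVM IR objects. The function name `rewrite_i128_allocas` is hypothetical and does not come from the commit.

```julia
# Hypothetical helper: rewrite `i128` to `<2 x i64>` on alloca lines of textual IR.
# Assumption: a line-based text rewrite is enough to show the idea; the real fix
# is expected to transform the IR itself rather than its printed form.
function rewrite_i128_allocas(ir::AbstractString)
    rewritten = map(split(ir, '\n')) do line
        if occursin(r"=\s*alloca\b", line)
            # Both types are 16 bytes wide, so the allocation size is unchanged.
            replace(line, r"\bi128\b" => "<2 x i64>")
        else
            String(line)
        end
    end
    return join(rewritten, '\n')
end

println(rewrite_i128_allocas("  %sret_box = alloca [2 x i128], align 16"))
# prints:   %sret_box = alloca [2 x <2 x i64>], align 16
```

Lines that are not allocas pass through untouched, so `i128` arithmetic elsewhere in a function would be left alone by this sketch.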
simeonschaub added a commit that referenced this pull request Nov 12, 2025