# Compute
This chapter covers compute pipelines, resource binding, dispatching work, and multi-pass compute workflows.
A compute pipeline is the simplest pipeline type in Vulkan -- it consists of a single shader stage, a pipeline layout, and nothing else. No vertex input, no rasterization, no blend state. You bind resources, dispatch workgroups, and the GPU runs your shader.
In Pumicite, all pipelines are represented by the same `Pipeline` type. A `PipelineCache` compiles them:

```rust
use std::borrow::Cow;

use pumicite::pipeline::{PipelineCache, PipelineLayout, ShaderModule, ShaderEntry, SpecializationInfo};

let cache = PipelineCache::empty(device.clone())?;
let pipeline = cache.create_compute_pipeline(
    layout,
    vk::PipelineCreateFlags::empty(),
    &ShaderEntry {
        module: shader_module,
        entry: Cow::Borrowed(c"main"),
        flags: vk::PipelineShaderStageCreateFlags::empty(),
        stage: vk::ShaderStageFlags::COMPUTE,
        specialization_info: Cow::Owned(SpecializationInfo::new()),
    },
)?;
```

The `PipelineCache` can persist compiled pipeline data across application runs using `get_data()` and `from_initial_data()`. Pass `PipelineCache::null()` if you don't need caching.
In Bevy, compute pipelines are loaded as assets from `.comp.pipeline.ron` files. Here's the Mandelbrot example's pipeline configuration:

```ron
ComputePipeline(
    shader: (
        path: "mandelbrot/mandelbrot.spv",
        entry_point: "main",
    ),
    layout: Inline(
        PipelineLayout(
            push_constants: {
                Compute: (0, 16)
            },
            sets: [
                Inline(
                    DescriptorSetLayout(
                        bindings: [
                            (
                                binding: 0,
                                ty: UniformBuffer,
                                count: 1,
                                stages: [Compute],
                                push_descriptor: true,
                            ),
                            (
                                binding: 1,
                                ty: StorageImage,
                                count: 1,
                                stages: [Compute],
                            ),
                        ],
                        push_descriptor: true,
                    )
                )
            ]
        )
    ),
    disable_optimization: false,
    dispatch_base: false,
)
```

The key fields:
- `shader` -- Path to the compiled SPIR-V file and entry point name.
- `layout` -- The pipeline layout. Three options:
  - `Inline` -- Define descriptor set layouts and push constants directly in the RON file.
  - `Path` -- Reference a separate `.playout.ron` file for sharing layouts across pipelines.
  - `Bindless` -- Use the global bindless descriptor set layout from `DescriptorHeap`. No inline layout needed.
- `disable_optimization` -- Skip driver optimizations for faster compilation during development.
- `dispatch_base` -- Enable `VK_PIPELINE_CREATE_DISPATCH_BASE` for non-zero base workgroup IDs.
Each binding in a `DescriptorSetLayout` has:

| Field | Description |
|---|---|
| `binding` | Binding index in the shader |
| `ty` | Descriptor type (`UniformBuffer`, `StorageBuffer`, `StorageImage`, `SampledImage`, `Sampler`, etc.) |
| `count` | Number of descriptors at this binding (1 for non-array) |
| `stages` | Shader stages that access this binding (`[Compute]`, `[Vertex, Fragment]`, etc.) |
| `push_descriptor` | If true, this binding uses push descriptors instead of allocated sets |

When `push_descriptor` is set on any binding, the entire set uses `VK_KHR_push_descriptor` -- descriptors are written inline in the command buffer rather than allocated from a pool.
Push constants are declared as a map from shader stage to (offset, size) in bytes:
```ron
push_constants: {
    Compute: (0, 64)
}
```

This creates a `VkPushConstantRange` with `stageFlags = COMPUTE`, `offset = 0`, `size = 64`.
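The declared range must match the size of the Rust struct you later push. As a sanity check, here's a minimal sketch mirroring the Mandelbrot example's 16-byte `Compute: (0, 16)` range (`Params` is a name introduced here for illustration):

```rust
/// A 16-byte push-constant block matching a `Compute: (0, 16)` range.
#[repr(C)]
#[derive(Copy, Clone)]
struct Params {
    center: [f32; 2], // 8 bytes
    scale: f32,       // 4 bytes
    max_iter: u32,    // 4 bytes
}

fn main() {
    // The struct size must equal the declared range size, or the
    // validation layers will reject the push-constant update.
    assert_eq!(std::mem::size_of::<Params>(), 16);
}
```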
Load compute pipelines through the asset server:
```rust
#[derive(Resource)]
struct MyPipelines {
    compute: Handle<ComputePipeline>,
}

fn setup(mut commands: Commands, asset_server: ResMut<AssetServer>) {
    commands.insert_resource(MyPipelines {
        compute: asset_server.load("my_shader/my_shader.comp.pipeline.ron"),
    });
}
```

The `ComputePipelineLoader` handles the complete loading process:

- Parses the RON configuration
- Loads the SPIR-V shader module (cached by path to avoid duplicates)
- Creates the pipeline layout from the inline definition, a referenced file, or the bindless heap
- Compiles the compute pipeline through the `PipelineCache`

Shader modules are cached -- if two pipelines reference the same `.spv` file, only one `VkShaderModule` is created.
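The dedup can be pictured as a path-keyed map of shared handles. This is a sketch of the idea, not Pumicite's actual implementation (`ModuleCache` and its fields are invented here; `Arc<Vec<u8>>` stands in for a shader module handle):

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Sketch: one cached module per SPIR-V path.
struct ModuleCache {
    modules: HashMap<String, Arc<Vec<u8>>>, // path -> shared module stand-in
}

impl ModuleCache {
    fn get_or_load(&mut self, path: &str) -> Arc<Vec<u8>> {
        self.modules
            .entry(path.to_string())
            // In the real loader this would read and compile the .spv file;
            // here we insert the SPIR-V magic number as a stand-in.
            .or_insert_with(|| Arc::new(vec![0x03, 0x02, 0x23, 0x07]))
            .clone()
    }
}

fn main() {
    let mut cache = ModuleCache { modules: HashMap::new() };
    let a = cache.get_or_load("mandelbrot/mandelbrot.spv");
    let b = cache.get_or_load("mandelbrot/mandelbrot.spv");
    // Two pipelines referencing the same path share one module.
    assert!(Arc::ptr_eq(&a, &b));
}
```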
Access the compiled pipeline in your render system through `Res<Assets<ComputePipeline>>`:

```rust
fn my_compute_system(
    pipeline: Res<MyPipelines>,
    compute_pipelines: Res<Assets<ComputePipeline>>,
    mut state: SubmissionState,
) {
    let Some(pipeline) = compute_pipelines.get(&pipeline.compute) else {
        return; // Still loading
    };
    // ...
}
```

Before dispatching, you need to tell the shader where its data lives. Vulkan provides three mechanisms, from fastest to most flexible.
Push constants are the fastest path for small, frequently-changing data. They're stored directly in the command buffer -- no memory allocation, no descriptor updates:
```rust
#[repr(C)]
#[derive(Copy, Clone, bytemuck::Zeroable, bytemuck::Pod)]
struct MyPushConstants {
    time: f32,
    scale: f32,
    offset: [f32; 2],
}

encoder.push_constants(
    pipeline.layout(),
    vk::ShaderStageFlags::COMPUTE,
    0,
    bytemuck::bytes_of(&MyPushConstants {
        time: elapsed,
        scale: 1.0,
        offset: [0.0, 0.0],
    }),
);
```

Push constants are limited to 128 bytes on most hardware (guaranteed minimum). Use them for per-dispatch parameters like time, resolution, pass index, or resource handles.
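Since the 128-byte floor is the only portable guarantee, it's worth asserting that a push-constant struct fits before relying on it. A minimal sketch (the constant name is introduced here; at runtime you would compare against the device's actual `maxPushConstantsSize` limit):

```rust
/// The Vulkan spec guarantees `maxPushConstantsSize` is at least 128 bytes.
const GUARANTEED_PUSH_CONSTANT_BYTES: usize = 128;

#[repr(C)]
#[derive(Copy, Clone)]
struct MyPushConstants {
    time: f32,
    scale: f32,
    offset: [f32; 2],
}

fn main() {
    // 16 bytes -- comfortably under the guaranteed minimum.
    assert!(std::mem::size_of::<MyPushConstants>() <= GUARANTEED_PUSH_CONSTANT_BYTES);
    assert_eq!(std::mem::size_of::<MyPushConstants>(), 16);
}
```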
Push descriptors (VK_KHR_push_descriptor) let you write descriptors directly into the
command buffer. No descriptor pool, no descriptor set allocation -- the driver manages
the memory:
```rust
app.add_device_extension::<ash::khr::push_descriptor::Meta>().unwrap();
```

Write descriptors inline before dispatch:
```rust
let buffer_info = vk::DescriptorBufferInfo {
    buffer: buffer.vk_handle(),
    offset: buffer.offset(),
    range: buffer.size(),
};
let image_info = vk::DescriptorImageInfo {
    image_view: image_view.vk_handle(),
    image_layout: vk::ImageLayout::GENERAL,
    sampler: vk::Sampler::null(),
};

encoder.push_descriptor_set(
    vk::PipelineBindPoint::COMPUTE,
    pipeline.layout(),
    0, // set index
    &[
        vk::WriteDescriptorSet {
            dst_binding: 0,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::UNIFORM_BUFFER,
            p_buffer_info: &buffer_info,
            ..Default::default()
        },
        vk::WriteDescriptorSet {
            dst_binding: 1,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::STORAGE_IMAGE,
            p_image_info: &image_info,
            ..Default::default()
        },
    ],
);
```

Push descriptors are ideal for per-frame resources that change every dispatch. Mark bindings with `push_descriptor: true` in the RON layout.
For resources that don't change often (samplers, static textures), pre-allocate descriptor sets from a pool:
```rust
let mut pool = DescriptorPool::new(
    device.clone(),
    &[vk::DescriptorPoolSize {
        ty: vk::DescriptorType::STORAGE_BUFFER,
        descriptor_count: 1,
    }],
    1,
    vk::DescriptorPoolCreateFlags::empty(),
)?;
let descriptor_set = pool.allocate_one(&layout)?;

// Update once
unsafe {
    device.update_descriptor_sets(
        &[vk::WriteDescriptorSet {
            dst_set: descriptor_set,
            dst_binding: 0,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::STORAGE_BUFFER,
            p_buffer_info: &buffer_info,
            ..Default::default()
        }],
        &[],
    );
}

// Bind every frame
encoder.bind_descriptor_sets(
    vk::PipelineBindPoint::COMPUTE,
    pipeline.layout(),
    0,
    &[descriptor_set],
    &[],
);
```

| Mechanism | Best For | Limit |
|---|---|---|
| Push constants | Per-dispatch scalars, handles, flags | 128 bytes |
| Push descriptors | Per-frame buffers and images | Per-set, driver-managed |
| Descriptor sets | Static resources, samplers | Pool-allocated |
You can mix all three in the same pipeline. A common pattern: push constants for per-dispatch data, push descriptors for per-frame ring buffer allocations, and a pre-allocated descriptor set for static samplers.
After binding the pipeline and resources, dispatch workgroups:
```rust
let pipeline = encoder.retain(pipeline.clone().into_inner());
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
// ... bind resources ...
encoder.dispatch(UVec3::new(width.div_ceil(8), height.div_ceil(8), 1));
```

The `dispatch()` call takes a `UVec3` specifying the number of workgroups in each dimension. The total number of shader invocations is `workgroups.x * workgroups.y * workgroups.z * local_size.x * local_size.y * local_size.z`.

For a 1920x1080 image with an 8x8 workgroup size:

```rust
let workgroups = UVec3::new(
    1920_u32.div_ceil(8), // 240
    1080_u32.div_ceil(8), // 135
    1,
);
encoder.dispatch(workgroups);
// 240 * 135 * 1 = 32,400 workgroups
// 32,400 * 64 = 2,073,600 invocations (one per pixel)
```

Note the `encoder.retain(pipeline.clone().into_inner())` call. `ComputePipeline` wraps an `Arc<Pipeline>`. `into_inner()` extracts the `Arc`, and `retain()` ensures the pipeline stays alive until the command buffer completes on the GPU.
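The round-up logic generalizes to any extent and workgroup size; a small helper makes the intent explicit (`workgroup_count` is a name introduced here, not Pumicite API):

```rust
/// Number of workgroups needed to cover `extent` invocations with
/// groups of `local_size`, rounding up so no element is missed.
fn workgroup_count(extent: u32, local_size: u32) -> u32 {
    extent.div_ceil(local_size)
}

fn main() {
    assert_eq!(workgroup_count(1920, 8), 240);
    assert_eq!(workgroup_count(1080, 8), 135);
    // Extents that aren't a multiple of the workgroup size round up;
    // the shader must bounds-check the extra invocations.
    assert_eq!(workgroup_count(1921, 8), 241);
}
```

Because the last workgroup may extend past the image edge, the shader should compare `gl_GlobalInvocationID` against the real extent before writing.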
Real workloads often involve multiple compute passes where each pass reads the output of the previous one. The key is inserting memory barriers between passes so writes from pass N are visible to reads in pass N+1.
Between compute passes that share storage images:
```rust
// Pass 1: write to image
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pass1_pipeline);
// ... bind resources ...
encoder.dispatch(workgroups);

// Barrier: make pass 1 writes visible to pass 2 reads
encoder.memory_barrier(Access::COMPUTE_WRITE, Access::COMPUTE_READ);
encoder.emit_barriers();

// Pass 2: read from image, write to another
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pass2_pipeline);
// ... bind resources ...
encoder.dispatch(workgroups);
```

For ping-pong passes where two images alternate between read and write roles:

```rust
for i in 0..num_passes {
    encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
    // ... bind src_image[i % 2] and dst_image[(i + 1) % 2] ...
    encoder.push_constants(
        pipeline.layout(),
        vk::ShaderStageFlags::COMPUTE,
        0,
        bytemuck::bytes_of(&pass_params),
    );
    encoder.dispatch(workgroups);

    // Ensure writes complete before the next pass reads
    encoder.memory_barrier(Access::COMPUTE_WRITE, Access::COMPUTE_READ);
    encoder.emit_barriers();
}
```

After compute passes finish, you often need the result in a different pipeline stage.
Use `use_image_resource` to transition the layout:

```rust
// After compute writes to an image in GENERAL layout...
encoder.use_image_resource(
    image,
    &mut image_state,
    Access::FRAGMENT_SAMPLED_READ,
    vk::ImageLayout::SHADER_READ_ONLY_OPTIMAL,
    0..1, 0..1, false,
);
encoder.emit_barriers();
// Image is now ready for sampling in a fragment shader
```

Or to present after compute:

```rust
encoder.use_image_resource(
    swapchain_image,
    &mut swapchain_image.state,
    Access::COLOR_ATTACHMENT_WRITE,
    vk::ImageLayout::COLOR_ATTACHMENT_OPTIMAL,
    0..1, 0..1, false,
);
encoder.emit_barriers();
```

Specialization constants let you set compile-time values in SPIR-V shaders. The driver can optimize the shader based on these known constants -- dead code elimination, loop unrolling, constant folding:
```rust
use pumicite::pipeline::SpecializationInfo;

let mut spec = SpecializationInfo::new();
spec.push(0, 16u32); // constant_id 0 = workgroup size
spec.push(1, true);  // constant_id 1 = enable feature flag

let pipeline = cache.create_compute_pipeline(
    layout,
    vk::PipelineCreateFlags::empty(),
    &ShaderEntry {
        module: shader_module,
        entry: Cow::Borrowed(c"main"),
        flags: vk::PipelineShaderStageCreateFlags::empty(),
        stage: vk::ShaderStageFlags::COMPUTE,
        specialization_info: Cow::Owned(spec),
    },
)?;
```

Rust `bool` values are automatically converted to `VkBool32` (4 bytes) to match the SPIR-V `OpSpecConstantTrue`/`OpSpecConstantFalse` representation.
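The widening itself is trivial; this sketch (with a hypothetical `to_vk_bool32` helper, not Pumicite API) shows why the 4-byte representation matters when sizing specialization data:

```rust
/// Widen a 1-byte Rust `bool` to a 4-byte `VkBool32` (VK_TRUE = 1, VK_FALSE = 0).
fn to_vk_bool32(b: bool) -> u32 {
    b as u32
}

fn main() {
    // Rust bool is 1 byte, but the specialization entry must cover 4.
    assert_eq!(std::mem::size_of::<bool>(), 1);
    assert_eq!(to_vk_bool32(true), 1);
    assert_eq!(to_vk_bool32(false), 0);
    assert_eq!(to_vk_bool32(true).to_ne_bytes().len(), 4);
}
```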
Use specialization constants for workgroup sizes, algorithm parameters, or feature toggles that vary between pipeline variants but not between dispatches.
Here's the Mandelbrot example -- an interactive fractal renderer using a single compute pass that writes directly to the swapchain image:
```rust
use bevy::prelude::*;
use bevy_pumicite::prelude::*;

fn main() {
    let mut app = bevy::app::App::new();
    app.add_plugins(bevy_pumicite::DefaultPlugins);

    let primary_window = app.world_mut()
        .query_filtered::<Entity, With<bevy::window::PrimaryWindow>>()
        .iter(app.world())
        .next()
        .unwrap();
    app.world_mut().entity_mut(primary_window).insert(SwapchainConfig {
        image_usage: vk::ImageUsageFlags::STORAGE
            | vk::ImageUsageFlags::TRANSFER_DST
            | vk::ImageUsageFlags::COLOR_ATTACHMENT,
        ..Default::default()
    });

    app.add_device_extension::<ash::khr::push_descriptor::Meta>().unwrap();
    app.add_systems(Startup, setup);
    app.add_systems(PostUpdate, mandelbrot_rendering.in_set(DefaultRenderSet));
    app.run();
}

#[derive(Resource)]
struct MandelbrotPipeline {
    draw: Handle<ComputePipeline>,
}

#[repr(C)]
#[derive(Resource, Copy, Clone, bytemuck::Zeroable, bytemuck::Pod)]
struct MandelbrotState {
    center: [f32; 2],
    scale: f32,
    max_iter: u32,
}

fn setup(mut commands: Commands, asset_server: ResMut<AssetServer>) {
    commands.insert_resource(MandelbrotPipeline {
        draw: asset_server.load("mandelbrot/mandelbrot.comp.pipeline.ron"),
    });
    commands.insert_resource(MandelbrotState {
        center: [0.0, 0.0],
        scale: 0.005,
        max_iter: 1000,
    });
}

fn mandelbrot_rendering(
    mut swapchain_image: Query<&mut SwapchainImage, With<bevy::window::PrimaryWindow>>,
    mut state: SubmissionState,
    pipeline: Res<MandelbrotPipeline>,
    compute_pipelines: Res<Assets<ComputePipeline>>,
    mut ring_buffer: ResMut<UniformRingBuffer>,
    mandelbrot_state: Res<MandelbrotState>,
) {
    let Ok(mut swapchain_image) = swapchain_image.single_mut() else { return };
    let pipeline = compute_pipelines.get(&pipeline.draw);

    state.record(|encoder| {
        // Upload uniform data to ring buffer
        let mut buffer = ring_buffer.allocate_buffer(
            std::mem::size_of::<MandelbrotState>() as u64, 128,
        );
        buffer.as_slice_mut().unwrap()
            .copy_from_slice(bytemuck::bytes_of(&*mandelbrot_state));
        let buffer = encoder.retain(buffer);

        let Some(current) = swapchain_image.current_image() else { return };
        let current = encoder.lock(current, vk::PipelineStageFlags2::COMPUTE_SHADER);

        // Prepare descriptors
        let buffer_info = vk::DescriptorBufferInfo {
            buffer: buffer.vk_handle(),
            offset: buffer.offset(),
            range: buffer.size(),
        };
        let image_info = vk::DescriptorImageInfo {
            image_view: current.linear_view().vk_handle(),
            image_layout: vk::ImageLayout::GENERAL,
            sampler: vk::Sampler::null(),
        };

        // Transition swapchain image for compute writes
        encoder.use_image_resource(
            current, &mut swapchain_image.state,
            Access::COMPUTE_WRITE, vk::ImageLayout::GENERAL,
            0..1, 0..1, false,
        );
        encoder.emit_barriers();

        if let Some(pipeline) = pipeline {
            let pipeline = encoder.retain(pipeline.clone().into_inner());
            encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
            encoder.push_descriptor_set(
                vk::PipelineBindPoint::COMPUTE,
                pipeline.layout(),
                0,
                &[
                    vk::WriteDescriptorSet {
                        dst_binding: 0,
                        descriptor_count: 1,
                        descriptor_type: vk::DescriptorType::UNIFORM_BUFFER,
                        p_buffer_info: &buffer_info,
                        ..Default::default()
                    },
                    vk::WriteDescriptorSet {
                        dst_binding: 1,
                        descriptor_count: 1,
                        descriptor_type: vk::DescriptorType::STORAGE_IMAGE,
                        p_image_info: &image_info,
                        ..Default::default()
                    },
                ],
            );

            let (width, height) = (current.extent().x, current.extent().y);
            encoder.dispatch(UVec3::new(width.div_ceil(8), height.div_ceil(8), 1));
        }
    });
}
```

Key points:
- `SwapchainConfig` adds `STORAGE` usage so the swapchain image can be used as a compute storage image.
- `UniformRingBuffer` provides per-frame uniform data without manual buffer management.
- `encoder.retain()` keeps the ring buffer allocation and pipeline alive until the GPU finishes.
- `use_image_resource` transitions the swapchain image to `GENERAL` layout for storage image access.
- Push descriptors bind the uniform buffer and storage image inline -- no descriptor pool needed.
- Workgroup calculation uses `div_ceil` to handle image dimensions that aren't multiples of the workgroup size.
In this chapter you learned:

- `PipelineCache` compiles compute pipelines from shader modules and layouts
- `.comp.pipeline.ron` files define pipelines declaratively with inline or referenced layouts
- Push constants are the fastest path for small per-dispatch data (up to 128 bytes)
- Push descriptors write per-frame buffer and image bindings inline in the command buffer
- `encoder.dispatch()` launches workgroups; total invocations = workgroups * local size
- Memory barriers between passes ensure compute writes are visible to subsequent reads
- Specialization constants provide compile-time values for driver optimization

Next: Chapter 7: Rendering