Compute

0xbeefd1ed edited this page Mar 18, 2026 · 2 revisions

This chapter covers compute pipelines, resource binding, dispatching work, and multi-pass compute workflows.

Compute Pipelines

A compute pipeline is the simplest pipeline type in Vulkan -- it consists of a single shader stage, a pipeline layout, and nothing else. No vertex input, no rasterization, no blend state. You bind resources, dispatch workgroups, and the GPU runs your shader.

In Pumicite, all pipelines are represented by the same Pipeline type. A PipelineCache compiles them:

use std::borrow::Cow;

use pumicite::pipeline::{PipelineCache, PipelineLayout, ShaderEntry, ShaderModule, SpecializationInfo};

let cache = PipelineCache::empty(device.clone())?;
let pipeline = cache.create_compute_pipeline(
    layout,
    vk::PipelineCreateFlags::empty(),
    &ShaderEntry {
        module: shader_module,
        entry: Cow::Borrowed(c"main"),
        flags: vk::PipelineShaderStageCreateFlags::empty(),
        stage: vk::ShaderStageFlags::COMPUTE,
        specialization_info: Cow::Owned(SpecializationInfo::new()),
    },
)?;

The PipelineCache can persist compiled pipeline data across application runs using get_data() and from_initial_data(). Pass PipelineCache::null() if you don't need caching.

Pipeline Configuration Files

In Bevy, compute pipelines are loaded as assets from .comp.pipeline.ron files. Here's the Mandelbrot example's pipeline configuration:

ComputePipeline(
    shader: (
        path: "mandelbrot/mandelbrot.spv",
        entry_point: "main",
    ),
    layout: Inline(
        PipelineLayout(
            push_constants: {
                Compute: (0, 16)
            },
            sets: [
                Inline(
                    DescriptorSetLayout(
                        bindings: [
                            (
                                binding: 0,
                                ty: UniformBuffer,
                                count: 1,
                                stages: [Compute],
                                push_descriptor: true,
                            ),
                            (
                                binding: 1,
                                ty: StorageImage,
                                count: 1,
                                stages: [Compute],
                            ),
                        ],
                        push_descriptor: true,
                    )
                )
            ]
        )
    ),
    disable_optimization: false,
    dispatch_base: false,
)

The key fields:

  • shader -- Path to the compiled SPIR-V file and entry point name.
  • layout -- The pipeline layout. Three options:
    • Inline -- Define descriptor set layouts and push constants directly in the RON file.
    • Path -- Reference a separate .playout.ron file for sharing layouts across pipelines.
    • Bindless -- Use the global bindless descriptor set layout from DescriptorHeap. No inline layout needed.
  • disable_optimization -- Skip driver optimizations for faster compilation during development.
  • dispatch_base -- Enable VK_PIPELINE_CREATE_DISPATCH_BASE for non-zero base workgroup IDs.

Descriptor Set Layout Fields

Each binding in a DescriptorSetLayout has:

  • binding -- Binding index in the shader.
  • ty -- Descriptor type (UniformBuffer, StorageBuffer, StorageImage, SampledImage, Sampler, etc.).
  • count -- Number of descriptors at this binding (1 for non-array bindings).
  • stages -- Shader stages that access this binding ([Compute], [Vertex, Fragment], etc.).
  • push_descriptor -- If true, this binding uses push descriptors instead of allocated sets.

When push_descriptor is set on any binding, the entire set uses VK_KHR_push_descriptor -- descriptors are written inline in the command buffer rather than allocated from a pool.

Push Constants

Push constants are declared as a map from shader stage to (offset, size) in bytes:

push_constants: {
    Compute: (0, 64)
}

This creates a VkPushConstantRange with stageFlags = COMPUTE, offset = 0, size = 64.
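Vulkan constrains these ranges: both offset and size must be multiples of 4, and the range must fit within the device's maxPushConstantsSize. A loader could validate a declared (offset, size) pair along these lines -- the helper below is illustrative, not part of Pumicite:

```rust
// Illustrative validation of a push-constant range before it becomes a
// VkPushConstantRange. The function name is hypothetical.
fn validate_push_constant_range(offset: u32, size: u32, max_push: u32) -> Result<(), String> {
    // The Vulkan spec requires offset and size to be multiples of 4.
    if offset % 4 != 0 {
        return Err(format!("offset {offset} is not a multiple of 4"));
    }
    if size == 0 || size % 4 != 0 {
        return Err(format!("size {size} is not a positive multiple of 4"));
    }
    if offset + size > max_push {
        return Err(format!(
            "range {}..{} exceeds maxPushConstantsSize {max_push}",
            offset,
            offset + size
        ));
    }
    Ok(())
}

fn main() {
    // The Mandelbrot layout above declares Compute: (0, 16).
    assert!(validate_push_constant_range(0, 16, 128).is_ok());
    // A misaligned or oversized range would be rejected by the validation layers.
    assert!(validate_push_constant_range(2, 16, 128).is_err());
    assert!(validate_push_constant_range(0, 132, 128).is_err());
}
```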

Loading Pipelines in Bevy

Load compute pipelines through the asset server:

#[derive(Resource)]
struct MyPipelines {
    compute: Handle<ComputePipeline>,
}

fn setup(mut commands: Commands, asset_server: Res<AssetServer>) {
    commands.insert_resource(MyPipelines {
        compute: asset_server.load("my_shader/my_shader.comp.pipeline.ron"),
    });
}

The ComputePipelineLoader handles the complete loading process:

  1. Parses the RON configuration
  2. Loads the SPIR-V shader module (cached by path to avoid duplicates)
  3. Creates the pipeline layout from the inline definition, a referenced file, or the bindless heap
  4. Compiles the compute pipeline through the PipelineCache

Shader modules are cached -- if two pipelines reference the same .spv file, only one VkShaderModule is created.
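Conceptually, path-keyed caching can be sketched with a map from path to a shared module handle. The real loader's internals may differ; the types below are stand-ins:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for a compiled VkShaderModule.
struct ShaderModule {
    path: String,
}

// A minimal sketch of path-keyed module caching: the first load "compiles",
// subsequent loads of the same path return the same Arc.
#[derive(Default)]
struct ModuleCache {
    modules: HashMap<String, Arc<ShaderModule>>,
}

impl ModuleCache {
    fn load(&mut self, path: &str) -> Arc<ShaderModule> {
        self.modules
            .entry(path.to_string())
            .or_insert_with(|| Arc::new(ShaderModule { path: path.to_string() }))
            .clone()
    }
}

fn main() {
    let mut cache = ModuleCache::default();
    let a = cache.load("mandelbrot/mandelbrot.spv");
    let b = cache.load("mandelbrot/mandelbrot.spv");
    // Same path -> same module instance, only one entry in the cache.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(cache.modules.len(), 1);
    let _ = &a.path;
}
```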

Access the compiled pipeline in your render system through Res<Assets<ComputePipeline>>:

fn my_compute_system(
    pipeline: Res<MyPipelines>,
    compute_pipelines: Res<Assets<ComputePipeline>>,
    mut state: SubmissionState,
) {
    let Some(pipeline) = compute_pipelines.get(&pipeline.compute) else {
        return;  // Still loading
    };
    // ...
}

Binding Resources

Before dispatching, you need to tell the shader where its data lives. Vulkan provides three mechanisms, from fastest to most flexible.

Push Constants

Push constants are the fastest path for small, frequently-changing data. They're stored directly in the command buffer -- no memory allocation, no descriptor updates:

#[repr(C)]
#[derive(Copy, Clone, bytemuck::Zeroable, bytemuck::Pod)]
struct MyPushConstants {
    time: f32,
    scale: f32,
    offset: [f32; 2],
}

encoder.push_constants(
    pipeline.layout(),
    vk::ShaderStageFlags::COMPUTE,
    0,
    bytemuck::bytes_of(&MyPushConstants {
        time: elapsed,
        scale: 1.0,
        offset: [0.0, 0.0],
    }),
);

Push constants are guaranteed to support at least 128 bytes (the Vulkan minimum for maxPushConstantsSize); many devices expose more, but staying within 128 bytes keeps you portable. Use them for per-dispatch parameters like time, resolution, pass index, or resource handles.
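Because the struct above is #[repr(C)], its byte layout is predictable, and both its exact size and the 128-byte budget can be guarded at compile time. A self-contained sketch (without the bytemuck derives, which add safe casting on top of the same layout):

```rust
// Push-constant structs must match the shader's layout byte for byte.
// With #[repr(C)] the fields below pack to 16 bytes with no padding.
#[repr(C)]
#[derive(Copy, Clone)]
struct MyPushConstants {
    time: f32,
    scale: f32,
    offset: [f32; 2],
}

// Compile-time guard against the 128-byte guaranteed minimum.
const _: () = assert!(std::mem::size_of::<MyPushConstants>() <= 128);

fn main() {
    // 4 + 4 + 8 = 16 bytes, no padding inserted.
    assert_eq!(std::mem::size_of::<MyPushConstants>(), 16);
    // Vulkan also requires push-constant sizes to be multiples of 4.
    assert_eq!(std::mem::size_of::<MyPushConstants>() % 4, 0);
}
```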

Push Descriptors

Push descriptors (VK_KHR_push_descriptor) let you write descriptors directly into the command buffer. No descriptor pool, no descriptor set allocation -- the driver manages the memory:

app.add_device_extension::<ash::khr::push_descriptor::Meta>().unwrap();

Write descriptors inline before dispatch:

let buffer_info = vk::DescriptorBufferInfo {
    buffer: buffer.vk_handle(),
    offset: buffer.offset(),
    range: buffer.size(),
};

let image_info = vk::DescriptorImageInfo {
    image_view: image_view.vk_handle(),
    image_layout: vk::ImageLayout::GENERAL,
    sampler: vk::Sampler::null(),
};

encoder.push_descriptor_set(
    vk::PipelineBindPoint::COMPUTE,
    pipeline.layout(),
    0,  // set index
    &[
        vk::WriteDescriptorSet {
            dst_binding: 0,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::UNIFORM_BUFFER,
            p_buffer_info: &buffer_info,
            ..Default::default()
        },
        vk::WriteDescriptorSet {
            dst_binding: 1,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::STORAGE_IMAGE,
            p_image_info: &image_info,
            ..Default::default()
        },
    ],
);

Push descriptors are ideal for per-frame resources that change every dispatch. Mark bindings with push_descriptor: true in the RON layout.

Traditional Descriptor Sets

For resources that don't change often (samplers, static textures), pre-allocate descriptor sets from a pool:

let mut pool = DescriptorPool::new(
    device.clone(),
    &[vk::DescriptorPoolSize {
        ty: vk::DescriptorType::STORAGE_BUFFER,
        descriptor_count: 1,
    }],
    1,
    vk::DescriptorPoolCreateFlags::empty(),
)?;
let descriptor_set = pool.allocate_one(&layout)?;

// Update once
unsafe {
    device.update_descriptor_sets(
        &[vk::WriteDescriptorSet {
            dst_set: descriptor_set,
            dst_binding: 0,
            descriptor_count: 1,
            descriptor_type: vk::DescriptorType::STORAGE_BUFFER,
            p_buffer_info: &buffer_info,
            ..Default::default()
        }],
        &[],
    );
}

// Bind every frame
encoder.bind_descriptor_sets(
    vk::PipelineBindPoint::COMPUTE,
    pipeline.layout(),
    0,
    &[descriptor_set],
    &[],
);

When to Use What

  • Push constants -- best for per-dispatch scalars, handles, and flags; limited to 128 bytes.
  • Push descriptors -- best for per-frame buffers and images; written per set, with memory managed by the driver.
  • Descriptor sets -- best for static resources and samplers; allocated from a pool.

You can mix all three in the same pipeline. A common pattern: push constants for per-dispatch data, push descriptors for per-frame ring buffer allocations, and a pre-allocated descriptor set for static samplers.

Dispatching Work

After binding the pipeline and resources, dispatch workgroups:

let pipeline = encoder.retain(pipeline.clone().into_inner());
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
// ... bind resources ...
encoder.dispatch(UVec3::new(width.div_ceil(8), height.div_ceil(8), 1));

The dispatch() call takes a UVec3 specifying the number of workgroups in each dimension. The total number of shader invocations is workgroups.x * workgroups.y * workgroups.z * local_size.x * local_size.y * local_size.z.

For a 1920x1080 image with an 8x8 workgroup size:

let workgroups = UVec3::new(
    1920_u32.div_ceil(8),  // 240
    1080_u32.div_ceil(8),  // 135
    1,
);
encoder.dispatch(workgroups);
// 240 * 135 * 1 = 32,400 workgroups
// 32,400 * 64 = 2,073,600 invocations (one per pixel)

Note the encoder.retain(pipeline.clone().into_inner()) call. ComputePipeline wraps an Arc<Pipeline>. into_inner() extracts the Arc, and retain() ensures the pipeline stays alive until the command buffer completes on the GPU.
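The Arc semantics behind retain() can be illustrated without any GPU code: cloning bumps the strong count, so the pipeline survives even if the caller drops its own handle. This is a conceptual sketch, not Pumicite internals:

```rust
use std::sync::Arc;

fn main() {
    // Stand-in for an Arc<Pipeline>.
    let pipeline = Arc::new("compute pipeline");
    assert_eq!(Arc::strong_count(&pipeline), 1);

    // Simulates encoder.retain(): the command buffer keeps its own clone.
    let retained_by_encoder = Arc::clone(&pipeline);
    assert_eq!(Arc::strong_count(&pipeline), 2);

    // Even after the caller drops its handle, the retained clone keeps the
    // pipeline alive until the command buffer is done with it.
    drop(pipeline);
    assert_eq!(Arc::strong_count(&retained_by_encoder), 1);
}
```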

Multi-Pass Compute

Real workloads often involve multiple compute passes where each pass reads the output of the previous one. The key is inserting memory barriers between passes so writes from pass N are visible to reads in pass N+1.

Barrier Pattern

Between compute passes that share storage images:

// Pass 1: write to image
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pass1_pipeline);
// ... bind resources ...
encoder.dispatch(workgroups);

// Barrier: make pass 1 writes visible to pass 2 reads
encoder.memory_barrier(Access::COMPUTE_WRITE, Access::COMPUTE_READ);
encoder.emit_barriers();

// Pass 2: read from image, write to another
encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pass2_pipeline);
// ... bind resources ...
encoder.dispatch(workgroups);

For ping-pong passes where two images alternate between read and write roles:

for i in 0..num_passes {
    encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
    // ... bind src_image[i % 2] and dst_image[(i + 1) % 2] ...
    encoder.push_constants(
        pipeline.layout(),
        vk::ShaderStageFlags::COMPUTE,
        0,
        bytemuck::bytes_of(&pass_params),
    );
    encoder.dispatch(workgroups);

    // Ensure writes complete before the next pass reads
    encoder.memory_barrier(Access::COMPUTE_WRITE, Access::COMPUTE_READ);
    encoder.emit_barriers();
}
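The index arithmetic in the loop above guarantees the source and destination images never alias. A quick standalone check:

```rust
// Ping-pong indexing: each pass reads one image and writes the other,
// and consecutive passes swap roles.
fn ping_pong_indices(pass: usize) -> (usize, usize) {
    (pass % 2, (pass + 1) % 2)
}

fn main() {
    for i in 0..6 {
        let (src, dst) = ping_pong_indices(i);
        // A pass never reads and writes the same image.
        assert_ne!(src, dst);
    }
    // Pass 0 reads image 0 and writes image 1; pass 1 swaps them back.
    assert_eq!(ping_pong_indices(0), (0, 1));
    assert_eq!(ping_pong_indices(1), (1, 0));
}
```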

Transitioning to Other Stages

After compute passes finish, you often need the result in a different pipeline stage. Use use_image_resource to transition the layout:

// After compute writes to an image in GENERAL layout...
encoder.use_image_resource(
    image,
    &mut image_state,
    Access::FRAGMENT_SAMPLED_READ,
    vk::ImageLayout::SHADER_READ_ONLY_OPTIMAL,
    0..1, 0..1, false,
);
encoder.emit_barriers();
// Image is now ready for sampling in a fragment shader

Or to prepare the swapchain image for color attachment writes on the way to present:

encoder.use_image_resource(
    swapchain_image,
    &mut swapchain_image.state,
    Access::COLOR_ATTACHMENT_WRITE,
    vk::ImageLayout::COLOR_ATTACHMENT_OPTIMAL,
    0..1, 0..1, false,
);
encoder.emit_barriers();

Specialization Constants

Specialization constants let you set compile-time values in SPIR-V shaders. The driver can optimize the shader based on these known constants -- dead code elimination, loop unrolling, constant folding:

use pumicite::pipeline::SpecializationInfo;

let mut spec = SpecializationInfo::new();
spec.push(0, 16u32);      // constant_id 0 = workgroup size
spec.push(1, true);       // constant_id 1 = enable feature flag

let pipeline = cache.create_compute_pipeline(
    layout,
    vk::PipelineCreateFlags::empty(),
    &ShaderEntry {
        module: shader_module,
        entry: Cow::Borrowed(c"main"),
        flags: vk::PipelineShaderStageCreateFlags::empty(),
        stage: vk::ShaderStageFlags::COMPUTE,
        specialization_info: Cow::Owned(spec),
    },
)?;

Rust bool values are automatically converted to VkBool32 (4 bytes) to match the SPIR-V OpSpecConstantTrue/OpSpecConstantFalse representation.

Use specialization constants for workgroup sizes, algorithm parameters, or feature toggles that vary between pipeline variants but not between dispatches.
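Under the hood, VkSpecializationInfo expects a flat byte blob plus map entries of (constant_id, offset, size). Below is a sketch of how a push-style builder could pack values, including the bool-to-VkBool32 widening; the exact internal layout of Pumicite's SpecializationInfo is an assumption, not documented behavior:

```rust
// Hypothetical mirror of VkSpecializationMapEntry.
struct MapEntry {
    constant_id: u32,
    offset: u32,
    size: usize,
}

// Sketch of a specialization-data builder: values are appended to a flat
// little-endian byte blob, and each push records a map entry.
#[derive(Default)]
struct SpecData {
    data: Vec<u8>,
    entries: Vec<MapEntry>,
}

impl SpecData {
    fn push_u32(&mut self, constant_id: u32, value: u32) {
        let offset = self.data.len() as u32;
        self.data.extend_from_slice(&value.to_le_bytes());
        self.entries.push(MapEntry { constant_id, offset, size: 4 });
    }
    fn push_bool(&mut self, constant_id: u32, value: bool) {
        // bool widens to VkBool32 (4 bytes), matching OpSpecConstantTrue/False.
        self.push_u32(constant_id, value as u32);
    }
}

fn main() {
    let mut spec = SpecData::default();
    spec.push_u32(0, 16);    // constant_id 0 = workgroup size
    spec.push_bool(1, true); // constant_id 1 = feature flag

    assert_eq!(spec.data.len(), 8);                  // two 4-byte constants
    assert_eq!(&spec.data[4..8], &[1u8, 0, 0, 0]);   // true as VkBool32 = 1
    assert_eq!(spec.entries[1].offset, 4);
    assert_eq!(spec.entries.iter().map(|e| e.size).sum::<usize>(), 8);
}
```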

A Complete Example

Here's the Mandelbrot example -- an interactive fractal renderer using a single compute pass that writes directly to the swapchain image:

use bevy::prelude::*;
use bevy_pumicite::prelude::*;

fn main() {
    let mut app = bevy::app::App::new();
    app.add_plugins(bevy_pumicite::DefaultPlugins);

    let primary_window = app.world_mut()
        .query_filtered::<Entity, With<bevy::window::PrimaryWindow>>()
        .iter(app.world())
        .next()
        .unwrap();
    app.world_mut().entity_mut(primary_window).insert(SwapchainConfig {
        image_usage: vk::ImageUsageFlags::STORAGE
            | vk::ImageUsageFlags::TRANSFER_DST
            | vk::ImageUsageFlags::COLOR_ATTACHMENT,
        ..Default::default()
    });
    app.add_device_extension::<ash::khr::push_descriptor::Meta>().unwrap();

    app.add_systems(Startup, setup);
    app.add_systems(PostUpdate, mandelbrot_rendering.in_set(DefaultRenderSet));
    app.run();
}

#[derive(Resource)]
struct MandelbrotPipeline {
    draw: Handle<ComputePipeline>,
}

#[repr(C)]
#[derive(Resource, Copy, Clone, bytemuck::Zeroable, bytemuck::Pod)]
struct MandelbrotState {
    center: [f32; 2],
    scale: f32,
    max_iter: u32,
}

fn setup(mut commands: Commands, asset_server: Res<AssetServer>) {
    commands.insert_resource(MandelbrotPipeline {
        draw: asset_server.load("mandelbrot/mandelbrot.comp.pipeline.ron"),
    });
    commands.insert_resource(MandelbrotState {
        center: [0.0, 0.0],
        scale: 0.005,
        max_iter: 1000,
    });
}

fn mandelbrot_rendering(
    mut swapchain_image: Query<&mut SwapchainImage, With<bevy::window::PrimaryWindow>>,
    mut state: SubmissionState,
    pipeline: Res<MandelbrotPipeline>,
    compute_pipelines: Res<Assets<ComputePipeline>>,
    mut ring_buffer: ResMut<UniformRingBuffer>,
    mandelbrot_state: Res<MandelbrotState>,
) {
    let Ok(mut swapchain_image) = swapchain_image.single_mut() else { return };
    let pipeline = compute_pipelines.get(&pipeline.draw);

    state.record(|encoder| {
        // Upload uniform data to ring buffer
        let mut buffer = ring_buffer.allocate_buffer(
            std::mem::size_of::<MandelbrotState>() as u64, 128,
        );
        buffer.as_slice_mut().unwrap()
            .copy_from_slice(bytemuck::bytes_of(&*mandelbrot_state));
        let buffer = encoder.retain(buffer);

        let Some(current) = swapchain_image.current_image() else { return };
        let current = encoder.lock(current, vk::PipelineStageFlags2::COMPUTE_SHADER);

        // Prepare descriptors
        let buffer_info = vk::DescriptorBufferInfo {
            buffer: buffer.vk_handle(),
            offset: buffer.offset(),
            range: buffer.size(),
        };
        let image_info = vk::DescriptorImageInfo {
            image_view: current.linear_view().vk_handle(),
            image_layout: vk::ImageLayout::GENERAL,
            sampler: vk::Sampler::null(),
        };

        // Transition swapchain image for compute writes
        encoder.use_image_resource(
            current, &mut swapchain_image.state,
            Access::COMPUTE_WRITE, vk::ImageLayout::GENERAL,
            0..1, 0..1, false,
        );
        encoder.emit_barriers();

        if let Some(pipeline) = pipeline {
            let pipeline = encoder.retain(pipeline.clone().into_inner());
            encoder.bind_pipeline(vk::PipelineBindPoint::COMPUTE, &pipeline);
            encoder.push_descriptor_set(
                vk::PipelineBindPoint::COMPUTE,
                pipeline.layout(),
                0,
                &[
                    vk::WriteDescriptorSet {
                        dst_binding: 0,
                        descriptor_count: 1,
                        descriptor_type: vk::DescriptorType::UNIFORM_BUFFER,
                        p_buffer_info: &buffer_info,
                        ..Default::default()
                    },
                    vk::WriteDescriptorSet {
                        dst_binding: 1,
                        descriptor_count: 1,
                        descriptor_type: vk::DescriptorType::STORAGE_IMAGE,
                        p_image_info: &image_info,
                        ..Default::default()
                    },
                ],
            );

            let (width, height) = (current.extent().x, current.extent().y);
            encoder.dispatch(UVec3::new(width.div_ceil(8), height.div_ceil(8), 1));
        }
    });
}

Key points:

  1. SwapchainConfig adds STORAGE usage so the swapchain image can be used as a compute storage image.
  2. UniformRingBuffer provides per-frame uniform data without manual buffer management.
  3. encoder.retain() keeps the ring buffer allocation and pipeline alive until the GPU finishes.
  4. use_image_resource transitions the swapchain image to GENERAL layout for storage image access.
  5. Push descriptors bind the uniform buffer and storage image inline -- no descriptor pool needed.
  6. Workgroup calculation uses div_ceil to handle image dimensions that aren't multiples of the workgroup size.

Summary

In this chapter you learned:

  • PipelineCache compiles compute pipelines from shader modules and layouts
  • .comp.pipeline.ron files define pipelines declaratively with inline or referenced layouts
  • Push constants are the fastest path for small per-dispatch data (up to 128 bytes)
  • Push descriptors write per-frame buffer and image bindings inline in the command buffer
  • encoder.dispatch() launches workgroups; total invocations = workgroups * local size
  • Memory barriers between passes ensure compute writes are visible to subsequent reads
  • Specialization constants provide compile-time values for driver optimization

Next: Chapter 7: Rendering
