Resource Management

0xbeefd1ed edited this page Mar 18, 2026 · 2 revisions

This chapter covers buffers, images, memory allocation, and how to get data onto the GPU efficiently across different hardware.

The Allocator

All buffer and image creation goes through an Allocator, a thread-safe wrapper around the Vulkan Memory Allocator (VMA):

let allocator = Allocator::new(device.clone()).unwrap();

The Allocator is reference-counted -- clone it wherever you need to allocate. When the device has the bufferDeviceAddress feature enabled, the allocator turns on the matching VMA support automatically.

GPU Memory Architectures

Before choosing an allocation strategy, it helps to understand how memory works on different hardware. Not all GPUs expose the same memory types:

| Strategy | Discrete (no ReBAR) | Discrete (ReBAR) | AMD Integrated | Intel/Apple Integrated |
|---|---|---|---|---|
| Private | DEVICE_LOCAL | DEVICE_LOCAL | DEVICE_LOCAL | DEVICE_LOCAL |
| Host | HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE |
| Dynamic | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED, DEVICE_LOCAL |
| Upload | DEVICE_LOCAL | DEVICE_LOCAL, HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE, DEVICE_LOCAL |

The key difference: on discrete GPUs without resizable BAR, Upload memory is not host-visible. You can't write to it directly from the CPU -- you need a staging buffer. On integrated GPUs and discrete GPUs with ReBAR, Upload memory is directly writable.

Pumicite pre-calculates the best memory type index for each strategy via MemoryTypeMap, so you don't have to query memory properties yourself. The allocation functions use these indices automatically.
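To illustrate the kind of pre-calculation MemoryTypeMap performs, here is a hedged sketch of memory-type selection using plain bitflags. The flag constants mirror Vulkan's VkMemoryPropertyFlagBits values; the function name and the "required plus preferred" heuristic are hypothetical, not Pumicite's actual code:

```rust
// Flag values mirror Vulkan's VkMemoryPropertyFlagBits.
const DEVICE_LOCAL: u32 = 0x1;
const HOST_VISIBLE: u32 = 0x2;
const HOST_COHERENT: u32 = 0x4;
const HOST_CACHED: u32 = 0x8;

/// Pick the memory type containing all `required` flags, preferring the
/// one that matches the most `preferred` flags. Hypothetical helper --
/// an illustration of the idea, not Pumicite's MemoryTypeMap.
fn pick_memory_type(types: &[u32], required: u32, preferred: u32) -> Option<usize> {
    types
        .iter()
        .enumerate()
        .filter(|(_, flags)| *flags & required == required)
        .max_by_key(|(_, flags)| (*flags & preferred).count_ones())
        .map(|(i, _)| i)
}

fn main() {
    // Memory types resembling a discrete GPU with ReBAR enabled.
    let types = [
        DEVICE_LOCAL,                                // plain VRAM
        HOST_VISIBLE | HOST_COHERENT,                // system RAM
        HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,  // cached system RAM
        DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT, // ReBAR window
    ];
    // "Upload": must be DEVICE_LOCAL, prefer HOST_VISIBLE -> the ReBAR window.
    assert_eq!(pick_memory_type(&types, DEVICE_LOCAL, HOST_VISIBLE), Some(3));
    // "Dynamic": must be HOST_VISIBLE | HOST_CACHED -> cached system RAM.
    assert_eq!(pick_memory_type(&types, HOST_VISIBLE | HOST_CACHED, DEVICE_LOCAL), Some(2));
}
```

On a discrete GPU without ReBAR the last entry would be absent, so the "Upload" query would fall back to plain VRAM -- which is exactly why that configuration needs staging copies.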

Buffers

Allocation Strategies

Pumicite provides four buffer types, each targeting a different use case:

Buffer::new_private -- GPU-exclusive memory. The buffer lives in VRAM and is not accessible from the CPU. Use for things like scratch buffers and anything generated entirely on the GPU.

let scratch = Buffer::new_private(
    allocator.clone(),
    1024 * 1024,                          // 1 MB
    256,                                   // alignment
    vk::BufferUsageFlags::STORAGE_BUFFER,
).unwrap();

Buffer::new_host -- CPU-accessible memory in system RAM. Use for staging buffers or data that the GPU accesses infrequently.

let staging = Buffer::new_host(
    allocator.clone(),
    size,
    4,
    vk::BufferUsageFlags::TRANSFER_SRC,
).unwrap();

Buffer::new_upload -- Device-local memory that is preferably host-writable. On GPUs with ReBAR or integrated GPUs, you can write directly. On discrete GPUs without ReBAR, a staging copy is needed. Always use BufferExt::update_contents to write data -- it handles both paths transparently.

let vertex_buffer = Buffer::new_upload(
    allocator.clone(),
    vertex_data_size,
    4,
    vk::BufferUsageFlags::VERTEX_BUFFER,
).unwrap();

Buffer::new_dynamic -- Host-visible, host-cached memory that is preferably device-local. Guarantees fast CPU reads. Use for GPU-to-CPU readback.

let readback = Buffer::new_dynamic(
    allocator.clone(),
    size,
    4,
    vk::BufferUsageFlags::TRANSFER_DST,
).unwrap();

Accessing Buffer Memory

Host-visible buffers expose their memory through the BufferLike trait:

// Write to a host-visible buffer
if let Some(slice) = buffer.as_slice_mut() {
    slice.copy_from_slice(&data);
}

// Read from a host-cached buffer
if let Some(slice) = buffer.as_slice() {
    let value = u32::from_le_bytes(slice[0..4].try_into().unwrap());
}

For non-coherent memory, call flush() after writes and invalidate() before reads to ensure visibility between CPU and GPU. These are no-ops on coherent memory.

Updating Buffers Transparently

BufferExt::update_contents abstracts over the host-visible vs staging-copy distinction. If the buffer is host-visible, it writes directly. Otherwise, it allocates a staging buffer, writes to it, and records a copy command:

use pumicite::buffer::BufferExt;

buffer.update_contents(
    |slice| {
        slice.copy_from_slice(bytemuck::cast_slice(&vertices));
        Ok(())
    },
    encoder,
    &mut staging_allocator,  // a RingBuffer or Allocator
)?;

The StagingBufferAllocator trait is implemented by both RingBuffer (for efficient transient allocations) and Allocator (for larger staging buffer allocations).

Ring Buffers

A RingBuffer is a sub-allocator for transient per-frame data -- uniform data, draw parameters, vertex data that changes every frame. Instead of creating and destroying buffers each frame, you allocate slices from a ring buffer:

// Write per-frame uniform data
let mut suballocation = ring_buffer.allocate_buffer(
    std::mem::size_of::<MyUniforms>() as u64,
    128,  // alignment
);
suballocation.as_slice_mut().unwrap()
    .copy_from_slice(bytemuck::bytes_of(&my_uniforms));

// Retain the suballocation so it lives until the GPU finishes
let suballocation = encoder.retain(suballocation);

The ring buffer manages multiple chunks internally. When a chunk fills up, it checks if any old chunks can be recycled (their Arc strong count dropped to 1, meaning the GPU is done with them). If not, it allocates a new chunk.

A RingBufferSuballocation holds an Arc reference to its parent chunk. When you encoder.retain() it, the chunk stays alive until the command buffer completes. Once all suballocations from a chunk are dropped, the ring buffer reuses that chunk.
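The recycling check described above boils down to inspecting Arc strong counts. The sketch below uses a plain byte vector as a stand-in for a chunk (the real chunk type holds a GPU buffer) and a hypothetical find_recyclable helper to show the idea:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a ring buffer chunk; the real chunk
// owns a GPU buffer, but plain bytes suffice to show the mechanism.
struct Chunk {
    _storage: Vec<u8>,
}

/// A chunk is reusable once its Arc strong count drops to 1, i.e. only
/// the ring buffer itself still holds it -- every retained suballocation
/// (and therefore every in-flight command buffer) has been dropped.
fn find_recyclable(chunks: &[Arc<Chunk>]) -> Option<usize> {
    chunks.iter().position(|c| Arc::strong_count(c) == 1)
}

fn main() {
    let chunks = vec![
        Arc::new(Chunk { _storage: vec![0; 1024] }),
        Arc::new(Chunk { _storage: vec![0; 1024] }),
    ];

    // Simulate a command buffer retaining a suballocation from chunk 0.
    let in_flight = chunks[0].clone();
    assert_eq!(find_recyclable(&chunks), Some(1)); // chunk 0 busy, chunk 1 free

    // Once the "GPU" finishes, the retained reference drops...
    drop(in_flight);
    assert_eq!(find_recyclable(&chunks), Some(0)); // ...and chunk 0 is reusable
}
```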

In Bevy, bevy_pumicite provides pre-configured ring buffers as ECS resources:

fn my_render_system(
    mut ring_buffer: ResMut<DeviceLocalRingBuffer>,
    mut state: SubmissionState,
) {
    state.record(|encoder| {
        let buffer = ring_buffer.allocate_buffer(vertex_size, 4);
        let buffer = encoder.retain(buffer);
        encoder.update_buffer(buffer, bytemuck::cast_slice(&vertices));
        // Use buffer for drawing...
    });
}

Images

Creating Images

Images follow the same private/upload split as buffers:

Image::new_private -- GPU-exclusive. Use for render targets, depth buffers, and textures that are generated on the GPU.

let depth = Image::new_private(
    allocator.clone(),
    &vk::ImageCreateInfo {
        image_type: vk::ImageType::TYPE_2D,
        format: vk::Format::D32_SFLOAT,
        extent: vk::Extent3D { width: 1920, height: 1080, depth: 1 },
        mip_levels: 1,
        array_layers: 1,
        samples: vk::SampleCountFlags::TYPE_1,
        tiling: vk::ImageTiling::OPTIMAL,
        usage: vk::ImageUsageFlags::DEPTH_STENCIL_ATTACHMENT,
        initial_layout: vk::ImageLayout::UNDEFINED,
        ..Default::default()
    },
).unwrap();

Image::new_upload -- Device-local, preferably host-writable. Use for textures loaded from disk. On discrete GPUs without ReBAR, TRANSFER_DST usage is added automatically so staging copies work.

let texture = Image::new_upload(
    allocator.clone(),
    &vk::ImageCreateInfo {
        image_type: vk::ImageType::TYPE_2D,
        format: vk::Format::R8G8B8A8_SRGB,
        extent: vk::Extent3D { width, height, depth: 1 },
        mip_levels: mip_count,
        array_layers: 1,
        samples: vk::SampleCountFlags::TYPE_1,
        tiling: vk::ImageTiling::OPTIMAL,
        usage: vk::ImageUsageFlags::SAMPLED,
        initial_layout: vk::ImageLayout::UNDEFINED,
        ..Default::default()
    },
).unwrap();

Image Views

Vulkan requires image views to access images from shaders or as render targets. The ImageExt::create_full_view helper creates a view covering all mip levels and array layers:

use pumicite::image::ImageExt;

let texture_view = texture.create_full_view().unwrap();

// Access the view handle for descriptor writes
let view_handle: vk::ImageView = texture_view.vk_handle();

// Access the underlying image
let image_ref: &Image = texture_view.image();

The FullImageView<T> bundles the image and its view together. The view is automatically destroyed when the FullImageView is dropped.

Uploading Image Data

ImageExt::update_contents_async handles the complete upload workflow -- staging buffer allocation, layout transitions, buffer-to-image copies for all mip levels, and final layout transition:

texture.update_contents_async(
    async |staging_slice| {
        // Fill the staging buffer with pixel data for all mip levels
        staging_slice[..pixel_data.len()].copy_from_slice(&pixel_data);
        Ok::<(), vk::Result>(())
    },
    encoder,
    &mut staging_allocator,
    vk::ImageLayout::SHADER_READ_ONLY_OPTIMAL,  // target layout after upload
).await?;

This function is intended to be called in an async recording context (with CommandPool::record_future). Under the hood, it:

  1. Calculates the total bytes needed for all mip levels
  2. Allocates a staging buffer from your StagingBufferAllocator
  3. Calls your writer to fill the staging buffer
  4. Transitions the image to TRANSFER_DST_OPTIMAL
  5. Records vkCmdCopyBufferToImage for each mip level
  6. Transitions the image to your target layout

The BufferLike Trait

All buffer types implement BufferLike, which provides a uniform interface:

pub trait BufferLike {
    fn offset(&self) -> vk::DeviceSize;
    fn device_address(&self) -> vk::DeviceAddress;
    fn size(&self) -> vk::DeviceSize;
    fn as_slice(&self) -> Option<&[u8]>;
    fn as_slice_mut(&mut self) -> Option<&mut [u8]>;
    fn flush(&mut self, range: impl RangeBounds<vk::DeviceSize>) -> VkResult<()>;
    fn invalidate(&mut self, range: impl RangeBounds<vk::DeviceSize>) -> VkResult<()>;
}

This means functions that operate on buffers can accept any buffer type:

fn upload_data<B: BufferLike>(buffer: &mut B, data: &[u8]) {
    if let Some(slice) = buffer.as_slice_mut() {
        slice[..data.len()].copy_from_slice(data);
    }
}

Debug Naming

Vulkan objects implement the DebugObject trait, which lets you assign debug names visible in validation layer messages, RenderDoc, and NSight:

use pumicite::debug::DebugObject;

// Builder pattern
let buffer = Buffer::new_private(allocator, 1024, 4, vk::BufferUsageFlags::STORAGE_BUFFER)
    .unwrap()
    .with_name(c"Scene Transform Buffer");

// Or set later
let mut image = Image::new_private(allocator, &info).unwrap();
image.set_name(c"GBuffer Albedo");

Name your resources early. When a validation error fires, the message will include the debug name instead of a raw handle number.

Resource Lifetimes

GPU commands reference resources by handle. If a resource is freed on the CPU while the GPU is still using it, you get undefined behavior. Pumicite provides two mechanisms for managing this:

encoder.retain() -- Extends an Arc<T>'s lifetime to match the command buffer's GPU execution. The retained object is stored in an arena inside the command buffer and dropped when the command buffer completes.

let buffer = Arc::new(Buffer::new_private(allocator, size, 4, usage).unwrap());
let buffer_ref = encoder.retain(buffer.clone());
// buffer_ref is &'a Buffer, guaranteed alive until GPU finishes

GPUMutex -- Wraps a resource with a timeline semaphore. When the GPUMutex is dropped while GPU work is still pending, the inner resource is sent to a recycler thread that waits for the semaphore before dropping it. encoder.lock() gives you a reference tied to the command buffer's GPU execution lifetime.

let image = GPUMutex::new(image);
// ...
let image_ref = encoder.lock(&image, vk::PipelineStageFlags2::FRAGMENT_SHADER);
// image_ref is &'a Image -- safe to use in this command buffer

Use retain for short-lived or read-only resources like staging buffers or PSOs. Use GPUMutex for long-lived resources that must be synchronized across submissions or queues.

Choosing the Right Strategy

| Scenario | Strategy |
|---|---|
| GPU-generated data (render targets, scratch) | Buffer::new_private / Image::new_private |
| CPU writes once, GPU reads many times (meshes, textures) | Buffer::new_upload + update_contents / Image::new_upload + update_contents_async |
| Per-frame data (uniforms, draw params) | RingBuffer |
| GPU writes, CPU reads (readback, screenshots) | Buffer::new_dynamic |
| One-off staging transfers | Buffer::new_host |

Summary

In this chapter you learned:

  • Allocator wraps VMA for efficient GPU memory management
  • Four buffer strategies (private, host, upload, dynamic) target different access patterns and GPU architectures
  • BufferExt::update_contents transparently handles host-visible vs staging-copy uploads
  • RingBuffer efficiently sub-allocates transient per-frame data
  • ManagedBuffer abstracts the integrated-vs-discrete GPU difference
  • Image::new_private and Image::new_upload mirror the buffer allocation split
  • FullImageView bundles an image with its view for convenience
  • DebugObject names resources for validation and debugging tools
  • retain and GPUMutex ensure resources outlive their GPU usage

Next: Chapter 5: Synchronization
