Resource Management
This chapter covers buffers, images, memory allocation, and how to get data onto the GPU efficiently across different hardware.
All buffer and image creation goes through an Allocator, a thread-safe wrapper around
the Vulkan Memory Allocator (VMA):
```rust
let allocator = Allocator::new(device.clone()).unwrap();
```

The `Allocator` is reference-counted -- clone it wherever you need to allocate. It
automatically enables bufferDeviceAddress support if the device has that feature
enabled.
Before choosing an allocation strategy, it helps to understand how memory works on different hardware. Not all GPUs expose the same memory types:
| Strategy | Discrete (no ReBAR) | Discrete (ReBAR) | AMD Integrated | Intel/Apple Integrated |
|---|---|---|---|---|
| Private | DEVICE_LOCAL | DEVICE_LOCAL | DEVICE_LOCAL | DEVICE_LOCAL |
| Host | HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE |
| Dynamic | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED | HOST_VISIBLE, HOST_CACHED, DEVICE_LOCAL |
| Upload | DEVICE_LOCAL | DEVICE_LOCAL, HOST_VISIBLE | HOST_VISIBLE | HOST_VISIBLE, DEVICE_LOCAL |
The key difference: on discrete GPUs without resizable BAR, Upload memory is not host-visible. You can't write to it directly from the CPU -- you need a staging buffer. On integrated GPUs and discrete GPUs with ReBAR, Upload memory is directly writable.
Pumicite pre-calculates the best memory type index for each strategy via MemoryTypeMap,
so you don't have to query memory properties yourself. The allocation functions use
these indices automatically.
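To make the strategy concrete, the selection can be sketched as a scan over the device's memory types: require some flags, prefer others. Everything below (the flag constants and `pick_memory_type`) is an illustrative stand-in, not the Pumicite or VMA API:

```rust
// Illustrative stand-ins for vk::MemoryPropertyFlags bits.
const DEVICE_LOCAL: u32 = 1 << 0;
const HOST_VISIBLE: u32 = 1 << 1;
const HOST_CACHED: u32 = 1 << 3;

/// Pick the memory type index that has every `required` flag,
/// preferring the candidate with the most `preferred` flags set.
fn pick_memory_type(types: &[u32], required: u32, preferred: u32) -> Option<usize> {
    types
        .iter()
        .enumerate()
        .filter(|(_, &flags)| flags & required == required)
        .max_by_key(|(_, &flags)| (flags & preferred).count_ones())
        .map(|(i, _)| i)
}
```

On a device exposing `[DEVICE_LOCAL, HOST_VISIBLE, HOST_VISIBLE | HOST_CACHED, DEVICE_LOCAL | HOST_VISIBLE]` (discrete with ReBAR), an Upload-style query (require `DEVICE_LOCAL`, prefer `HOST_VISIBLE`) selects the combined type; without ReBAR that combined type disappears, plain `DEVICE_LOCAL` wins, and a staging copy becomes necessary.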
Pumicite provides four buffer types, each targeting a different use case:
`Buffer::new_private` -- GPU-exclusive memory. The buffer lives in VRAM and is not
accessible from the CPU. Use for things like scratch buffers and anything generated
entirely on the GPU.
```rust
let scratch = Buffer::new_private(
    allocator.clone(),
    1024 * 1024, // 1 MB
    256,         // alignment
    vk::BufferUsageFlags::STORAGE_BUFFER,
).unwrap();
```

`Buffer::new_host` -- CPU-accessible memory in system RAM. Use for staging buffers
or data that the GPU accesses infrequently.
```rust
let staging = Buffer::new_host(
    allocator.clone(),
    size,
    4,
    vk::BufferUsageFlags::TRANSFER_SRC,
).unwrap();
```

`Buffer::new_upload` -- Device-local memory that is preferably host-writable. On
GPUs with ReBAR or integrated GPUs, you can write directly. On discrete GPUs without
ReBAR, a staging copy is needed. Always use BufferExt::update_contents to write data
-- it handles both paths transparently.
```rust
let vertex_buffer = Buffer::new_upload(
    allocator.clone(),
    vertex_data_size,
    4,
    vk::BufferUsageFlags::VERTEX_BUFFER,
).unwrap();
```

`Buffer::new_dynamic` -- Host-visible, host-cached memory that is preferably
device-local. Guarantees fast CPU reads. Use for GPU-to-CPU readback.
```rust
let readback = Buffer::new_dynamic(
    allocator.clone(),
    size,
    4,
    vk::BufferUsageFlags::TRANSFER_DST,
).unwrap();
```

Host-visible buffers expose their memory through the `BufferLike` trait:
```rust
// Write to a host-visible buffer
if let Some(slice) = buffer.as_slice_mut() {
    slice.copy_from_slice(&data);
}

// Read from a host-cached buffer
if let Some(slice) = buffer.as_slice() {
    let value = u32::from_le_bytes(slice[0..4].try_into().unwrap());
}
```

For non-coherent memory, call `flush()` after writes and `invalidate()` before reads to
ensure visibility between CPU and GPU. These are no-ops on coherent memory.
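The rule can be modeled with a toy mapped allocation -- `Mapped` below is not the Pumicite API, just a sketch of why the flush is needed on non-coherent memory and free on coherent memory:

```rust
/// Toy model of a mapped allocation. The real `flush`/`invalidate` take a
/// byte range and return a VkResult, but the coherency rule is the same.
struct Mapped {
    coherent: bool,
    pending_flush: bool,
}

impl Mapped {
    fn write(&mut self) {
        // CPU writes land in the CPU cache; on non-coherent memory they are
        // not yet visible to the GPU until flushed.
        self.pending_flush = !self.coherent;
    }
    fn flush(&mut self) {
        // No-op on coherent memory, cache clean on non-coherent memory.
        self.pending_flush = false;
    }
    fn gpu_can_read(&self) -> bool {
        !self.pending_flush
    }
}
```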
`BufferExt::update_contents` abstracts over the host-visible vs staging-copy distinction.
If the buffer is host-visible, it writes directly. Otherwise, it allocates a staging
buffer, writes to it, and records a copy command:
```rust
use pumicite::buffer::BufferExt;

buffer.update_contents(
    |slice| {
        slice.copy_from_slice(bytemuck::cast_slice(&vertices));
        Ok(())
    },
    encoder,
    &mut staging_allocator, // a RingBuffer or Allocator
)?;
```

The `StagingBufferAllocator` trait is implemented by both `RingBuffer` (for efficient
transient allocations) and `Allocator` (for larger staging buffer allocations).
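The branch `update_contents` takes can be sketched with a hypothetical `choose_upload_path` helper, where `mapped` is `Some` exactly when the buffer is host-visible (illustrative only, not the real trait):

```rust
#[derive(Debug, PartialEq)]
enum UploadPath {
    DirectWrite,
    StagingCopy,
}

/// Mirrors the update_contents decision: write through the mapping when one
/// exists, otherwise fall back to a staging buffer plus a recorded copy.
fn choose_upload_path(mapped: Option<&mut [u8]>, data: &[u8]) -> UploadPath {
    match mapped {
        Some(slice) => {
            // Host-visible: write directly through the mapping.
            slice[..data.len()].copy_from_slice(data);
            UploadPath::DirectWrite
        }
        None => {
            // Not host-visible: caller allocates a staging buffer and
            // records a vkCmdCopyBuffer into the destination.
            UploadPath::StagingCopy
        }
    }
}
```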
A RingBuffer is a sub-allocator for transient per-frame data -- uniform data, draw
parameters, vertex data that changes every frame. Instead of creating and destroying
buffers each frame, you allocate slices from a ring buffer:
```rust
// Write per-frame uniform data
let mut suballocation = ring_buffer.allocate_buffer(
    std::mem::size_of::<MyUniforms>() as u64,
    128, // alignment
);
suballocation.as_slice_mut().unwrap()
    .copy_from_slice(bytemuck::bytes_of(&my_uniforms));

// Retain the suballocation so it lives until the GPU finishes
let suballocation = encoder.retain(suballocation);
```

The ring buffer manages multiple chunks internally. When a chunk fills up, it checks if
any old chunks can be recycled (their Arc strong count dropped to 1, meaning the GPU
is done with them). If not, it allocates a new chunk.
A RingBufferSuballocation holds an Arc reference to its parent chunk. When you
encoder.retain() it, the chunk stays alive until the command buffer completes. Once all
suballocations from a chunk are dropped, the ring buffer reuses that chunk.
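The recycling rule can be sketched in plain `Arc` terms (`RingChunks` is an illustrative toy, not the real `RingBuffer`): a chunk whose strong count has dropped back to 1 is referenced only by the ring buffer itself, so the GPU must be done with it.

```rust
use std::sync::Arc;

/// Toy chunk recycler: a chunk can be reused once no suballocation
/// (and no in-flight command buffer) holds an Arc to it.
struct RingChunks {
    chunks: Vec<Arc<Vec<u8>>>,
}

impl RingChunks {
    /// Returns a recycled chunk if the GPU is done with one, else allocates.
    fn acquire_chunk(&mut self, size: usize) -> Arc<Vec<u8>> {
        if let Some(chunk) = self
            .chunks
            .iter()
            .find(|c| Arc::strong_count(c) == 1) // only the ring buffer holds it
        {
            return chunk.clone();
        }
        let chunk = Arc::new(vec![0u8; size]);
        self.chunks.push(chunk.clone());
        chunk
    }
}
```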
In Bevy, bevy_pumicite provides pre-configured ring buffers as ECS resources:
```rust
fn my_render_system(
    mut ring_buffer: ResMut<DeviceLocalRingBuffer>,
    mut state: SubmissionState,
) {
    state.record(|encoder| {
        let buffer = ring_buffer.allocate_buffer(vertex_size, 4);
        let buffer = encoder.retain(buffer);
        encoder.update_buffer(buffer, bytemuck::cast_slice(&vertices));
        // Use buffer for drawing...
    });
}
```

Images follow the same private/upload split as buffers:
`Image::new_private` -- GPU-exclusive. Use for render targets, depth buffers, and
textures that are generated on the GPU.
```rust
let depth = Image::new_private(
    allocator.clone(),
    &vk::ImageCreateInfo {
        image_type: vk::ImageType::TYPE_2D,
        format: vk::Format::D32_SFLOAT,
        extent: vk::Extent3D { width: 1920, height: 1080, depth: 1 },
        mip_levels: 1,
        array_layers: 1,
        samples: vk::SampleCountFlags::TYPE_1,
        tiling: vk::ImageTiling::OPTIMAL,
        usage: vk::ImageUsageFlags::DEPTH_STENCIL_ATTACHMENT,
        initial_layout: vk::ImageLayout::UNDEFINED,
        ..Default::default()
    },
).unwrap();
```

`Image::new_upload` -- Device-local, preferably host-writable. Use for textures
loaded from disk. On discrete GPUs without ReBAR, TRANSFER_DST usage is added
automatically so staging copies work.
```rust
let texture = Image::new_upload(
    allocator.clone(),
    &vk::ImageCreateInfo {
        image_type: vk::ImageType::TYPE_2D,
        format: vk::Format::R8G8B8A8_SRGB,
        extent: vk::Extent3D { width, height, depth: 1 },
        mip_levels: mip_count,
        array_layers: 1,
        samples: vk::SampleCountFlags::TYPE_1,
        tiling: vk::ImageTiling::OPTIMAL,
        usage: vk::ImageUsageFlags::SAMPLED,
        initial_layout: vk::ImageLayout::UNDEFINED,
        ..Default::default()
    },
).unwrap();
```

Vulkan requires image views to access images as render targets. The `ImageExt::create_full_view`
helper creates a view covering all mip levels and array layers:
```rust
use pumicite::image::ImageExt;

let texture_view = texture.create_full_view().unwrap();

// Access the view handle for descriptor writes
let view_handle: vk::ImageView = texture_view.vk_handle();

// Access the underlying image
let image_ref: &Image = texture_view.image();
```

The `FullImageView<T>` bundles the image and its view together. The view is automatically
destroyed when the FullImageView is dropped.
`ImageExt::update_contents_async` handles the complete upload workflow -- staging buffer
allocation, layout transitions, buffer-to-image copies for all mip levels, and final
layout transition:
```rust
texture.update_contents_async(
    async |staging_slice| {
        // Fill the staging buffer with pixel data for all mip levels
        staging_slice[..pixel_data.len()].copy_from_slice(&pixel_data);
        Ok::<(), vk::Result>(())
    },
    encoder,
    &mut staging_allocator,
    vk::ImageLayout::SHADER_READ_ONLY_OPTIMAL, // target layout after upload
).await?;
```

This function is intended to be called in an async recording context
(with CommandPool::record_future). Under the hood, this:
- Calculates the total bytes needed for all mip levels
- Allocates a staging buffer from your `StagingBufferAllocator`
- Calls your writer to fill the staging buffer
- Transitions the image to `TRANSFER_DST_OPTIMAL`
- Records `vkCmdCopyBufferToImage` for each mip level
- Transitions the image to your target layout
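The first step can be sketched as a mip-chain size calculation. This assumes tightly packed levels of a simple uncompressed format (`bpp` bytes per pixel); block-compressed formats need different math, so treat it as illustrative:

```rust
/// Total staging bytes for a full mip chain, each level tightly packed.
/// Each level halves width and height, clamped to a minimum of 1 texel.
fn mip_chain_bytes(width: u64, height: u64, levels: u32, bpp: u64) -> u64 {
    (0..levels)
        .map(|i| {
            let w = (width >> i).max(1);
            let h = (height >> i).max(1);
            w * h * bpp
        })
        .sum()
}
```

For a 4x4 RGBA8 texture with 3 mip levels this comes to (16 + 4 + 1) * 4 = 84 bytes.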
All buffer types implement BufferLike, which provides a uniform interface:
```rust
pub trait BufferLike {
    fn offset(&self) -> vk::DeviceSize;
    fn device_address(&self) -> vk::DeviceAddress;
    fn size(&self) -> vk::DeviceSize;
    fn as_slice(&self) -> Option<&[u8]>;
    fn as_slice_mut(&mut self) -> Option<&mut [u8]>;
    fn flush(&mut self, range: impl RangeBounds<vk::DeviceSize>) -> VkResult<()>;
    fn invalidate(&mut self, range: impl RangeBounds<vk::DeviceSize>) -> VkResult<()>;
}
```

This means functions that operate on buffers can accept any buffer type:
```rust
fn upload_data<B: BufferLike>(buffer: &mut B, data: &[u8]) {
    if let Some(slice) = buffer.as_slice_mut() {
        slice[..data.len()].copy_from_slice(data);
    }
}
```

Vulkan objects implement the `DebugObject` trait, which lets you assign debug names
visible in validation layer messages, RenderDoc, and NSight:
```rust
use pumicite::debug::DebugObject;

// Builder pattern
let buffer = Buffer::new_private(allocator, 1024, 4, vk::BufferUsageFlags::STORAGE_BUFFER)
    .unwrap()
    .with_name(c"Scene Transform Buffer");

// Or set later
let mut image = Image::new_private(allocator, &info).unwrap();
image.set_name(c"GBuffer Albedo");
```

Name your resources early. When a validation error fires, the message will include the debug name instead of a raw handle number.
GPU commands reference resources by handle. If a resource is freed on the CPU while the GPU is still using it, you get undefined behavior. Pumicite provides two mechanisms for managing this:
`encoder.retain()` -- Extends an `Arc<T>`'s lifetime to match the command buffer's
GPU execution. The retained object is stored in an arena inside the command buffer and
dropped when the command buffer completes.
```rust
let buffer = Arc::new(Buffer::new_private(allocator, size, 4, usage).unwrap());
let buffer_ref = encoder.retain(buffer.clone());
// buffer_ref is &'a Buffer, guaranteed alive until GPU finishes
```

`GPUMutex` -- Wraps a resource with a timeline semaphore. When the GPUMutex is
dropped while GPU work is still pending, the inner resource is sent to a recycler thread
that waits for the semaphore before dropping it. encoder.lock() gives you a reference
tied to the command buffer's GPU execution lifetime.
```rust
let image = GPUMutex::new(image);
// ...
let image_ref = encoder.lock(&image, vk::PipelineStageFlags2::FRAGMENT_SHADER);
// image_ref is &'a Image -- safe to use in this command buffer
```

Use `retain` for short-lived or read-only resources like staging buffers or PSOs. Use `GPUMutex` for
long-lived resources that must be synchronized across submissions or queues.
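The retain mechanism can be modeled as an arena of type-erased `Arc`s that is cleared only when the submission's fence or timeline semaphore signals. `CommandArena` here is a toy model, not the encoder's actual implementation:

```rust
use std::sync::Arc;

/// Toy model of the retain arena: retained Arcs are parked alongside the
/// command buffer and dropped only when the GPU signals completion.
struct CommandArena {
    retained: Vec<Arc<dyn std::any::Any + Send + Sync>>,
}

impl CommandArena {
    /// Park a clone of the Arc so the resource outlives GPU execution.
    fn retain<T: Send + Sync + 'static>(&mut self, resource: Arc<T>) {
        self.retained.push(resource);
    }
    /// Called once the fence/timeline semaphore for this submission signals.
    fn on_gpu_complete(&mut self) {
        self.retained.clear(); // now safe to drop the resources
    }
}
```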
| Scenario | Strategy |
|---|---|
| GPU-generated data (render targets, scratch) | `Buffer::new_private` / `Image::new_private` |
| CPU writes once, GPU reads many times (meshes, textures) | `Buffer::new_upload` + `update_contents` / `Image::new_upload` + `update_contents_async` |
| Per-frame data (uniforms, draw params) | `RingBuffer` |
| GPU writes, CPU reads (readback, screenshots) | `Buffer::new_dynamic` |
| One-off staging transfers | `Buffer::new_host` |
In this chapter you learned:
- `Allocator` wraps VMA for efficient GPU memory management
- Four buffer strategies (`private`, `host`, `upload`, `dynamic`) target different access patterns and GPU architectures
- `BufferExt::update_contents` transparently handles host-visible vs staging-copy uploads
- `RingBuffer` efficiently sub-allocates transient per-frame data
- `ManagedBuffer` abstracts the integrated-vs-discrete GPU difference
- `Image::new_private` and `Image::new_upload` mirror the buffer allocation split
- `FullImageView` bundles an image with its view for convenience
- `DebugObject` names resources for validation and debugging tools
- `retain` and `GPUMutex` ensure resources outlive their GPU usage