Getting Started
This chapter walks you through creating your first Pumicite program from scratch -- initializing Vulkan, creating a device, recording commands, and submitting work to the GPU.
You need Rust nightly and a Vulkan 1.2+ driver. On macOS, install MoltenVK (e.g. via the Vulkan SDK) or KosmicKrisp.
Verify your Vulkan installation:

```shell
# If you have the Vulkan SDK installed:
vulkaninfo --summary
```

Create a new project and add Pumicite:
```shell
cargo new my_renderer
cd my_renderer
```

```toml
# Cargo.toml
[package]
name = "my_renderer"
edition = "2024"

[dependencies]
pumicite = "0.1.0"
```

Since Pumicite requires nightly Rust, create a `rust-toolchain.toml`:

```toml
# rust-toolchain.toml
[toolchain]
channel = "nightly"
```

Before writing code, it helps to understand the Vulkan objects you'll create and how they relate:
```
Instance                  Connection to the Vulkan loader
└── PhysicalDevice        A GPU on the system
    └── Device            Logical connection to that GPU
        ├── Queue         Scheduler that submits work to GPU cores
        ├── CommandPool   Allocates command buffers for a queue family
        ├── Allocator     GPU memory management (VMA)
        └── Timeline      Orders command buffer execution via semaphores
```
Every GPU operation starts by recording commands into a CommandBuffer, then submitting that buffer to a Queue. The Timeline ensures command buffers execute in the order you schedule them.
The simplest way to get started is `Device::create_system_default()`. It creates an `Instance`, picks the first physical device, and gives you a device with a single queue from family 0 (typically graphics + compute + transfer):
```rust
use pumicite::prelude::*;

fn main() {
    let (device, mut queue) = Device::create_system_default().unwrap();
    println!("Device created on: {:?}", device.physical_device().properties().device_name());
}
```

Under the hood, `create_system_default` does the following:
- Loads the Vulkan entry point via `ash::Entry::load()`
- Creates an `Instance` with Vulkan 1.2 and debug utils enabled
- Enumerates physical devices and picks the first one
- Creates a `DeviceBuilder`, which automatically enables `VK_KHR_synchronization2` and `VK_KHR_timeline_semaphore`
- Enables a single queue from family 0
- Builds the logical device and spawns the recycler thread for deferred resource cleanup
For production code you'll want more control. Here's the equivalent using the builder API:
```rust
use std::sync::Arc;
use pumicite::{Instance, Device};
use pumicite::utils::Version;
use ash::vk;

fn main() {
    // 1. Load the Vulkan entry point
    let entry = Arc::new(unsafe { ash::Entry::load() }.unwrap());

    // 2. Create a Vulkan instance
    let mut instance_builder = Instance::builder(entry);
    instance_builder.info.api_version = Version::V1_2;
    instance_builder.info.application_name = std::borrow::Cow::Borrowed(c"My Renderer");
    instance_builder.enable_extension::<ash::ext::debug_utils::Meta>().ok();
    let instance = instance_builder.build().unwrap();

    // 3. Select a physical device, preferring a discrete GPU
    let pdevice = instance.enumerate_physical_devices().unwrap()
        .find(|d| d.properties().device_type == vk::PhysicalDeviceType::DISCRETE_GPU)
        .or_else(|| instance.enumerate_physical_devices().unwrap().next())
        .expect("No Vulkan device found");
    println!("Selected GPU: {:?}", pdevice.properties().device_name());

    // 4. Configure and build the logical device
    let mut builder = Device::builder(pdevice);
    // enable_extension and enable_feature for anything you need beyond the defaults
    // (synchronization2 and timeline_semaphore are enabled automatically)
    let graphics_queue_ref = builder
        .enable_queue_with_caps(vk::QueueFlags::GRAPHICS | vk::QueueFlags::COMPUTE, 1.0)
        .expect("No suitable queue family");
    let device = builder.build().unwrap();
    let queue = device.get_queue(graphics_queue_ref);
}
```

When you call `Device::builder()`, the following are enabled for you:
| Extension / Feature | Why |
|---|---|
| `VK_KHR_synchronization2` | Required for `vkCmdPipelineBarrier2` and `VkPipelineStageFlags2` |
| `VK_KHR_timeline_semaphore` | Required for `GPUMutex`, `Timeline`, and cross-queue sync |

On macOS, `VK_KHR_portability_subset` is also enabled automatically for MoltenVK compatibility.
Most GPU resources (buffers and images) need memory. Pumicite uses the Vulkan Memory Allocator (VMA) under the hood:

```rust
let allocator = Allocator::new(device.clone()).unwrap();
```

The `Allocator` is reference-counted and thread-safe. Create it once and clone it wherever you need to allocate. You'll pass it to `Buffer::new_*` and `Image::new_*` methods.
Recording and submitting GPU work follows this sequence:

```
alloc ──> schedule ──> begin ──> record ──> finish ──> submit ──> wait
```

Each step:

1. `CommandPool::alloc()` -- Allocate a fresh command buffer from the pool.
2. `Timeline::schedule()` -- Assign the command buffer a position in the timeline. This determines its execution order relative to other buffers on the same timeline.
3. `CommandPool::begin()` -- Transition the buffer to the Recording state (`vkBeginCommandBuffer`).
4. `CommandPool::record()` or `record_future()` -- Record commands through a `CommandEncoder`.
5. `CommandPool::finish()` -- End recording (`vkEndCommandBuffer`).
6. `Queue::submit()` -- Submit for GPU execution (`vkQueueSubmit2`). The timeline semaphore wait/signal is handled automatically.
7. `CommandBuffer::block_until_completion()` -- Wait for the GPU to finish.

After waiting, you can either free the command buffer back to the pool, or reset it for reuse.
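The steps above trace the standard Vulkan command buffer lifecycle. As a rough illustration — a toy model with made-up types, not the real Pumicite API — the legal state transitions can be sketched like this:

```rust
// Toy model of the command buffer lifecycle. The state names follow the
// Vulkan spec; the types are hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
enum CbState {
    Initial,    // after CommandPool::alloc()
    Recording,  // after begin()
    Executable, // after finish()
    Pending,    // after submit()
    Invalid,    // after the GPU completes -- ready to free or reset
}

struct ToyCommandBuffer {
    state: CbState,
}

impl ToyCommandBuffer {
    fn alloc() -> Self {
        ToyCommandBuffer { state: CbState::Initial }
    }
    fn begin(&mut self) {
        assert_eq!(self.state, CbState::Initial, "begin() needs a fresh buffer");
        self.state = CbState::Recording;
    }
    fn finish(&mut self) {
        assert_eq!(self.state, CbState::Recording, "finish() needs a recording buffer");
        self.state = CbState::Executable;
    }
    fn submit(&mut self) {
        assert_eq!(self.state, CbState::Executable, "submit() needs a finished buffer");
        self.state = CbState::Pending;
    }
    fn block_until_completion(&mut self) {
        assert_eq!(self.state, CbState::Pending, "nothing was submitted");
        self.state = CbState::Invalid;
    }
}

fn main() {
    let mut cb = ToyCommandBuffer::alloc();
    cb.begin();
    cb.finish();
    cb.submit();
    cb.block_until_completion();
    assert_eq!(cb.state, CbState::Invalid);
}
```

Calling these methods out of order trips an assertion in the toy; the real API enforces analogous state checks.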
Let's put it all together. This program creates a 128x128 image on the GPU and clears it to a solid color:

```rust
use ash::vk;
use pumicite::{prelude::*, command::CommandPool, sync::Timeline, utils::future::yield_now};

fn main() {
    // --- Setup ---
    let (device, mut queue) = Device::create_system_default().unwrap();
    let allocator = Allocator::new(device.clone()).unwrap();
    let mut timeline = Timeline::new(device.clone()).unwrap();
    let mut command_pool = CommandPool::new(device.clone(), queue.family_index()).unwrap();
    let mut command_buffer = command_pool
        .alloc()
        .unwrap()
        .with_name(c"Clear Image Command Buffer"); // Debug name visible in GPU debuggers

    // --- Create a GPU image ---
    let image = Image::new_private(
        allocator,
        &vk::ImageCreateInfo {
            image_type: vk::ImageType::TYPE_2D,
            format: vk::Format::R8G8B8A8_UINT,
            extent: vk::Extent3D { width: 128, height: 128, depth: 1 },
            mip_levels: 1,
            array_layers: 1,
            samples: vk::SampleCountFlags::TYPE_1,
            tiling: vk::ImageTiling::LINEAR,
            usage: vk::ImageUsageFlags::TRANSFER_DST,
            initial_layout: vk::ImageLayout::UNDEFINED,
            ..Default::default()
        },
    ).unwrap();

    // --- Synchronization setup ---
    // Wrap the image in a GPUMutex for cross-queue / cross-submission safety.
    let image = GPUMutex::new(image);
    // Create a ResourceState to track the image's pipeline state.
    let mut resource_state = ResourceState::default();

    // --- Record commands ---
    timeline.schedule(&mut command_buffer);
    command_pool.begin(&mut command_buffer).unwrap();
    command_pool.record_future(&mut command_buffer, async |encoder| {
        // Lock the image for use in this command buffer.
        // The encoder will wait for any prior GPU work on this image to finish
        // before the CLEAR stage begins.
        let image = encoder.lock(&image, vk::PipelineStageFlags2::CLEAR);
        // Declare how we're about to use the image.
        // This accumulates the necessary pipeline barrier internally.
        encoder.use_image_resource(
            image,
            &mut resource_state,
            Access::CLEAR,
            vk::ImageLayout::TRANSFER_DST_OPTIMAL,
            0..1,  // mip levels
            0..1,  // array layers
            false, // don't discard previous content
        );
        // Yield to emit the accumulated barriers.
        yield_now().await;
        // Now the image is in TRANSFER_DST_OPTIMAL layout -- clear it.
        encoder.clear_color_image_with_layout(
            &*image,
            &vk::ClearColorValue { uint32: [0, 0, 1, 2] },
            vk::ImageLayout::TRANSFER_DST_OPTIMAL,
        );
    });

    // --- Submit and wait ---
    command_pool.finish(&mut command_buffer).unwrap();
    queue.submit(&mut command_buffer).unwrap();
    command_buffer.block_until_completion().unwrap();

    // --- Cleanup ---
    command_pool.free(command_buffer);
    println!("Done!");
}
```

Let's trace the key operations:
`GPUMutex::new(image)` -- Wraps the image so that cross-queue access is tracked. Since this is the first use and there's no prior GPU work, locking it won't produce any semaphore waits.

`encoder.lock(&image, ...)` -- Returns a `&'a Image` reference tied to the command buffer's GPU execution lifetime. The encoder records that this command buffer now "owns" the image until it signals its timeline semaphore. If any other command buffer had previously locked the same `GPUMutex`, the encoder would automatically add a semaphore wait.

`encoder.use_image_resource(...)` -- Computes the minimal pipeline barrier needed to transition the image from its current state (unknown / default) to `TRANSFER_DST_OPTIMAL` with `CLEAR` access. The barrier is accumulated internally but not yet recorded.

`yield_now().await` -- Tells the executor "I'm done accumulating barriers for now." The executor calls `encoder.emit_barriers()`, which records a single `vkCmdPipelineBarrier2` with all accumulated transitions.
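The accumulate-then-emit pattern can be sketched with a toy encoder (hypothetical types, no Vulkan involved): declaring resource uses only queues up transitions, and a single emit step flushes them all as one combined barrier record.

```rust
// Toy sketch of barrier accumulation. The types are made up for
// illustration; the real CommandEncoder tracks far richer state.
struct Transition {
    image: &'static str,
    old_layout: &'static str,
    new_layout: &'static str,
}

#[derive(Default)]
struct ToyEncoder {
    pending: Vec<Transition>,  // accumulated, not yet recorded
    commands: Vec<String>,     // what actually lands in the command buffer
}

impl ToyEncoder {
    fn use_image_resource(&mut self, image: &'static str, old: &'static str, new: &'static str) {
        // Only accumulate -- nothing is recorded yet.
        self.pending.push(Transition { image, old_layout: old, new_layout: new });
    }

    fn emit_barriers(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        // All accumulated transitions collapse into one barrier command.
        self.commands
            .push(format!("vkCmdPipelineBarrier2 with {} transition(s)", self.pending.len()));
        self.pending.clear();
    }
}

fn main() {
    let mut enc = ToyEncoder::default();
    enc.use_image_resource("color", "UNDEFINED", "TRANSFER_DST_OPTIMAL");
    enc.use_image_resource("depth", "UNDEFINED", "DEPTH_ATTACHMENT_OPTIMAL");
    enc.emit_barriers(); // one barrier command covering both transitions
    assert_eq!(enc.commands.len(), 1);
    assert!(enc.pending.is_empty());
}
```

Batching transitions into one `vkCmdPipelineBarrier2` call is cheaper than issuing one barrier per resource, which is why the encoder defers emission until the yield point.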
`queue.submit(&mut command_buffer)` -- Builds the `vkQueueSubmit2` call with:

- A wait on the timeline semaphore at `timestamp - 1` (the previous command buffer on this timeline)
- A signal of the timeline semaphore at `timestamp` (our command buffer)
- Any additional semaphore waits added by `encoder.lock()`
The example above used `record_future` with async/await syntax. You can also record synchronously with `record`:

```rust
command_pool.record(&mut command_buffer, |encoder| {
    // Lock, track state, emit barriers, record commands...
    let image = encoder.lock(&image, vk::PipelineStageFlags2::CLEAR);
    encoder.use_image_resource(
        image, &mut resource_state, Access::CLEAR,
        vk::ImageLayout::TRANSFER_DST_OPTIMAL, 0..1, 0..1, false,
    );
    encoder.emit_barriers(); // Must call explicitly in synchronous mode
    encoder.clear_color_image_with_layout(
        &*image,
        &vk::ClearColorValue { uint32: [0, 0, 1, 2] },
        vk::ImageLayout::TRANSFER_DST_OPTIMAL,
    );
});
```

The difference:
| | `record` (synchronous) | `record_future` (async) |
|---|---|---|
| Barrier emission | You call `emit_barriers()` manually | `yield_now().await` triggers it |
| Return value | Direct return from closure | Returns `GPUMutex<T>` locked until GPU finishes |
| Barrier merging | No | Yes |
For most cases, synchronous recording is simpler and sufficient. The async path becomes valuable when you need barrier merging and optimizations across multiple futures that record command buffers.
A `Timeline` serializes command buffer execution using a timeline semaphore with a monotonically increasing counter. Each call to `timeline.schedule(&mut cb)` assigns the next timestamp:

```rust
timeline.schedule(&mut cb1); // timestamp = 1, waits for 0, signals 1
timeline.schedule(&mut cb2); // timestamp = 2, waits for 1, signals 2
timeline.schedule(&mut cb3); // timestamp = 3, waits for 2, signals 3
```
Command buffers on the same timeline are guaranteed to execute in order, regardless of submission order.
If you're asking yourself "isn't that just a queue?" -- the answer is yes. We call it a "timeline" to distinguish it from `vkQueue`, which strictly speaking isn't a queue: submissions to the same `vkQueue` are guaranteed to *start* in order, but that guarantee is practically meaningless because they're allowed to *finish* out of order. A better name for `vkQueue` might be `vkCommandScheduler`.
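The counter logic behind `schedule` can be sketched in a few lines (toy types, not the real `Timeline`): each scheduled buffer waits on the previous timestamp and signals the next one.

```rust
// Toy model of timeline scheduling. The types are hypothetical;
// the real Timeline also wires these values into vkQueueSubmit2.
struct ToyTimeline {
    counter: u64, // last timestamp handed out
}

#[derive(Debug, PartialEq)]
struct Scheduled {
    wait: u64,   // semaphore value that must be reached before we start
    signal: u64, // semaphore value we raise when we finish
}

impl ToyTimeline {
    fn new() -> Self {
        ToyTimeline { counter: 0 }
    }

    fn schedule(&mut self) -> Scheduled {
        self.counter += 1;
        Scheduled { wait: self.counter - 1, signal: self.counter }
    }
}

fn main() {
    let mut tl = ToyTimeline::new();
    let a = tl.schedule();
    let b = tl.schedule();
    assert_eq!((a.wait, a.signal), (0, 1)); // first buffer waits on 0 (already signaled)
    assert_eq!((b.wait, b.signal), (1, 2)); // b cannot start before a signals
}
```

Because each buffer waits on its predecessor's signal value, execution order follows schedule order even if the buffers are submitted out of order.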
You can create multiple timelines for independent workstreams:

```rust
let mut render_timeline = Timeline::new(device.clone()).unwrap();
let mut compute_timeline = Timeline::new(device.clone()).unwrap();

// These can execute independently -- different timeline semaphores.
render_timeline.schedule(&mut render_cb);
compute_timeline.schedule(&mut compute_cb);
```

Cross-timeline dependencies are handled by `GPUMutex` -- when you lock a resource that was last used on a different timeline, the semaphore wait is added automatically.
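The bookkeeping behind this can be sketched with a toy model (hypothetical types, not the real `GPUMutex`): the mutex remembers the last timeline and timestamp that touched the resource, and a lock from a *different* timeline reports the semaphore wait it needs.

```rust
// Toy sketch of cross-timeline tracking. Types are made up for
// illustration; the real GPUMutex also defers resource destruction.
#[derive(Clone, Copy, PartialEq, Debug)]
struct LastUse {
    timeline_id: u32,
    timestamp: u64,
}

#[derive(Default)]
struct ToyGpuMutex {
    last_use: Option<LastUse>,
}

impl ToyGpuMutex {
    /// Records a new use and returns the semaphore wait required first, if any.
    fn lock(&mut self, timeline_id: u32, timestamp: u64) -> Option<LastUse> {
        let wait = match self.last_use {
            // A different timeline must wait on the previous use's signal.
            Some(prev) if prev.timeline_id != timeline_id => Some(prev),
            // Same timeline: ordering is already guaranteed, no extra wait.
            _ => None,
        };
        self.last_use = Some(LastUse { timeline_id, timestamp });
        wait
    }
}

fn main() {
    let mut image = ToyGpuMutex::default();
    assert_eq!(image.lock(0, 1), None); // first use: nothing to wait on
    assert_eq!(image.lock(0, 2), None); // same timeline: already ordered
    // A lock from timeline 1 must wait on timeline 0's last signal.
    let w = image.lock(1, 1);
    assert_eq!(w, Some(LastUse { timeline_id: 0, timestamp: 2 }));
}
```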
GPU commands reference resources by their Vulkan handles. If a resource is freed on the CPU before the GPU finishes using it, you get undefined behavior. Pumicite solves this with `CommandEncoder::retain`:

```rust
use std::sync::Arc;

let buffer = Arc::new(Buffer::new_private(
    allocator, 1024, 4, vk::BufferUsageFlags::STORAGE_BUFFER,
).unwrap());

command_pool.record(&mut command_buffer, |encoder| {
    // Extend the buffer's lifetime to match the command buffer's GPU execution
    let buffer = encoder.retain(buffer.clone());
    // `buffer` is now &'a Buffer -- guaranteed alive until GPU finishes
});
```

The retained object is stored in an arena allocator inside the command buffer. When the command buffer completes (via `block_until_completion` or `try_complete`), all retained objects are dropped.

`encoder.lock()` on a `GPUMutex` gives you the same lifetime guarantee without needing a separate `retain` call, because the `GPUMutex` defers dropping its inner resource until the semaphore signals.
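The retain mechanism boils down to keeping an owning reference alive inside the command buffer until completion. A toy sketch using plain `Arc`s (hypothetical types, not the real arena allocator):

```rust
use std::sync::Arc;

// Toy model of retain: the command buffer holds owning references
// until "the GPU" finishes, then drops them all at once.
struct ToyCommandBuffer {
    retained: Vec<Arc<dyn std::any::Any>>,
}

impl ToyCommandBuffer {
    fn new() -> Self {
        ToyCommandBuffer { retained: Vec::new() }
    }

    // Store an extra strong reference for the duration of GPU execution.
    fn retain<T: 'static>(&mut self, obj: Arc<T>) {
        self.retained.push(obj);
    }

    // "GPU is done": every retained object is dropped here.
    fn block_until_completion(&mut self) {
        self.retained.clear();
    }
}

fn main() {
    let buffer = Arc::new(vec![0u8; 1024]);
    let mut cb = ToyCommandBuffer::new();
    cb.retain(buffer.clone());
    assert_eq!(Arc::strong_count(&buffer), 2); // alive while work is in flight
    cb.block_until_completion();
    assert_eq!(Arc::strong_count(&buffer), 1); // released after completion
}
```

Even if the caller drops its own `Arc` mid-flight, the command buffer's clone keeps the resource alive until completion — which is exactly the guarantee the real `retain` provides for Vulkan handles.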
Instead of blocking on `block_until_completion`, you can poll:

```rust
queue.submit(&mut command_buffer).unwrap();

// Do other CPU work...
while !command_buffer.try_complete() {
    // Still executing on the GPU
    std::thread::sleep(std::time::Duration::from_millis(1));
}
// Command buffer is now in the Invalid state -- ready to free or reset
```

For async Rust, there's also `block_async_until_completion`:

```rust
command_buffer.block_async_until_completion().await.unwrap();
```

Many Vulkan features require explicit opt-in. Use the builder API:
```rust
let mut builder = Device::builder(pdevice);

// Enable an extension
builder.enable_extension::<ash::khr::dynamic_rendering::Meta>().unwrap();

// Enable a feature (requires its extension to be enabled first)
builder.enable_feature::<vk::PhysicalDeviceDynamicRenderingFeatures>(|f| {
    &mut f.dynamic_rendering
}).unwrap();

// Enable buffer device addresses for bindless
builder.enable_feature::<vk::PhysicalDeviceBufferDeviceAddressFeatures>(|f| {
    &mut f.buffer_device_address
}).unwrap();
```

Extensions promoted to Vulkan core (like synchronization2 in Vulkan 1.3) are handled automatically -- if your instance API version is high enough, the extension enable call is a no-op.
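The promotion check amounts to a version comparison. A toy sketch (made-up `Version` type and promotion data, not the real builder logic):

```rust
// Toy model of the "promoted to core" check. Version and the promotion
// table are hypothetical stand-ins for the real builder's metadata.
#[derive(PartialEq, PartialOrd, Debug, Clone, Copy)]
struct Version(u32, u32); // (major, minor); derived ordering is lexicographic

/// Returns true if the extension must be enabled explicitly, false if it is
/// already part of core for this instance API version (enable is a no-op).
fn needs_extension(instance_api: Version, promoted_in: Option<Version>) -> bool {
    match promoted_in {
        Some(core) if instance_api >= core => false, // already core: no-op
        _ => true,                                   // enable explicitly
    }
}

fn main() {
    // synchronization2 was promoted to core in Vulkan 1.3.
    assert!(needs_extension(Version(1, 2), Some(Version(1, 3))));  // 1.2: enable it
    assert!(!needs_extension(Version(1, 3), Some(Version(1, 3)))); // 1.3: no-op
    assert!(needs_extension(Version(1, 3), None));                 // never promoted
}
```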
During development, always enable the Vulkan validation layer. It catches API misuse and synchronization errors. You can do that with vkconfig-egui, which is typically installed with your Vulkan SDK, but you can also enable it manually:

```rust
let mut instance_builder = Instance::builder(entry);
instance_builder.enable_layer(c"VK_LAYER_KHRONOS_validation");
```

The syncval feature of the validation layer is particularly important for Pumicite's "Trust but Verify" philosophy. It validates that your pipeline barriers and image layout transitions are correct at runtime.

Enable it by setting the `VK_LAYER_ENABLES` environment variable:

```shell
VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT \
cargo run --example basics
```

In this chapter you learned:
- Device creation with `create_system_default()` or the `DeviceBuilder` API
- The command recording pipeline: alloc, schedule, begin, record, finish, submit, wait
- `GPUMutex` wraps resources for cross-queue safety; `encoder.lock()` ties them to a command buffer's lifetime
- `ResourceState` tracks how a resource was last used; `use_image_resource` computes minimal barriers
- `Timeline` serializes execution order with timeline semaphores
- `retain` extends object lifetimes to match GPU execution
- Validation layers and `syncval` as your safety net