Getting Started

0xbeefd1ed edited this page Feb 11, 2026 · 1 revision

This chapter walks you through creating your first Pumicite program from scratch -- initializing Vulkan, creating a device, recording commands, and submitting work to the GPU.

Prerequisites

You need Rust nightly and a Vulkan 1.2+ driver. On macOS, install MoltenVK (e.g. via the Vulkan SDK) or KosmicKrisp.

Verify your Vulkan installation:

# If you have the Vulkan SDK installed:
vulkaninfo --summary

Project Setup

Create a new project and add Pumicite:

cargo new my_renderer
cd my_renderer

# Cargo.toml
[package]
name = "my_renderer"
edition = "2024"

[dependencies]
pumicite = "0.1.0"

Since Pumicite requires nightly Rust, create a rust-toolchain.toml:

# rust-toolchain.toml
[toolchain]
channel = "nightly"

The Vulkan Object Hierarchy

Before writing code, it helps to understand the Vulkan objects you'll create and how they relate:

Instance                       Connection to the Vulkan loader
  └── PhysicalDevice           A GPU on the system
       └── Device              Logical connection to that GPU
            ├── Queue          Scheduler that submits work to GPU cores
            ├── CommandPool    Allocates command buffers for a queue family
            ├── Allocator      GPU memory management (VMA)
            └── Timeline       Orders command buffer execution via semaphores

Every GPU operation starts by recording commands into a CommandBuffer, then submitting that buffer to a Queue. The Timeline ensures command buffers execute in the order you schedule them.
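
The ownership arrows in the diagram roughly correspond to reference-counted handles: each child keeps its parent alive. As a hypothetical sketch (these are not Pumicite's actual struct definitions, just the shape of the ownership graph):

```rust
use std::sync::Arc;

// Hypothetical skeletons mirroring the ownership arrows above; the real
// Pumicite types wrap raw Vulkan handles and carry much more state.
struct Instance;
struct PhysicalDevice { instance: Arc<Instance> }
struct Device { physical: Arc<PhysicalDevice> }
struct Queue { device: Arc<Device>, family_index: u32 }

fn hierarchy_sketch() -> u32 {
    let instance = Arc::new(Instance);
    let physical = Arc::new(PhysicalDevice { instance: instance.clone() });
    let device = Arc::new(Device { physical: physical.clone() });
    // A queue handle keeps the whole chain above it alive.
    let queue = Queue { device: device.clone(), family_index: 0 };
    queue.family_index
}
```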

Creating a Device

The simplest way to get started is Device::create_system_default(). It creates an Instance, picks the first physical device, and gives you a device with a single queue from family 0 (typically graphics + compute + transfer):

use pumicite::prelude::*;

fn main() {
    let (device, mut queue) = Device::create_system_default().unwrap();
    println!("Device created on: {:?}", device.physical_device().properties().device_name());
}

Under the hood, create_system_default does the following:

  1. Loads the Vulkan entry point via ash::Entry::load()
  2. Creates an Instance with Vulkan 1.2 and debug utils enabled
  3. Enumerates physical devices and picks the first one
  4. Creates a DeviceBuilder, which automatically enables VK_KHR_synchronization2 and VK_KHR_timeline_semaphore
  5. Enables a single queue from family 0
  6. Builds the logical device and spawns the recycler thread for deferred resource cleanup

For production code you'll want more control. Here's the equivalent using the builder API:

use std::sync::Arc;
use pumicite::{Instance, Device};
use pumicite::utils::Version;
use ash::vk;

fn main() {
    // 1. Load the Vulkan entry point
    let entry = Arc::new(unsafe { ash::Entry::load() }.unwrap());

    // 2. Create a Vulkan instance
    let mut instance_builder = Instance::builder(entry);
    instance_builder.info.api_version = Version::V1_2;
    instance_builder.info.application_name = std::borrow::Cow::Borrowed(c"My Renderer");
    instance_builder.enable_extension::<ash::ext::debug_utils::Meta>().ok();
    let instance = instance_builder.build().unwrap();

    // 3. Select a physical device
    let pdevice = instance.enumerate_physical_devices().unwrap()
        .find(|d| {
            d.properties().device_type == vk::PhysicalDeviceType::DISCRETE_GPU
        })
        .or_else(|| instance.enumerate_physical_devices().unwrap().next())
        .expect("No Vulkan device found");

    println!("Selected GPU: {:?}", pdevice.properties().device_name());

    // 4. Configure and build the logical device
    let mut builder = Device::builder(pdevice);

    // enable_extension and enable_feature for anything you need beyond the defaults
    // (synchronization2 and timeline_semaphore are enabled automatically)

    let graphics_queue_ref = builder
        .enable_queue_with_caps(vk::QueueFlags::GRAPHICS | vk::QueueFlags::COMPUTE, 1.0)
        .expect("No suitable queue family");

    let device = builder.build().unwrap();
    let queue = device.get_queue(graphics_queue_ref);
    println!("Device ready with a queue from family {}", queue.family_index());
}

What Gets Enabled Automatically

When you call Device::builder(), the following are enabled for you:

Extension / Feature           Why
VK_KHR_synchronization2       Required for vkCmdPipelineBarrier2 and VkPipelineStageFlags2
VK_KHR_timeline_semaphore     Required for GPUMutex, Timeline, and cross-queue sync

On macOS, VK_KHR_portability_subset is also enabled automatically for MoltenVK compatibility.

Allocating GPU Memory

Most GPU resources (buffers and images) need memory. Pumicite uses the Vulkan Memory Allocator (VMA) under the hood:

let allocator = Allocator::new(device.clone()).unwrap();

The Allocator is reference-counted and thread-safe. Create it once and clone it wherever you need to allocate. You'll pass it to Buffer::new_* and Image::new_* methods.
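
The clone-and-share pattern can be sketched with a plain Arc stand-in (hypothetical MockAllocator, not the real type; the real Allocator wraps VMA state):

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for a cloneable, thread-safe allocator handle. Cloning bumps a
// refcount; it does not create new allocator state.
#[derive(Clone)]
struct MockAllocator {
    inner: Arc<()>,
}

impl MockAllocator {
    fn new() -> Self {
        MockAllocator { inner: Arc::new(()) }
    }

    /// How many handles currently share this allocator.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

fn share_across_threads(alloc: &MockAllocator) {
    let worker_alloc = alloc.clone(); // cheap: refcount bump only
    thread::spawn(move || {
        // ...allocate buffers/images with worker_alloc here...
        drop(worker_alloc);
    })
    .join()
    .unwrap();
}
```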

The Command Recording Pipeline

Recording and submitting GPU work follows this sequence:

  alloc ──> schedule ──> begin ──> record ──> finish ──> submit ──> wait

Each step:

  1. CommandPool::alloc() -- Allocate a fresh command buffer from the pool.
  2. Timeline::schedule() -- Assign the command buffer a position in the timeline. This determines its execution order relative to other buffers on the same timeline.
  3. CommandPool::begin() -- Transition the buffer to the Recording state (vkBeginCommandBuffer).
  4. CommandPool::record() or record_future() -- Record commands through a CommandEncoder.
  5. CommandPool::finish() -- End recording (vkEndCommandBuffer).
  6. Queue::submit() -- Submit for GPU execution (vkQueueSubmit2). The timeline semaphore wait/signal is handled automatically.
  7. CommandBuffer::block_until_completion() -- Wait for the GPU to finish.

After waiting, you can either free the command buffer back to the pool, or reset it for reuse.

Your First Command Buffer

Let's put it all together. This program creates a 128x128 image on the GPU and clears it to a solid color:

use ash::vk;
use pumicite::{prelude::*, command::CommandPool, sync::Timeline, utils::future::yield_now};

fn main() {
    // --- Setup ---
    let (device, mut queue) = Device::create_system_default().unwrap();
    let allocator = Allocator::new(device.clone()).unwrap();
    let mut timeline = Timeline::new(device.clone()).unwrap();

    let mut command_pool = CommandPool::new(device.clone(), queue.family_index()).unwrap();
    let mut command_buffer = command_pool
        .alloc()
        .unwrap()
        .with_name(c"Clear Image Command Buffer");  // Debug name visible in GPU debuggers

    // --- Create a GPU image ---
    let image = Image::new_private(
        allocator,
        &vk::ImageCreateInfo {
            image_type: vk::ImageType::TYPE_2D,
            format: vk::Format::R8G8B8A8_UINT,
            extent: vk::Extent3D { width: 128, height: 128, depth: 1 },
            mip_levels: 1,
            array_layers: 1,
            samples: vk::SampleCountFlags::TYPE_1,
            tiling: vk::ImageTiling::LINEAR,
            usage: vk::ImageUsageFlags::TRANSFER_DST,
            initial_layout: vk::ImageLayout::UNDEFINED,
            ..Default::default()
        },
    ).unwrap();

    // --- Synchronization setup ---
    // Wrap the image in a GPUMutex for cross-queue / cross-submission safety.
    let image = GPUMutex::new(image);

    // Create a ResourceState to track the image's pipeline state.
    let mut resource_state = ResourceState::default();

    // --- Record commands ---
    timeline.schedule(&mut command_buffer);
    command_pool.begin(&mut command_buffer).unwrap();

    command_pool.record_future(&mut command_buffer, async |encoder| {
        // Lock the image for use in this command buffer.
        // The encoder will wait for any prior GPU work on this image to finish
        // before the CLEAR stage begins.
        let image = encoder.lock(&image, vk::PipelineStageFlags2::CLEAR);

        // Declare how we're about to use the image.
        // This accumulates the necessary pipeline barrier internally.
        encoder.use_image_resource(
            image,
            &mut resource_state,
            Access::CLEAR,
            vk::ImageLayout::TRANSFER_DST_OPTIMAL,
            0..1,  // mip levels
            0..1,  // array layers
            false, // don't discard previous content
        );

        // Yield to emit the accumulated barriers.
        yield_now().await;

        // Now the image is in TRANSFER_DST_OPTIMAL layout -- clear it.
        encoder.clear_color_image_with_layout(
            &*image,
            &vk::ClearColorValue { uint32: [0, 0, 1, 2] },
            vk::ImageLayout::TRANSFER_DST_OPTIMAL,
        );
    });

    // --- Submit and wait ---
    command_pool.finish(&mut command_buffer).unwrap();
    queue.submit(&mut command_buffer).unwrap();
    command_buffer.block_until_completion().unwrap();

    // --- Cleanup ---
    command_pool.free(command_buffer);
    println!("Done!");
}

What Just Happened

Let's trace the key operations:

GPUMutex::new(image) -- Wraps the image so that cross-queue access is tracked. Since this is the first use and there's no prior GPU work, locking it won't produce any semaphore waits.

encoder.lock(&image, ...) -- Returns a &'a Image reference tied to the command buffer's GPU execution lifetime. The encoder records that this command buffer now "owns" the image until it signals its timeline semaphore. If any other command buffer had previously locked the same GPUMutex, the encoder would automatically add a semaphore wait.

encoder.use_image_resource(...) -- Computes the minimal pipeline barrier needed to transition the image from its current state (unknown / default) to TRANSFER_DST_OPTIMAL with CLEAR access. The barrier is accumulated internally but not yet recorded.

yield_now().await -- Tells the executor "I'm done accumulating barriers for now." The executor calls encoder.emit_barriers(), which records a single vkCmdPipelineBarrier2 with all accumulated transitions.
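
The accumulate-then-emit pattern can be modeled with a toy accumulator (hypothetical types, not Pumicite's internals): each use_image-style call only queues a transition, and a single flush drains the queue in one batch, mirroring one vkCmdPipelineBarrier2 carrying many image barriers.

```rust
/// Toy model of barrier accumulation: transitions queue up and are
/// recorded as one batch, like a single vkCmdPipelineBarrier2 call.
#[derive(Debug, PartialEq)]
struct Transition {
    old_layout: &'static str,
    new_layout: &'static str,
}

#[derive(Default)]
struct BarrierAccumulator {
    pending: Vec<Transition>,
    batches_emitted: usize,
}

impl BarrierAccumulator {
    /// Queue a layout transition without recording anything yet.
    fn use_image(&mut self, old_layout: &'static str, new_layout: &'static str) {
        self.pending.push(Transition { old_layout, new_layout });
    }

    /// Record all pending transitions as one batch and clear the queue.
    /// Returns how many transitions the batch contained.
    fn emit_barriers(&mut self) -> usize {
        let count = self.pending.len();
        if count > 0 {
            self.batches_emitted += 1; // one barrier command per flush
            self.pending.clear();
        }
        count
    }
}
```

Batching matters because each barrier command is a potential GPU pipeline flush; folding many transitions into one call is cheaper than emitting them one by one.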

queue.submit(&mut command_buffer) -- Builds the vkQueueSubmit2 call with:

  • A wait on the timeline semaphore at timestamp - 1 (previous command buffer on this timeline)
  • A signal of the timeline semaphore at timestamp (our command buffer)
  • Any additional semaphore waits added by encoder.lock()

Synchronous vs. Async Recording

The example above used record_future with async/await syntax. You can also record synchronously with record:

command_pool.record(&mut command_buffer, |encoder| {
    // Lock, track state, emit barriers, record commands...
    let image = encoder.lock(&image, vk::PipelineStageFlags2::CLEAR);
    encoder.use_image_resource(
        image, &mut resource_state, Access::CLEAR,
        vk::ImageLayout::TRANSFER_DST_OPTIMAL, 0..1, 0..1, false,
    );
    encoder.emit_barriers();  // Must call explicitly in synchronous mode

    encoder.clear_color_image_with_layout(
        &*image,
        &vk::ClearColorValue { uint32: [0, 0, 1, 2] },
        vk::ImageLayout::TRANSFER_DST_OPTIMAL,
    );
});

The difference:

                     record (synchronous)                 record_future (async)
Barrier emission     You call emit_barriers() manually    yield_now().await triggers it
Return value         Direct return from closure           Returns GPUMutex<T> locked until GPU finishes
Barrier merging      No                                   Yes

For most cases, synchronous recording is simpler and sufficient. The async path becomes valuable when you need barrier merging and optimizations across multiple futures that record command buffers.

The Timeline

A Timeline serializes command buffer execution using a timeline semaphore with a monotonically increasing counter. Each call to timeline.schedule(&mut cb) assigns the next timestamp:

timeline.schedule(&mut cb1);  // timestamp = 1, waits for 0, signals 1
timeline.schedule(&mut cb2);  // timestamp = 2, waits for 1, signals 2
timeline.schedule(&mut cb3);  // timestamp = 3, waits for 2, signals 3

Command buffers on the same timeline are guaranteed to execute in order, regardless of submission order.
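
The numbering scheme above is just a monotonic counter. This mock (a hypothetical MockTimeline, not the real Timeline, which wraps a Vulkan timeline semaphore) shows only the wait/signal bookkeeping:

```rust
// Hypothetical mock of Timeline's timestamp bookkeeping. The real type
// drives a VkSemaphore; this only models the counter arithmetic.
struct MockTimeline {
    next_timestamp: u64,
}

struct Scheduled {
    wait: u64,   // wait until the semaphore reaches this value
    signal: u64, // signal this value when execution finishes
}

impl MockTimeline {
    fn new() -> Self {
        MockTimeline { next_timestamp: 1 }
    }

    /// Assign the next position on the timeline: wait for the previous
    /// command buffer's signal value, then signal our own timestamp.
    fn schedule(&mut self) -> Scheduled {
        let signal = self.next_timestamp;
        self.next_timestamp += 1;
        Scheduled { wait: signal - 1, signal }
    }
}
```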

If you're asking yourself: isn't that just a Queue? The answer is YES. We call it a "timeline" to distinguish from vkQueue, which strictly speaking isn't a queue. Submissions made to the same vkQueue are guaranteed to start in order, but that guarantee is meaningless because they're allowed to finish out-of-order. A better name for vkQueue might be vkCommandScheduler.

You can create multiple timelines for independent workstreams:

let mut render_timeline = Timeline::new(device.clone()).unwrap();
let mut compute_timeline = Timeline::new(device.clone()).unwrap();

// These can execute independently -- different timeline semaphores.
render_timeline.schedule(&mut render_cb);
compute_timeline.schedule(&mut compute_cb);

Cross-timeline dependencies are handled by GPUMutex -- when you lock a resource that was last used on a different timeline, the semaphore wait is added automatically.

Resource Lifetimes with retain

GPU commands reference resources by their Vulkan handles. If a resource is freed on the CPU before the GPU finishes using it, you get undefined behavior. Pumicite solves this with CommandEncoder::retain:

use std::sync::Arc;

let buffer = Arc::new(Buffer::new_private(
    allocator, 1024, 4, vk::BufferUsageFlags::STORAGE_BUFFER,
).unwrap());

command_pool.record(&mut command_buffer, |encoder| {
    // Extend the buffer's lifetime to match the command buffer's GPU execution
    let buffer = encoder.retain(buffer.clone());
    // `buffer` is now &'a Buffer -- guaranteed alive until GPU finishes
});

The retained object is stored in an arena allocator inside the command buffer. When the command buffer completes (via block_until_completion or try_complete), all retained objects are dropped.
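
The retention mechanism can be modeled with a vector of type-erased Arcs (a hypothetical RetainArena, not Pumicite's actual arena): retain stores a clone of the Arc, and completion drops every stored reference at once.

```rust
use std::any::Any;
use std::sync::Arc;

/// Toy model of command-buffer retention: retained objects stay alive
/// until the arena is cleared (i.e. the GPU work completes).
#[derive(Default)]
struct RetainArena {
    retained: Vec<Arc<dyn Any + Send + Sync>>,
}

impl RetainArena {
    /// Store a clone of the Arc so the object outlives the caller's handle.
    fn retain<T: Any + Send + Sync>(&mut self, obj: Arc<T>) {
        self.retained.push(obj);
    }

    /// Called on command buffer completion: drop everything at once.
    fn on_complete(&mut self) {
        self.retained.clear();
    }
}
```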

encoder.lock() on a GPUMutex gives you the same lifetime guarantee without needing a separate retain call, because the GPUMutex defers dropping its inner resource until the semaphore signals.

Non-Blocking Completion

Instead of blocking on block_until_completion, you can poll:

queue.submit(&mut command_buffer).unwrap();

// Do other CPU work...
while !command_buffer.try_complete() {
    // Still executing on the GPU
    std::thread::sleep(std::time::Duration::from_millis(1));
}
// Command buffer is now in the Invalid state -- ready to free or reset

For async Rust, there's also block_async_until_completion:

command_buffer.block_async_until_completion().await.unwrap();

Enabling Extensions and Features

Many Vulkan features require explicit opt-in. Use the builder API:

let mut builder = Device::builder(pdevice);

// Enable an extension
builder.enable_extension::<ash::khr::dynamic_rendering::Meta>().unwrap();

// Enable a feature (requires its extension to be enabled first)
builder.enable_feature::<vk::PhysicalDeviceDynamicRenderingFeatures>(|f| {
    &mut f.dynamic_rendering
}).unwrap();

// Enable buffer device addresses for bindless
builder.enable_feature::<vk::PhysicalDeviceBufferDeviceAddressFeatures>(|f| {
    &mut f.buffer_device_address
}).unwrap();

Extensions promoted to Vulkan core (like synchronization2 in Vulkan 1.3) are handled automatically -- if your instance API version is high enough, the extension enable call is a no-op.
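
The promotion check amounts to a version comparison; a minimal sketch (hypothetical helper, not the real builder logic):

```rust
/// Simplified Vulkan version as (major, minor).
type Version = (u32, u32);

/// Returns true if enabling an extension should be a no-op because the
/// extension was promoted to core at or below the instance API version.
fn is_promoted(api_version: Version, promoted_in: Option<Version>) -> bool {
    match promoted_in {
        Some(core) => api_version >= core, // tuple comparison: major, then minor
        None => false,                     // never promoted: must enable explicitly
    }
}
```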

Validation Layers

During development, always enable the Vulkan validation layer. It catches API misuse and synchronization errors. You can do that with vkconfig-egui, which is typically installed with your Vulkan SDK, but you can also enable it manually:

let mut instance_builder = Instance::builder(entry);
instance_builder.enable_layer(c"VK_LAYER_KHRONOS_validation");

The syncval feature of the validation layer is particularly important for Pumicite's "Trust but Verify" philosophy. It validates that your pipeline barriers and image layout transitions are correct at runtime.

Enable it by setting the VK_LAYER_ENABLES environment variable:

VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT \
    cargo run --example basics

Summary

In this chapter you learned:

  • Device creation with create_system_default() or the DeviceBuilder API
  • The command recording pipeline: alloc, schedule, begin, record, finish, submit, wait
  • GPUMutex wraps resources for cross-queue safety; encoder.lock() ties them to a command buffer's lifetime
  • ResourceState tracks how a resource was last used; use_image_resource computes minimal barriers
  • Timeline serializes execution order with timeline semaphores
  • retain extends object lifetimes to match GPU execution
  • Validation layers and syncval as your safety net

Next: Chapter 3: Bevy Integration
