openvmm: Add VFIO PCI device assignment #3248

Merged
will-j-wright merged 22 commits into microsoft:main from will-j-wright:vfio-pr on Apr 16, 2026

@will-j-wright will-j-wright commented Apr 10, 2026

Add support for assigning physical PCI devices to OpenVMM guests on Linux using VFIO. A device bound to vfio-pci on the host can be passed through to the guest via a new --vfio CLI flag, where it appears as a PCIe endpoint with full functionality.

Supported

  • Config space: reads/writes proxied to the physical device via VFIO. BAR sizing is emulated locally so the guest can probe without hitting hardware on every access.
  • BAR MMIO: proxied to the physical device via pread/pwrite on the VFIO device fd.
  • MSI-X interrupts: table and PBA are emulated in software via pci_core::MsixEmulator. Interrupt delivery uses irqfd — VFIO signals an eventfd, the kernel injects the MSI directly into the guest with no VMM exit.
  • DMA: guest RAM is identity-mapped into the IOMMU (IOVA == GPA) via VFIO_IOMMU_MAP_DMA. IOMMU Type1v2 is required.
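
As a rough illustration of the identity mapping described in the DMA bullet, the sketch below builds one mapping request per guest RAM range with the IOVA set equal to the GPA. `RamRange`, `DmaMapping`, and `identity_mappings` are hypothetical names invented for this example; the PR itself performs the mapping through `Container::map_dma()` in `vfio_sys`, whose signature is not shown on this page.

```rust
/// A contiguous range of guest RAM (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RamRange {
    /// Guest physical address where the range starts.
    pub gpa: u64,
    /// Length of the range in bytes.
    pub len: u64,
    /// Offset of this range inside the VMM's host mapping of guest memory.
    pub host_offset: u64,
}

/// One request for the IOMMU: map `len` bytes at host `vaddr` to `iova`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DmaMapping {
    pub iova: u64,
    pub vaddr: u64,
    pub len: u64,
}

/// Build identity mappings (IOVA == GPA) for every non-empty RAM range.
pub fn identity_mappings(host_base: u64, ram: &[RamRange]) -> Vec<DmaMapping> {
    ram.iter()
        .filter(|r| r.len != 0)
        .map(|r| DmaMapping {
            iova: r.gpa, // identity: the device sees guest physical addresses
            vaddr: host_base + r.host_offset,
            len: r.len,
        })
        .collect()
}
```

With an identity mapping, the addresses a guest driver programs into the device can be used for DMA without any translation in the VMM.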

Architecture

VFIO devices use the standard PciDeviceHandleKind resource resolver pattern (same as NVMe, virtio, GDMA). The --vfio CLI flag produces a PcieDeviceConfig with a VfioDeviceHandle resource, which flows through build_pcie_device() like any other PCIe device. The VfioDeviceResolver handles all VFIO-specific setup (container/group/device open, IOMMU configuration, DMA mapping) inside resolve(). No special-case code in dispatch.rs.

New crates

  • vfio_assigned_device: ChipsetDevice implementation (config space proxy, BAR MMIO dispatch, MSI-X emulation with irqfd routing, VFIO container/group lifetime management) and VfioDeviceResolver implementing AsyncResolveResource<PciDeviceHandleKind, VfioDeviceHandle>.
  • vfio_assigned_device_resources: VfioDeviceHandle { pci_id: String } implementing ResourceId<PciDeviceHandleKind>.

Key changes

  • vfio_sys: Added Container::map_dma() / unmap_dma() for IOMMU DMA mapping and Device::unmap_msix() for teardown.
  • pci_core/msix.rs: New MsixRoute trait for kernel-mediated per-vector interrupt delivery. MsixEmulator gains set_routes() / clear_routes(); the emulator calls into the routes automatically on mask, unmask, and address/data changes, and on PBA reads. This moves irqfd routing logic out of individual devices into the shared emulator so that vhost-user and other passthrough devices can reuse it. Includes unit tests with a mock MsixRoute.
  • pci_resources: ResolvePciDeviceHandleParams extended with an optional irqfd field for device passthrough resolvers.
  • vmm_core/device_builder.rs: build_pcie_device() and resolve_and_add_pci_device() updated to pass through irqfd and mem_layout.
  • openvmm_entry: New --vfio <port>:<pci_bdf> CLI flag with BDF validation. VFIO devices are merged into the standard pcie_devices config list.
  • openvmm_core/partition.rs: HvlitePartition::irqfd() exposes the irqfd trait for passthrough devices.
  • Guide/vfio.md: User guide covering prerequisites, device binding, CLI usage, and troubleshooting.
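
To make the pci_core/msix.rs item more concrete, here is a minimal sketch of an emulator that drives per-vector routes whenever the guest changes a table entry or its mask bit, exercised with a recording mock. The `Route` trait, its `enable`/`disable` methods, `TableEmulator`, and `MockRoute` are all hypothetical stand-ins; the actual `MsixRoute` and `MsixEmulator` interfaces are not shown on this page and may differ.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a kernel-mediated route such as an irqfd.
pub trait Route {
    fn enable(&self, address: u64, data: u32);
    fn disable(&self);
}

#[derive(Clone, Copy)]
struct Vector {
    address: u64,
    data: u32,
    masked: bool,
}

/// Simplified MSI-X table model that keeps its routes in sync.
pub struct TableEmulator {
    vectors: Vec<Vector>,
    routes: Vec<Box<dyn Route>>,
}

impl TableEmulator {
    pub fn new(count: usize) -> Self {
        // MSI-X vectors come out of reset with the mask bit set.
        let reset = Vector { address: 0, data: 0, masked: true };
        Self { vectors: vec![reset; count], routes: Vec::new() }
    }

    /// Install one route per vector and push the current state to each.
    pub fn set_routes(&mut self, routes: Vec<Box<dyn Route>>) {
        self.routes = routes;
        for i in 0..self.vectors.len() {
            self.sync(i);
        }
    }

    pub fn clear_routes(&mut self) {
        self.routes.clear();
    }

    pub fn write_entry(&mut self, index: usize, address: u64, data: u32) {
        self.vectors[index].address = address;
        self.vectors[index].data = data;
        self.sync(index);
    }

    pub fn set_masked(&mut self, index: usize, masked: bool) {
        self.vectors[index].masked = masked;
        self.sync(index);
    }

    /// Propagate one vector's state to its route, if a route is installed.
    fn sync(&self, index: usize) {
        if let Some(route) = self.routes.get(index) {
            let v = self.vectors[index];
            if v.masked {
                route.disable();
            } else {
                route.enable(v.address, v.data);
            }
        }
    }
}

/// Calls observed by the mock route.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Call {
    Enable { address: u64, data: u32 },
    Disable,
}

/// Mock route that records every call for inspection in tests.
pub struct MockRoute(pub Rc<RefCell<Vec<Call>>>);

impl Route for MockRoute {
    fn enable(&self, address: u64, data: u32) {
        self.0.borrow_mut().push(Call::Enable { address, data });
    }
    fn disable(&self) {
        self.0.borrow_mut().push(Call::Disable);
    }
}
```

Keeping the synchronization inside the emulator means a device only installs routes once and never has to intercept table writes itself, which is the reuse argument made in the bullet.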

Current limitations

  • No save/restore for VMs with VFIO devices
  • Hot-plug requires wiring mem_layout into the AddPcieDevice RPC path (not yet done)
  • Linux only

Testing

Tested end-to-end on mshv with NVMe passthrough (config space enumeration, DMA reads/writes, MSI-X interrupts). Tested on KVM with AHCI passthrough (config space + MMIO).

will-j-wright and others added 7 commits April 10, 2026 18:09
Add a new crate that implements ChipsetDevice + PciConfigSpace for a
physical PCI device accessed via Linux VFIO. Config space reads/writes
are proxied to the physical device through the VFIO config region file
descriptor. BARs are cached locally so the guest can probe sizes via
the standard write-all-ones mechanism.

MMIO returns all-ones for now (config space enumeration only).
Save/restore is not supported.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
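
The write-all-ones mechanism mentioned in this commit message can be sketched as follows for a single 32-bit memory BAR. `EmulatedBar` and `probe_size` are hypothetical names for illustration only and are not taken from the vfio_assigned_device crate; 64-bit and I/O BARs are left out.

```rust
/// Software model of one 32-bit memory BAR (hypothetical type).
pub struct EmulatedBar {
    /// BAR size in bytes; a power of two, at least 16.
    size: u32,
    /// Read-only low bits (memory type, prefetchable).
    flags: u32,
    /// Address bits last written by the guest.
    value: u32,
}

impl EmulatedBar {
    pub fn new(size: u32, flags: u32) -> Self {
        assert!(size.is_power_of_two() && size >= 16);
        Self { size, flags: flags & 0xf, value: 0 }
    }

    /// Writable address bits: everything above the size alignment.
    fn mask(&self) -> u32 {
        !(self.size - 1)
    }

    /// Guest write: only the aligned address bits stick.
    pub fn write(&mut self, val: u32) {
        self.value = val & self.mask();
    }

    /// Guest read: address bits plus the read-only flag bits.
    pub fn read(&self) -> u32 {
        self.value | self.flags
    }
}

/// What a guest does to size a BAR: save, write all ones, read, restore.
pub fn probe_size(bar: &mut EmulatedBar) -> u32 {
    let saved = bar.read();
    bar.write(0xffff_ffff);
    let probed = bar.read();
    bar.write(saved);
    // Drop the flag bits, invert, and add one to recover the size.
    (!(probed & !0xf)).wrapping_add(1)
}
```

Because the probe is answered from the cached size, none of these accesses needs to reach the physical device.
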
Add a --vfio <port>:<pci_bdf> CLI argument (Linux only) that assigns a
host PCI device to the guest via VFIO. The device must be bound to
vfio-pci on the host before starting the VM.

In dispatch.rs, VFIO devices are opened via vfio_sys (container, group,
device), then wrapped in a VfioAssignedPciDevice and attached to the
specified PCIe root port. The guest sees the device via ECAM and can
read/write its config space.

Example usage:
  openvmm --hypervisor mshv \
    --pcie-root-complex rc0 \
    --pcie-root-port rc0:rp0 \
    --vfio rp0:3f7a:00:00.0 \
    --kernel /var/images/vmlinux \
    --cmdline "console=ttyS0" \
    --com1 console --memory 256M --processors 2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
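
A minimal sketch of how a `<port>:<pci_bdf>` value such as `rp0:3f7a:00:00.0` could be split and validated is shown below. The function names, the error strings, and the exact accepted format (`dddd:bb:dd.f` in hexadecimal, device at most 0x1f, function 0 to 7) are assumptions for this example; the validation in openvmm_entry may accept or reject different inputs.

```rust
/// Parsed form of a `--vfio` value (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct VfioArg {
    pub port: String,
    pub bdf: String,
}

/// Split at the first ':' so the colons inside the BDF are preserved.
pub fn parse_vfio_arg(arg: &str) -> Result<VfioArg, String> {
    let (port, bdf) = arg
        .split_once(':')
        .ok_or_else(|| format!("expected <port>:<pci_bdf>, got {:?}", arg))?;
    if port.is_empty() {
        return Err("empty port name".to_string());
    }
    if !is_valid_bdf(bdf) {
        return Err(format!("invalid PCI BDF {:?}", bdf));
    }
    Ok(VfioArg { port: port.to_string(), bdf: bdf.to_string() })
}

/// Accept `dddd:bb:dd.f` with hexadecimal digits only.
fn is_valid_bdf(bdf: &str) -> bool {
    let hex = |s: &str, len: usize| s.len() == len && s.bytes().all(|b| b.is_ascii_hexdigit());
    let mut parts = bdf.split(':');
    let (Some(domain), Some(bus), Some(devfn), None) =
        (parts.next(), parts.next(), parts.next(), parts.next())
    else {
        return false;
    };
    let Some((dev, func)) = devfn.split_once('.') else {
        return false;
    };
    hex(domain, 4)
        && hex(bus, 2)
        && hex(dev, 2)
        // PCI device numbers are 5 bits wide.
        && u8::from_str_radix(dev, 16).map_or(false, |d| d <= 0x1f)
        // PCI function numbers are 3 bits wide.
        && matches!(func.as_bytes(), [b'0'..=b'7'])
}
```

Splitting on the first colon only is what lets the port name and a fully qualified BDF share one argument.
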
Add a user-facing guide for assigning physical PCI devices to OpenVMM
guests via Linux VFIO. Covers the full workflow: identifying devices,
enabling unsafe interrupts for Hyper-V IOMMU, binding to vfio-pci,
launching OpenVMM with --vfio, and verifying in the guest.

Also links the previously empty 'Direct Assigned' reference entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BAR MMIO: proxy reads/writes to physical device via pread/pwrite on
  the VFIO device fd (replaces all-ones stub)
- MSI-X discovery: walk PCI capability list to find MSI-X, extract
  table count, BIR, offsets
- MSI-X emulation: wire pci_core's MsixEmulator for software MSI-X
  table and PBA handling (same emulator used by NVMe/virtio)
- Split-BAR MMIO dispatch: route MSI-X table/PBA accesses to the
  emulator, proxy everything else to hardware
- Config space: intercept MSI-X capability writes to track
  enable/disable state in the emulator
- dispatch.rs: query BAR region info, create MsiConnection, connect
  SignalMsi for interrupt delivery

Interrupts do not yet fire (no eventfd wiring to VFIO). The guest can
enumerate the device, probe BARs, configure MSI-X vectors, and access
device registers. Interrupt delivery requires the planned MsixEmulator
eventfd extension.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
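
The MSI-X discovery step in this commit (walking the capability list and extracting the table count, BIR, and offsets) can be sketched against a raw config space buffer as below. The register layout follows the PCI specification; `MsixInfo` and `find_msix` are hypothetical names, and the real code reads config space through VFIO rather than from a byte slice.

```rust
/// MSI-X parameters read from the capability (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct MsixInfo {
    pub cap_offset: u8,
    pub table_count: u16,
    pub table_bir: u8,
    pub table_offset: u32,
    pub pba_bir: u8,
    pub pba_offset: u32,
}

const STATUS: usize = 0x06;
const STATUS_CAP_LIST: u16 = 1 << 4;
const CAP_PTR: usize = 0x34;
const CAP_ID_MSIX: u8 = 0x11;

fn read_u16(cfg: &[u8], off: usize) -> Option<u16> {
    let b = cfg.get(off..off + 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(cfg: &[u8], off: usize) -> Option<u32> {
    let b = cfg.get(off..off + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walk the capability list and decode the MSI-X capability, if present.
pub fn find_msix(cfg: &[u8]) -> Option<MsixInfo> {
    if (read_u16(cfg, STATUS)? & STATUS_CAP_LIST) == 0 {
        return None;
    }
    let mut ptr = *cfg.get(CAP_PTR)? & !0x3;
    // Bound the walk so a malformed, looping list still terminates.
    for _ in 0..48 {
        if ptr == 0 {
            return None;
        }
        let off = ptr as usize;
        if *cfg.get(off)? == CAP_ID_MSIX {
            let ctrl = read_u16(cfg, off + 2)?;
            let table = read_u32(cfg, off + 4)?;
            let pba = read_u32(cfg, off + 8)?;
            return Some(MsixInfo {
                cap_offset: ptr,
                // Message Control bits 10:0 hold the table size minus one.
                table_count: (ctrl & 0x7ff) + 1,
                // The low three bits select the BAR, the rest is the offset.
                table_bir: (table & 0x7) as u8,
                table_offset: table & !0x7,
                pba_bir: (pba & 0x7) as u8,
                pba_offset: pba & !0x7,
            });
        }
        ptr = *cfg.get(off + 1)? & !0x3;
    }
    None
}
```
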
Add MSI-X eventfd infrastructure (MsixEmulator, event_or_proxy,
map_msix/unmap_msix) and chipset MMIO registration so BAR accesses
are routed to the device.

MSI-X emulation is currently disabled at runtime because Hyper-V's
IOMMU lacks interrupt remapping: in the direct-attach model, device
MSI writes are DMA transactions through the shared IOMMU page table
and go directly to the guest's LAPIC, bypassing L1's VFIO interrupt
handler entirely. VFIO's eventfd-based delivery cannot work in this
configuration. The device operates via completion polling.

The full MSI-X pipeline is implemented and ready to enable once
interrupt remapping is available or a kernel-level MSI passthrough
mechanism is added.

Changes:
- VfioAssignedPciDevice owns vfio_sys::Device directly for map_msix
- VmTaskDriver stored for EventProxy async tasks
- MSI-X discovery, MsixEmulator wiring, eventfd mapping all
  implemented but disabled pending interrupt remapping support
- Chipset MMIO registration: BAR regions registered via
  services.register_mmio() and mapped/unmapped on MMIO enable/disable
- dispatch.rs: passes Device ownership, driver, BAR region info,
  MsiConnection, and MMIO controls to the device

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the event_or_proxy workaround with irqfd-backed interrupt
delivery. Instead of bridging VFIO eventfds through MsixEmulator's
MsiInterrupt → MsiTarget → SignalMsi path (which requires L1 userspace
for every interrupt), use irqfd to let the kernel inject MSIs directly
into the guest when VFIO signals the eventfd.

Changes:
- msix_enable: creates events + irqfd routes via Partition::irqfd(),
  passes same events to VFIO map_msix
- msix_disable: drops routes (auto-cleanup via IrqFdRoute::drop)
- MSI-X table writes: after updating MsixEmulator, reads back the
  affected vector's addr/data and calls route.set_msi() to update
  kernel GSI routing
- Removed VmTaskDriver and EventProxy dependencies
- Added irqfd parameter to VfioAssignedPciDeviceConfig
- Added irqfd() to HvlitePartition trait

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@will-j-wright will-j-wright requested a review from a team as a code owner April 10, 2026 22:53
Copilot AI review requested due to automatic review settings April 10, 2026 22:53
@github-actions github-actions Bot added the Guide and unsafe (Related to unsafe code) labels Apr 10, 2026
@github-actions

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

@will-j-wright will-j-wright changed the title Add VFIO PCI device assignment for OpenVMM Add VFIO PCI device assignment Apr 10, 2026
Copilot AI left a comment

Pull request overview

Adds Linux VFIO-based PCI passthrough support to OpenVMM, enabling assignment of host PCI devices to guests via a new --vfio CLI flag and a new ChipsetDevice implementation for config/BAR/MMIO/MSI-X handling.

Changes:

  • Add VFIO container DMA map/unmap support and MSI-X teardown support in vfio_sys.
  • Introduce vfio_assigned_device crate implementing a VFIO-backed PCIe endpoint with MSI-X table/PBA emulation and irqfd delivery.
  • Wire --vfio through openvmm_entry → config/manifest → openvmm_core device construction, and add user documentation.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| vm/devices/user_driver/vfio_sys/src/lib.rs | Adds VFIO DMA map/unmap ioctls and MSI-X vector count handling improvements. |
| vm/devices/pci/vfio_assigned_device/src/lib.rs | New VFIO-backed ChipsetDevice for PCI config/BAR proxying and MSI-X emulation with irqfd routing. |
| vm/devices/pci/vfio_assigned_device/Cargo.toml | Defines the new vfio_assigned_device crate and dependencies. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Exposes MsixEmulator::set_pending_bit for passthrough/PBA support. |
| petri/src/vm/openvmm/construct.rs | Adds Linux-only vfio_devices field initialization in OpenVMM config construction. |
| openvmm/openvmm_entry/src/ttrpc/mod.rs | Adds Linux-only vfio_devices field initialization for ttrpc-based configs. |
| openvmm/openvmm_entry/src/lib.rs | Plumbs --vfio CLI values into the VM config on Linux. |
| openvmm/openvmm_entry/src/cli_args.rs | Introduces --vfio flag and parsing for <port_name>:<pci_bdf> on Linux. |
| openvmm/openvmm_defs/src/config.rs | Adds Linux-only vfio_devices to config and defines VfioDeviceConfig. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Implements VFIO device bring-up (container/group/device open, IOMMU selection, DMA mapping, device instantiation). |
| openvmm/openvmm_core/src/partition.rs | Extends HvlitePartition with an irqfd() accessor to support irqfd routing. |
| openvmm/openvmm_core/Cargo.toml | Adds Linux-only dependencies for VFIO device setup and the new assigned-device crate. |
| Guide/src/user_guide/openvmm/vfio.md | Adds a user guide page describing prerequisites, binding, usage, and troubleshooting for --vfio. |
| Guide/src/SUMMARY.md | Adds the new VFIO user guide page to the book summary. |
| Cargo.toml | Registers vfio_assigned_device in workspace dependencies. |
| Cargo.lock | Records the new crate and dependency edges in the lockfile. |

Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Comment thread openvmm/openvmm_entry/src/cli_args.rs
Comment thread openvmm/openvmm_core/src/worker/dispatch.rs Outdated
Comment thread Guide/src/user_guide/openvmm/vfio.md
Comment thread openvmm/openvmm_entry/src/cli_args.rs
@will-j-wright will-j-wright changed the title Add VFIO PCI device assignment openvmm: Add VFIO PCI device assignment Apr 13, 2026
Comment thread openvmm/openvmm_core/src/worker/dispatch.rs Outdated
Comment thread openvmm/openvmm_defs/src/config.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Copilot AI review requested due to automatic review settings April 13, 2026 20:58
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 7 comments.

Comment thread vm/devices/pci/pci_core/src/capabilities/msix.rs Outdated
Comment thread vm/devices/user_driver/vfio_sys/src/lib.rs
Comment thread openvmm/openvmm_core/src/worker/dispatch.rs Outdated
Comment thread openvmm/openvmm_core/src/worker/dispatch.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 6 comments.

Comment thread vm/devices/user_driver/vfio_sys/src/lib.rs
Comment thread vm/devices/pci/pci_core/src/capabilities/msix.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device_resources/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/resolver.rs
Copilot AI review requested due to automatic review settings April 16, 2026 00:20
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
}
}

fn read_config_u32(
Member

Don't we already have something like this in our vfio crate?

Contributor Author

It's in user_driver/vfio.rs which we can't use. I could factor out the duplicated code to vfio_sys, if you want.

Member

if it's natural to do so

Comment on lines +587 to +590
// bits). The table offset (DWORD 1) and PBA offset (DWORD 2)
// must come from hardware — the emulator uses different offsets
// than the physical device, and the MMIO handler translates
// based on the physical offsets.
Member

Should we fix this, so that the emulator just mirrors the hardware values?

Contributor Author

Yes, but I would argue for it to be in a separate PR as it'll touch all the MSI emulator callers

Member

OK
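
The translation discussed in this thread, where the MMIO handler maps physical BAR offsets onto the emulator's own layout, might look roughly like the sketch below. It assumes the MSI-X table and PBA live in the BAR being accessed and that the emulator addresses each structure relative to its own start; `MsixLayout`, `BarTarget`, and `classify` are hypothetical names, and the real emulator offsets are not shown on this page.

```rust
/// Where an access to the MSI-X BAR should be routed (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum BarTarget {
    /// Handled by the software MSI-X table, at an emulator-relative offset.
    MsixTable { emu_offset: u64 },
    /// Handled by the software PBA, at an emulator-relative offset.
    MsixPba { emu_offset: u64 },
    /// Proxied unchanged to the physical device.
    Hardware { bar_offset: u64 },
}

/// Physical placement of the MSI-X structures, as read from hardware.
pub struct MsixLayout {
    pub table_offset: u64,
    pub pba_offset: u64,
    pub vector_count: u64,
}

impl MsixLayout {
    /// Each MSI-X table entry is 16 bytes.
    fn table_len(&self) -> u64 {
        self.vector_count * 16
    }

    /// The PBA holds one bit per vector, packed into 64-bit words.
    fn pba_len(&self) -> u64 {
        (self.vector_count + 63) / 64 * 8
    }

    /// Route one BAR offset to the table, the PBA, or the hardware.
    pub fn classify(&self, bar_offset: u64) -> BarTarget {
        if let Some(rel) = bar_offset.checked_sub(self.table_offset) {
            if rel < self.table_len() {
                return BarTarget::MsixTable { emu_offset: rel };
            }
        }
        if let Some(rel) = bar_offset.checked_sub(self.pba_offset) {
            if rel < self.pba_len() {
                return BarTarget::MsixPba { emu_offset: rel };
            }
        }
        BarTarget::Hardware { bar_offset }
    }
}
```

If the emulator mirrored the hardware offsets, as suggested for a follow-up PR, the subtraction step would no longer be needed.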

Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 4 comments.

Comment thread vm/devices/user_driver/vfio_sys/src/lib.rs
Comment thread vm/devices/user_driver/vfio_sys/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs Outdated
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
vfio: derive BAR masks from VFIO region_info instead of probing hardware
Copilot AI review requested due to automatic review settings April 16, 2026 00:49
Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 4 comments.

Comment thread vm/devices/user_driver/vfio_sys/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Comment thread openvmm/openvmm_entry/src/cli_args.rs
Comment thread vm/devices/pci/pci_resources/src/lib.rs
Comment thread vm/devices/pci/vfio_assigned_device/src/lib.rs
Comment thread vm/devices/pci/pci_core/src/capabilities/msix.rs Outdated
@jstarks jstarks requested a review from a team April 16, 2026 15:25
@will-j-wright will-j-wright merged commit 94bb8cf into microsoft:main Apr 16, 2026
60 of 62 checks passed
gurasinghMS pushed a commit to gurasinghMS/openvmm that referenced this pull request Apr 17, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: will-j-wright <1063607+will-j-wright@users.noreply.github.com>

4 participants