openvmm: Add VFIO PCI device assignment #3248
Add a new crate that implements ChipsetDevice + PciConfigSpace for a physical PCI device accessed via Linux VFIO. Config space reads/writes are proxied to the physical device through the VFIO config region file descriptor. BARs are cached locally so the guest can probe sizes via the standard write-all-ones mechanism. MMIO returns all-ones for now (config space enumeration only). Save/restore is not supported.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
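The write-all-ones BAR sizing mechanism mentioned above can be illustrated with a minimal sketch. It models a single 32-bit memory BAR held in a local cache; `CachedBar` and `probe_size` are hypothetical names chosen for illustration and are not taken from the crate, which may structure this differently.

```rust
/// Hypothetical model of one 32-bit memory BAR cached in the VMM.
/// (Illustrative only; not the crate's actual type.)
pub struct CachedBar {
    /// BAR size in bytes; a power of two, at least 16.
    size: u32,
    /// Read-only low bits (memory type, prefetchable).
    flags: u32,
    /// Guest-programmed base address bits.
    base: u32,
}

impl CachedBar {
    pub fn new(size: u32, flags: u32) -> Self {
        assert!(size.is_power_of_two() && size >= 16);
        Self { size, flags: flags & 0xF, base: 0 }
    }

    /// Guest config write: only address bits at or above the size
    /// alignment are writable, so the low bits are dropped.
    pub fn write(&mut self, value: u32) {
        self.base = value & !(self.size - 1);
    }

    /// Guest config read: programmed base plus the read-only flag bits.
    pub fn read(&self) -> u32 {
        self.base | self.flags
    }
}

/// What a guest does to size a BAR: save, write all ones, read back,
/// restore, then derive the size from the bits that stayed zero.
pub fn probe_size(bar: &mut CachedBar) -> u32 {
    let saved = bar.read();
    bar.write(0xFFFF_FFFF);
    let mask = bar.read() & !0xF;
    bar.write(saved);
    (!mask).wrapping_add(1)
}
```

Because the cache answers the probe, no sizing write has to reach the physical device.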
Add a --vfio <port>:<pci_bdf> CLI argument (Linux only) that assigns a
host PCI device to the guest via VFIO. The device must be bound to
vfio-pci on the host before starting the VM.
In dispatch.rs, VFIO devices are opened via vfio_sys (container, group,
device), then wrapped in a VfioAssignedPciDevice and attached to the
specified PCIe root port. The guest sees the device via ECAM and can
read/write its config space.
Example usage:
    openvmm --hypervisor mshv \
        --pcie-root-complex rc0 \
        --pcie-root-port rc0:rp0 \
        --vfio rp0:3f7a:00:00.0 \
        --kernel /var/images/vmlinux \
        --cmdline "console=ttyS0" \
        --com1 console --memory 256M --processors 2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a user-facing guide for assigning physical PCI devices to OpenVMM guests via Linux VFIO. Covers the full workflow: identifying devices, enabling unsafe interrupts for Hyper-V IOMMU, binding to vfio-pci, launching OpenVMM with --vfio, and verifying in the guest. Also links the previously empty 'Direct Assigned' reference entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BAR MMIO: proxy reads/writes to physical device via pread/pwrite on the VFIO device fd (replaces all-ones stub)
- MSI-X discovery: walk PCI capability list to find MSI-X, extract table count, BIR, offsets
- MSI-X emulation: wire pci_core's MsixEmulator for software MSI-X table and PBA handling (same emulator used by NVMe/virtio)
- Split-BAR MMIO dispatch: route MSI-X table/PBA accesses to the emulator, proxy everything else to hardware
- Config space: intercept MSI-X capability writes to track enable/disable state in the emulator
- dispatch.rs: query BAR region info, create MsiConnection, connect SignalMsi for interrupt delivery

Interrupts do not yet fire (no eventfd wiring to VFIO). The guest can enumerate the device, probe BARs, configure MSI-X vectors, and access device registers. Interrupt delivery requires the planned MsixEmulator eventfd extension.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
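As an illustration of the MSI-X discovery step, here is a minimal sketch of walking the PCI capability list over a 256-byte snapshot of config space. The register offsets and bit layouts follow the PCI specification; the `MsixInfo` struct and `find_msix` function are hypothetical names, and the real crate reads config space through VFIO rather than from a byte array.

```rust
/// Hypothetical result type for the capability walk (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub struct MsixInfo {
    pub cap_offset: u8,
    pub table_count: u16,
    pub table_bir: u8,
    pub table_offset: u32,
    pub pba_bir: u8,
    pub pba_offset: u32,
}

const STATUS_OFFSET: usize = 0x06;
const STATUS_CAP_LIST: u16 = 1 << 4;
const CAP_PTR_OFFSET: usize = 0x34;
const CAP_ID_MSIX: u8 = 0x11;

fn read_u16(cfg: &[u8; 256], off: usize) -> u16 {
    u16::from_le_bytes([cfg[off], cfg[off + 1]])
}

fn read_u32(cfg: &[u8; 256], off: usize) -> u32 {
    u32::from_le_bytes([cfg[off], cfg[off + 1], cfg[off + 2], cfg[off + 3]])
}

/// Walk the capability list and decode the MSI-X capability, if present.
pub fn find_msix(cfg: &[u8; 256]) -> Option<MsixInfo> {
    // The Status register advertises whether a capability list exists.
    if (read_u16(cfg, STATUS_OFFSET) & STATUS_CAP_LIST) == 0 {
        return None;
    }
    let mut ptr = cfg[CAP_PTR_OFFSET] & !0x3;
    // At most 48 dword-aligned capabilities fit in 0x40..0x100; the bound
    // also guards against malformed lists that loop.
    for _ in 0..48 {
        if ptr < 0x40 {
            return None; // 0 terminates the list; anything below 0x40 is invalid.
        }
        let off = ptr as usize;
        if cfg[off] == CAP_ID_MSIX {
            if off + 12 > cfg.len() {
                return None;
            }
            let control = read_u16(cfg, off + 2);
            let table = read_u32(cfg, off + 4);
            let pba = read_u32(cfg, off + 8);
            return Some(MsixInfo {
                cap_offset: ptr,
                // Message Control bits 10:0 encode the table size minus one.
                table_count: (control & 0x7FF) + 1,
                // Low three bits select the BAR; the rest is the offset.
                table_bir: (table & 0x7) as u8,
                table_offset: table & !0x7,
                pba_bir: (pba & 0x7) as u8,
                pba_offset: pba & !0x7,
            });
        }
        ptr = cfg[off + 1] & !0x3;
    }
    None
}
```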
Add MSI-X eventfd infrastructure (MsixEmulator, event_or_proxy, map_msix/unmap_msix) and chipset MMIO registration so BAR accesses are routed to the device.

MSI-X emulation is currently disabled at runtime because Hyper-V's IOMMU lacks interrupt remapping: in the direct-attach model, device MSI writes are DMA transactions through the shared IOMMU page table and go directly to the guest's LAPIC, bypassing L1's VFIO interrupt handler entirely. VFIO's eventfd-based delivery cannot work in this configuration. The device operates via completion polling.

The full MSI-X pipeline is implemented and ready to enable once interrupt remapping is available or a kernel-level MSI passthrough mechanism is added.

Changes:
- VfioAssignedPciDevice owns vfio_sys::Device directly for map_msix
- VmTaskDriver stored for EventProxy async tasks
- MSI-X discovery, MsixEmulator wiring, eventfd mapping all implemented but disabled pending interrupt remapping support
- Chipset MMIO registration: BAR regions registered via services.register_mmio() and mapped/unmapped on MMIO enable/disable
- dispatch.rs: passes Device ownership, driver, BAR region info, MsiConnection, and MMIO controls to the device

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the event_or_proxy workaround with irqfd-backed interrupt delivery. Instead of bridging VFIO eventfds through MsixEmulator's MsiInterrupt → MsiTarget → SignalMsi path (which requires L1 userspace for every interrupt), use irqfd to let the kernel inject MSIs directly into the guest when VFIO signals the eventfd.

Changes:
- msix_enable: creates events + irqfd routes via Partition::irqfd(), passes same events to VFIO map_msix
- msix_disable: drops routes (auto-cleanup via IrqFdRoute::drop)
- MSI-X table writes: after updating MsixEmulator, reads back the affected vector's addr/data and calls route.set_msi() to update kernel GSI routing
- Removed VmTaskDriver and EventProxy dependencies
- Added irqfd parameter to VfioAssignedPciDeviceConfig
- Added irqfd() to HvlitePartition trait

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
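The table-write flow described in this commit (update the emulated entry, read back the vector's address/data, push it to the kernel route) might look roughly like the sketch below. Everything here is a stand-in: the `Route` trait, `Table`, and `RecordingRoute` are hypothetical, and the real `IrqFdRoute` and `MsixEmulator` signatures are not shown in this PR text. Only the 16-byte entry layout and the reset-masked behavior come from the PCI specification.

```rust
/// Hypothetical stand-in for a kernel-backed per-vector route.
pub trait Route {
    fn set_msi(&mut self, address: u64, data: u32);
    fn mask(&mut self);
}

/// One 16-byte MSI-X table entry as laid out by the PCI spec.
#[derive(Clone, Copy, Default)]
struct Entry {
    addr_lo: u32,
    addr_hi: u32,
    data: u32,
    control: u32, // bit 0 = per-vector mask
}

/// Hypothetical software MSI-X table that keeps routes in sync.
pub struct Table<R: Route> {
    entries: Vec<Entry>,
    routes: Vec<R>,
}

impl<R: Route> Table<R> {
    pub fn new(routes: Vec<R>) -> Self {
        // Vectors come out of reset masked (Vector Control bit 0 set).
        let entries = vec![Entry { control: 1, ..Default::default() }; routes.len()];
        Self { entries, routes }
    }

    pub fn route(&self, index: usize) -> &R {
        &self.routes[index]
    }

    /// Handle an aligned 32-bit guest write at `offset` into the table.
    pub fn write_u32(&mut self, offset: u64, value: u32) {
        let index = (offset / 16) as usize;
        let Some(entry) = self.entries.get_mut(index) else { return };
        match offset % 16 {
            0 => entry.addr_lo = value,
            4 => entry.addr_hi = value,
            8 => entry.data = value,
            12 => entry.control = value & 1,
            _ => return, // unaligned writes are ignored in this sketch
        }
        // Read back the updated entry and propagate it to the route.
        let e = *entry;
        let route = &mut self.routes[index];
        if e.control & 1 != 0 {
            route.mask();
        } else {
            route.set_msi((u64::from(e.addr_hi) << 32) | u64::from(e.addr_lo), e.data);
        }
    }
}

/// Mock route that records the last programmed message, for testing.
#[derive(Debug, Default)]
pub struct RecordingRoute {
    pub current: Option<(u64, u32)>,
}

impl Route for RecordingRoute {
    fn set_msi(&mut self, address: u64, data: u32) {
        self.current = Some((address, data));
    }
    fn mask(&mut self) {
        self.current = None;
    }
}
```

The point of the design, as the commit describes it, is that once a route is programmed the kernel can deliver the interrupt without involving userspace on each signal.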
Pull request overview
Adds Linux VFIO-based PCI passthrough support to OpenVMM, enabling assignment of host PCI devices to guests via a new --vfio CLI flag and a new ChipsetDevice implementation for config/BAR/MMIO/MSI-X handling.
Changes:
- Add VFIO container DMA map/unmap support and MSI-X teardown support in `vfio_sys`.
- Introduce `vfio_assigned_device` crate implementing a VFIO-backed PCIe endpoint with MSI-X table/PBA emulation and irqfd delivery.
- Wire `--vfio` through `openvmm_entry` → config/manifest → `openvmm_core` device construction, and add user documentation.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vm/devices/user_driver/vfio_sys/src/lib.rs | Adds VFIO DMA map/unmap ioctls and MSI-X vector count handling improvements. |
| vm/devices/pci/vfio_assigned_device/src/lib.rs | New VFIO-backed ChipsetDevice for PCI config/BAR proxying and MSI-X emulation with irqfd routing. |
| vm/devices/pci/vfio_assigned_device/Cargo.toml | Defines the new vfio_assigned_device crate and dependencies. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Exposes MsixEmulator::set_pending_bit for passthrough/PBA support. |
| petri/src/vm/openvmm/construct.rs | Adds Linux-only vfio_devices field initialization in OpenVMM config construction. |
| openvmm/openvmm_entry/src/ttrpc/mod.rs | Adds Linux-only vfio_devices field initialization for ttrpc-based configs. |
| openvmm/openvmm_entry/src/lib.rs | Plumbs --vfio CLI values into the VM config on Linux. |
| openvmm/openvmm_entry/src/cli_args.rs | Introduces --vfio flag and parsing for <port_name>:<pci_bdf> on Linux. |
| openvmm/openvmm_defs/src/config.rs | Adds Linux-only vfio_devices to config and defines VfioDeviceConfig. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Implements VFIO device bring-up (container/group/device open, IOMMU selection, DMA mapping, device instantiation). |
| openvmm/openvmm_core/src/partition.rs | Extends HvlitePartition with an irqfd() accessor to support irqfd routing. |
| openvmm/openvmm_core/Cargo.toml | Adds Linux-only dependencies for VFIO device setup and the new assigned-device crate. |
| Guide/src/user_guide/openvmm/vfio.md | Adds a user guide page describing prerequisites, binding, usage, and troubleshooting for --vfio. |
| Guide/src/SUMMARY.md | Adds the new VFIO user guide page to the book summary. |
| Cargo.toml | Registers vfio_assigned_device in workspace dependencies. |
| Cargo.lock | Records the new crate and dependency edges in the lockfile. |
Agent-Logs-Url: https://github.com/will-j-wright/openvmm/sessions/5f3649fa-55b0-4c83-85fc-be96cc294e19 Co-authored-by: will-j-wright <1063607+will-j-wright@users.noreply.github.com>
fn read_config_u32(
Don't we already have something like this in our vfio crate?
It's in user_driver/vfio.rs which we can't use. I could factor out the duplicated code to vfio_sys, if you want.
// bits). The table offset (DWORD 1) and PBA offset (DWORD 2)
// must come from hardware — the emulator uses different offsets
// than the physical device, and the MMIO handler translates
// based on the physical offsets.
Should we fix this, so that the emulator just mirrors the hardware values?
Yes, but I would argue for doing it in a separate PR, as it will touch all the MSI emulator callers.
Add support for assigning physical PCI devices to OpenVMM guests on
Linux using VFIO. A device bound to `vfio-pci` on the host can be passed
through to the guest via a new `--vfio` CLI flag, where it appears as a
PCIe endpoint with full functionality.
### Supported
- **Config space**: reads/writes proxied to the physical device via
VFIO. BAR sizing is emulated locally so the guest can probe without
hitting hardware on every access.
- **BAR MMIO**: proxied to the physical device via pread/pwrite on the
VFIO device fd.
- **MSI-X interrupts**: table and PBA are emulated in software via
`pci_core::MsixEmulator`. Interrupt delivery uses irqfd — VFIO signals
an eventfd, the kernel injects the MSI directly into the guest with no
VMM exit.
- **DMA**: guest RAM is identity-mapped into the IOMMU (IOVA == GPA) via
`VFIO_IOMMU_MAP_DMA`. IOMMU Type1v2 is required.
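Since the MSI-X table and PBA are emulated while the rest of each BAR is proxied to hardware, every BAR access has to be classified first. A minimal sketch of that classification follows, assuming accesses never straddle a region boundary; `MsixLayout`, `BarTarget`, and `route` are hypothetical names, not the crate's API. The region sizes (16 bytes per table entry, one pending bit per vector packed into 64-bit words) follow the PCI specification.

```rust
/// Where a BAR access should be sent (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum BarTarget {
    /// Offset relative to the start of the emulated MSI-X table.
    MsixTable { offset: u64 },
    /// Offset relative to the start of the emulated PBA.
    MsixPba { offset: u64 },
    /// Unmodified BAR offset, to be proxied to the physical device.
    Hardware { offset: u64 },
}

/// Physical MSI-X layout as discovered from the device's capability.
pub struct MsixLayout {
    pub table_bir: u8,
    pub table_offset: u64,
    pub pba_bir: u8,
    pub pba_offset: u64,
    pub table_count: u16,
}

impl MsixLayout {
    /// Each table entry is 16 bytes.
    fn table_len(&self) -> u64 {
        u64::from(self.table_count) * 16
    }

    /// One pending bit per vector, rounded up to whole 64-bit words.
    fn pba_len(&self) -> u64 {
        (u64::from(self.table_count) + 63) / 64 * 8
    }

    /// Classify an access to `bar` at `offset`.
    pub fn route(&self, bar: u8, offset: u64) -> BarTarget {
        if bar == self.table_bir {
            if let Some(rel) = offset.checked_sub(self.table_offset) {
                if rel < self.table_len() {
                    return BarTarget::MsixTable { offset: rel };
                }
            }
        }
        if bar == self.pba_bir {
            if let Some(rel) = offset.checked_sub(self.pba_offset) {
                if rel < self.pba_len() {
                    return BarTarget::MsixPba { offset: rel };
                }
            }
        }
        BarTarget::Hardware { offset }
    }
}
```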
### Architecture
VFIO devices use the standard `PciDeviceHandleKind` resource resolver
pattern (same as NVMe, virtio, GDMA). The `--vfio` CLI flag produces a
`PcieDeviceConfig` with a `VfioDeviceHandle` resource, which flows
through `build_pcie_device()` like any other PCIe device. The
`VfioDeviceResolver` handles all VFIO-specific setup
(container/group/device open, IOMMU configuration, DMA mapping) inside
`resolve()`, so `dispatch.rs` contains no VFIO-specific code.
### New crates
- **`vfio_assigned_device`** — `ChipsetDevice` implementation (config
space proxy, BAR MMIO dispatch, MSI-X emulation with irqfd routing, VFIO
container/group lifetime management) and `VfioDeviceResolver`
implementing `AsyncResolveResource<PciDeviceHandleKind,
VfioDeviceHandle>`.
- **`vfio_assigned_device_resources`** — `VfioDeviceHandle { pci_id:
String }` implementing `ResourceId<PciDeviceHandleKind>`.
### Key changes
- **`vfio_sys`** — Added `Container::map_dma()` / `unmap_dma()` for
IOMMU DMA mapping, `Device::unmap_msix()` for teardown.
- **`pci_core/msix.rs`** — New `MsixRoute` trait for kernel-mediated
per-vector interrupt delivery. `MsixEmulator` gains `set_routes()` /
`clear_routes()` — routes are automatically called on
mask/unmask/addr-data changes and PBA reads. Moves irqfd routing logic
out of individual devices into the shared emulator for reuse by
vhost-user and other passthrough devices. Includes unit tests with a
mock `MsixRoute`.
- **`pci_resources`** — `ResolvePciDeviceHandleParams` extended with
optional `irqfd` field for device passthrough resolvers.
- **`vmm_core/device_builder.rs`** — `build_pcie_device()` and
`resolve_and_add_pci_device()` updated to pass through `irqfd` and
`mem_layout`.
- **`openvmm_entry`** — New `--vfio <port>:<pci_bdf>` CLI flag with BDF
validation. VFIO devices merged into the standard `pcie_devices` config
list.
- **`openvmm_core/partition.rs`** — `HvlitePartition::irqfd()` exposes
the irqfd trait for passthrough devices.
- **`Guide/vfio.md`** — User guide covering prerequisites, device
binding, CLI usage, and troubleshooting.
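To make the `--vfio <port>:<pci_bdf>` argument format concrete, here is a minimal parsing sketch. It assumes the BDF uses the usual Linux sysfs form `dddd:bb:dd.f` (as in the example `3f7a:00:00.0`); the `VfioArg` struct and both function names are hypothetical, and the actual validation rules in `openvmm_entry` may differ.

```rust
/// Hypothetical parsed form of a `--vfio` argument (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub struct VfioArg {
    pub port: String,
    pub pci_id: String,
}

/// Split `<port>:<pci_bdf>` at the first colon and validate the BDF.
pub fn parse_vfio_arg(s: &str) -> Result<VfioArg, String> {
    let (port, bdf) = s
        .split_once(':')
        .ok_or_else(|| format!("expected <port>:<pci_bdf>, got {s:?}"))?;
    if port.is_empty() {
        return Err("empty port name".to_string());
    }
    validate_bdf(bdf)?;
    Ok(VfioArg { port: port.to_string(), pci_id: bdf.to_string() })
}

/// Check that `bdf` looks like `dddd:bb:dd.f` with in-range fields.
fn validate_bdf(bdf: &str) -> Result<(), String> {
    let bad = || format!("invalid PCI address {bdf:?}, expected dddd:bb:dd.f");
    let mut parts = bdf.split(':');
    let (Some(domain), Some(bus), Some(devfn), None) =
        (parts.next(), parts.next(), parts.next(), parts.next())
    else {
        return Err(bad());
    };
    let (dev, func) = devfn.split_once('.').ok_or_else(bad)?;
    // Fixed-width hex field parser; rejects signs and stray characters.
    let hex = |field: &str, len: usize| -> Option<u32> {
        if field.len() == len && field.bytes().all(|b| b.is_ascii_hexdigit()) {
            u32::from_str_radix(field, 16).ok()
        } else {
            None
        }
    };
    hex(domain, 4).ok_or_else(bad)?;
    hex(bus, 2).ok_or_else(bad)?;
    let dev = hex(dev, 2).ok_or_else(bad)?;
    let func = hex(func, 1).ok_or_else(bad)?;
    // A PCI bus has 32 device slots, each with up to 8 functions.
    if dev > 0x1f || func > 7 {
        return Err(bad());
    }
    Ok(())
}
```

Splitting at the first colon works because the port name comes first and the BDF itself contains colons.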
### Current limitations
- No save/restore for VMs with VFIO devices
- Hot-plug requires wiring mem_layout into the AddPcieDevice RPC path
(not yet done)
- Linux only
### Testing
Tested end-to-end on mshv with NVMe passthrough (config space
enumeration, DMA reads/writes, MSI-X interrupts). Tested on KVM with
AHCI passthrough (config space + MMIO).
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: will-j-wright <1063607+will-j-wright@users.noreply.github.com>