Merge branch 'kvm-nvmx-and-vm-teardown' into HEAD

The immediate issue being fixed here is a nVMX bug where KVM fails to
detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI).
However, checking for a pending interrupt accesses the legacy PIC, and
x86's kvm_arch_destroy_vm() currently frees the PIC before destroying
vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results
in a NULL pointer deref; that's a prerequisite for the nVMX fix.

The remaining patches attempt to bring a bit of sanity to x86's VM
teardown code, which has accumulated a lot of cruft over the years.  E.g.
KVM currently unloads each vCPU's MMUs in a separate operation from
destroying vCPUs, all because when guest SMP support was added, KVM had a
kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU.
And that oddity lived on, for 18 years...

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit is contained in:
Paolo Bonzini
2025-02-26 13:23:29 -05:00
9 changed files with 22 additions and 36 deletions

View File

@@ -489,6 +489,14 @@ void kvm_destroy_vcpus(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_vcpu_destroy(vcpu);
xa_erase(&kvm->vcpu_array, i);
/*
* Assert that the vCPU isn't visible in any way, to ensure KVM
* doesn't trigger a use-after-free if destroying vCPUs results
* in VM-wide request, e.g. to flush remote TLBs when tearing
* down MMUs, or to mark the VM dead if a KVM_BUG_ON() fires.
*/
WARN_ON_ONCE(xa_load(&kvm->vcpu_array, i) || kvm_get_vcpu(kvm, i));
}
atomic_set(&kvm->online_vcpus, 0);
@@ -1263,7 +1271,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_destroy_pm_notifier(kvm);
kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
kvm_destroy_vm_debugfs(kvm);
kvm_arch_sync_events(kvm);
mutex_lock(&kvm_lock);
list_del(&kvm->vm_list);
mutex_unlock(&kvm_lock);