Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and
  therefore having material from ~all submaintainers in this one. About
  a fourth of it is a new selftest, and a couple more changes are large
  in number of files touched (fixing a -Wflex-array-member-not-at-end
  compiler warning) or lines changed (reformatting of a table in the API
  documentation, thanks rST).

  But who am I kidding---it's a lot of commits and there are a lot of
  bugs being fixed here, some of them on the nastier side like the
  RISC-V ones.

  ARM:

   - Correctly handle deactivation of interrupts that were activated
     from LRs. Since EOIcount only denotes deactivation of interrupts
     that are not present in an LR, start EOIcount deactivation walk
     *after* the last irq that made it into an LR
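
   A minimal sketch of the corrected walk, using illustrative arrays
   rather than KVM's actual vgic data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only -- not KVM's data structures.  The active list is
 * sorted by priority and its first `in_lrs` entries were placed into
 * List Registers.  EOIcount counts deactivations only for interrupts
 * that did NOT fit in an LR, so the replay walk must begin just past
 * the LR'd prefix instead of at the head of the list. */
static unsigned int replay_eoicount(unsigned char *active, size_t n_active,
				    size_t in_lrs, unsigned int eoicount)
{
	unsigned int deactivated = 0;

	for (size_t i = in_lrs; i < n_active && eoicount; i++, eoicount--) {
		active[i] = 0;	/* emulate the CPU's deactivation */
		deactivated++;
	}
	return deactivated;
}
```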

   - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
     is already enabled -- not only is this not possible (pKVM will
     reject the call), but it is also useless: this can only happen for
     a CPU that has already booted once, and the capability will not
     change

   - Fix a couple of low-severity bugs in our S2 fault handling path,
     affecting the recently introduced LS64 handling and the even more
     esoteric handling of hwpoison in a nested context

   - Address yet another syzkaller finding in the vgic initialisation,
     where we would end up destroying an uninitialised vgic with nasty
     consequences

   - Address an annoying case of pKVM failing to boot when some of the
     memblock regions that the host is faulting in are not page-aligned

   - Inject some sanity in the NV stage-2 walker by checking the limits
     against the advertised PA size, and correctly report the resulting
     faults

  PPC:

   - Fix a PPC e500 build error due to a long-standing wart that was
     exposed by the recent conversion to kmalloc_obj(); rip out all the
     ugliness that led to the wart

  RISC-V:

   - Prevent speculative out-of-bounds access using array_index_nospec()
     in APLIC interrupt handling, ONE_REG register access, AIA CSR
     access, float register access, and PMU counter access
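
   The kernel's array_index_nospec() implements this by clamping the
   index with a branchless mask, so a mispredicted bounds check cannot
   steer a dependent load out of bounds. A userspace approximation of
   the generic pattern (the real macro lives in include/linux/nospec.h):

```c
#include <limits.h>

/* All-ones when index < size, all-zeroes otherwise, computed with
 * arithmetic rather than a branch the CPU could speculate past.
 * Mirrors the kernel's generic array_index_mask_nospec(); like the
 * kernel, it relies on arithmetic right shift of negative values. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >>
	       (sizeof(long) * CHAR_BIT - 1);
}

/* Userspace stand-in for array_index_nospec(index, size) */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```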

   - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
     kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

   - Fix potential null pointer dereference in
     kvm_riscv_vcpu_aia_rmw_topei()

   - Fix off-by-one array access in SBI PMU

   - Skip THP support check during dirty logging

   - Fix error code returned for Smstateen and Ssaia ONE_REG interface

   - Check host Ssaia extension when creating AIA irqchip

  x86:

   - Fix cases where CPUID mitigation features were incorrectly marked
     as available whenever the kernel used scattered feature words for
     them

   - Validate _all_ GVAs, rather than just the first GVA, when
     processing a range of GVAs for Hyper-V's TLB flush hypercalls
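
   The shape of the fix, sketched with illustrative stand-ins (these are
   not KVM's actual helpers, and 48-bit canonical virtual addresses are
   assumed): every GVA in the request is validated before any of them is
   flushed, rather than only the first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in canonicality check: bits 63:47 must be a sign extension. */
static bool is_canonical_gva(uint64_t gva)
{
	return (uint64_t)((int64_t)(gva << 16) >> 16) == gva;
}

/* Validate _all_ entries up front; reject the whole request if any
 * GVA is bogus, instead of acting after checking only gvas[0]. */
static bool validate_gva_range(const uint64_t *gvas, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!is_canonical_gva(gvas[i]))
			return false;
	return true;
}
```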

   - Fix a brown paper bag bug in add_atomic_switch_msr()

   - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
     to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

   - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
     local APIC (and AVIC is enabled at the module level)

   - Update CR8 write interception when AVIC is (de)activated, to fix a
     bug where the guest can run in perpetuity with the CR8 intercept
     enabled

   - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
     allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
     default) an unintentional tightening of userspace ABI in 6.17, and
     provides some amount of backwards compatibility with hypervisors
     who want to freeze PMCs on VM-Entry

   - Validate the VMCS/VMCB on return to a nested guest from SMM,
     because either userspace or the guest could stash invalid values in
     memory and trigger the processor's consistency checks

  Generic:

   - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
     being unnecessary and confusing, triggered compiler warnings due to
     -Wflex-array-member-not-at-end
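
   The pattern in question, reproduced with illustrative (non-KVM)
   names: a struct ending in a flexible array member is embedded ahead
   of other members so the trailing storage "overlays" the flexible
   array, and -Wflex-array-member-not-at-end objects. The fix is to
   declare the members directly:

```c
#include <stddef.h>

/* A struct whose last member is a flexible array contributes no
 * storage for that member. */
struct desc {
	unsigned int flags;
	char name[];		/* flexible array member */
};

/*
 * Embedding `struct desc` in front of other members, e.g.
 *
 *	struct overlay {
 *		struct desc d;	// flexible member no longer at the end
 *		char name[48];
 *	};
 *
 * is what -Wflex-array-member-not-at-end warns about.  Dropping the
 * pseudo-overlay and declaring the members directly describes the
 * same layout without the warning: */
struct plain_desc {
	unsigned int flags;
	char name[48];
};
```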

   - Document that vcpu->mutex is taken outside of kvm->slots_lock and
     kvm->slots_arch_lock, which is intentional and desirable despite
     being rather unintuitive

  Selftests:

   - Increase the maximum number of NUMA nodes in the guest_memfd
     selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
This commit is contained in:
Linus Torvalds
2026-03-15 12:22:10 -07:00
57 changed files with 749 additions and 379 deletions

@@ -8435,115 +8435,123 @@ KVM_CHECK_EXTENSION.
The valid bits in cap.args[0] are:
=================================== ============================================
KVM_X86_QUIRK_LINT0_REENABLED By default, the reset value for the LVT
LINT0 register is 0x700 (APIC_MODE_EXTINT).
When this quirk is disabled, the reset value
is 0x10000 (APIC_LVT_MASKED).
======================================== ================================================
KVM_X86_QUIRK_LINT0_REENABLED By default, the reset value for the LVT
LINT0 register is 0x700 (APIC_MODE_EXTINT).
When this quirk is disabled, the reset value
is 0x10000 (APIC_LVT_MASKED).
KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on
AMD CPUs to workaround buggy guest firmware
that runs in perpetuity with CR0.CD, i.e.
with caches in "no fill" mode.
KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on
AMD CPUs to workaround buggy guest firmware
that runs in perpetuity with CR0.CD, i.e.
with caches in "no fill" mode.
When this quirk is disabled, KVM does not
change the value of CR0.CD and CR0.NW.
When this quirk is disabled, KVM does not
change the value of CR0.CD and CR0.NW.
KVM_X86_QUIRK_LAPIC_MMIO_HOLE By default, the MMIO LAPIC interface is
available even when configured for x2APIC
mode. When this quirk is disabled, KVM
disables the MMIO LAPIC interface if the
LAPIC is in x2APIC mode.
KVM_X86_QUIRK_LAPIC_MMIO_HOLE By default, the MMIO LAPIC interface is
available even when configured for x2APIC
mode. When this quirk is disabled, KVM
disables the MMIO LAPIC interface if the
LAPIC is in x2APIC mode.
KVM_X86_QUIRK_OUT_7E_INC_RIP By default, KVM pre-increments %rip before
exiting to userspace for an OUT instruction
to port 0x7e. When this quirk is disabled,
KVM does not pre-increment %rip before
exiting to userspace.
KVM_X86_QUIRK_OUT_7E_INC_RIP By default, KVM pre-increments %rip before
exiting to userspace for an OUT instruction
to port 0x7e. When this quirk is disabled,
KVM does not pre-increment %rip before
exiting to userspace.
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets
CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if
IA32_MISC_ENABLE[bit 18] (MWAIT) is set.
Additionally, when this quirk is disabled,
KVM clears CPUID.01H:ECX[bit 3] if
IA32_MISC_ENABLE[bit 18] is cleared.
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets
CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if
IA32_MISC_ENABLE[bit 18] (MWAIT) is set.
Additionally, when this quirk is disabled,
KVM clears CPUID.01H:ECX[bit 3] if
IA32_MISC_ENABLE[bit 18] is cleared.
KVM_X86_QUIRK_FIX_HYPERCALL_INSN By default, KVM rewrites guest
VMMCALL/VMCALL instructions to match the
vendor's hypercall instruction for the
system. When this quirk is disabled, KVM
will no longer rewrite invalid guest
hypercall instructions. Executing the
incorrect hypercall instruction will
generate a #UD within the guest.
KVM_X86_QUIRK_FIX_HYPERCALL_INSN By default, KVM rewrites guest
VMMCALL/VMCALL instructions to match the
vendor's hypercall instruction for the
system. When this quirk is disabled, KVM
will no longer rewrite invalid guest
hypercall instructions. Executing the
incorrect hypercall instruction will
generate a #UD within the guest.
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
they are intercepted) as NOPs regardless of
whether or not MONITOR/MWAIT are supported
according to guest CPUID. When this quirk
is disabled and KVM_X86_DISABLE_EXITS_MWAIT
is not set (MONITOR/MWAIT are intercepted),
KVM will inject a #UD on MONITOR/MWAIT if
they're unsupported per guest CPUID. Note,
KVM will modify MONITOR/MWAIT support in
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
they are intercepted) as NOPs regardless of
whether or not MONITOR/MWAIT are supported
according to guest CPUID. When this quirk
is disabled and KVM_X86_DISABLE_EXITS_MWAIT
is not set (MONITOR/MWAIT are intercepted),
KVM will inject a #UD on MONITOR/MWAIT if
they're unsupported per guest CPUID. Note,
KVM will modify MONITOR/MWAIT support in
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
invalidates all SPTEs in all memslots and
address spaces when a memslot is deleted or
moved. When this quirk is disabled (or the
VM type isn't KVM_X86_DEFAULT_VM), KVM only
ensures the backing memory of the deleted
or moved memslot isn't reachable, i.e. KVM
_may_ invalidate only SPTEs related to the
memslot.
KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
invalidates all SPTEs in all memslots and
address spaces when a memslot is deleted or
moved. When this quirk is disabled (or the
VM type isn't KVM_X86_DEFAULT_VM), KVM only
ensures the backing memory of the deleted
or moved memslot isn't reachable, i.e. KVM
_may_ invalidate only SPTEs related to the
memslot.
KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
vCPU's MSR_IA32_PERF_CAPABILITIES (0x345),
MSR_IA32_ARCH_CAPABILITIES (0x10a),
MSR_PLATFORM_INFO (0xce), and all VMX MSRs
(0x480..0x492) to the maximal capabilities
supported by KVM. KVM also sets
MSR_IA32_UCODE_REV (0x8b) to an arbitrary
value (which is different for Intel vs.
AMD). Lastly, when guest CPUID is set (by
userspace), KVM modifies select VMX MSR
fields to force consistency between guest
CPUID and L2's effective ISA. When this
quirk is disabled, KVM zeroes the vCPU's MSR
values (with two exceptions, see below),
i.e. treats the feature MSRs like CPUID
leaves and gives userspace full control of
the vCPU model definition. This quirk does
not affect VMX MSRs CR0/CR4_FIXED1 (0x487
and 0x489), as KVM does not allow them to
be set by userspace (KVM sets them based on
guest CPUID, for safety purposes).
KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
vCPU's MSR_IA32_PERF_CAPABILITIES (0x345),
MSR_IA32_ARCH_CAPABILITIES (0x10a),
MSR_PLATFORM_INFO (0xce), and all VMX MSRs
(0x480..0x492) to the maximal capabilities
supported by KVM. KVM also sets
MSR_IA32_UCODE_REV (0x8b) to an arbitrary
value (which is different for Intel vs.
AMD). Lastly, when guest CPUID is set (by
userspace), KVM modifies select VMX MSR
fields to force consistency between guest
CPUID and L2's effective ISA. When this
quirk is disabled, KVM zeroes the vCPU's MSR
values (with two exceptions, see below),
i.e. treats the feature MSRs like CPUID
leaves and gives userspace full control of
the vCPU model definition. This quirk does
not affect VMX MSRs CR0/CR4_FIXED1 (0x487
and 0x489), as KVM does not allow them to
be set by userspace (KVM sets them based on
guest CPUID, for safety purposes).
KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
guest PAT and forces the effective memory
type to WB in EPT. The quirk is not available
on Intel platforms which are incapable of
safely honoring guest PAT (i.e., without CPU
self-snoop, KVM always ignores guest PAT and
forces effective memory type to WB). It is
also ignored on AMD platforms or, on Intel,
when a VM has non-coherent DMA devices
assigned; KVM always honors guest PAT in
such case. The quirk is needed to avoid
slowdowns on certain Intel Xeon platforms
(e.g. ICX, SPR) where self-snoop feature is
supported but UC is slow enough to cause
issues with some older guests that use
UC instead of WC to map the video RAM.
Userspace can disable the quirk to honor
guest PAT if it knows that there is no such
guest software, for example if it does not
expose a bochs graphics device (which is
known to have had a buggy driver).
=================================== ============================================
KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
guest PAT and forces the effective memory
type to WB in EPT. The quirk is not available
on Intel platforms which are incapable of
safely honoring guest PAT (i.e., without CPU
self-snoop, KVM always ignores guest PAT and
forces effective memory type to WB). It is
also ignored on AMD platforms or, on Intel,
when a VM has non-coherent DMA devices
assigned; KVM always honors guest PAT in
such case. The quirk is needed to avoid
slowdowns on certain Intel Xeon platforms
(e.g. ICX, SPR) where self-snoop feature is
supported but UC is slow enough to cause
issues with some older guests that use
UC instead of WC to map the video RAM.
Userspace can disable the quirk to honor
guest PAT if it knows that there is no such
guest software, for example if it does not
expose a bochs graphics device (which is
known to have had a buggy driver).
KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM By default, KVM relaxes the consistency
check for GUEST_IA32_DEBUGCTL in vmcs12
to allow FREEZE_IN_SMM to be set. When
this quirk is disabled, KVM requires this
bit to be cleared. Note that the vmcs02
bit is still completely controlled by the
host, regardless of the quirk setting.
======================================== ================================================
7.32 KVM_CAP_MAX_VCPU_ID
------------------------

@@ -17,6 +17,8 @@ The acquisition orders for mutexes are as follows:
- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
- vcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock
- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
them together is quite rare.

@@ -784,6 +784,9 @@ struct kvm_host_data {
/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
unsigned int debug_brps;
unsigned int debug_wrps;
/* Last vgic_irq part of the AP list recorded in an LR */
struct vgic_irq *last_lr_irq;
};
struct kvm_host_psci_config {

@@ -2345,6 +2345,15 @@ static bool can_trap_icv_dir_el1(const struct arm64_cpu_capabilities *entry,
!is_midr_in_range_list(has_vgic_v3))
return false;
/*
* pKVM prevents late onlining of CPUs. This means that whatever
* state the capability is in after deprivilege cannot be affected
* by a new CPU booting -- this is guaranteed to be a CPU we have
* already seen, and the cap is therefore unchanged.
*/
if (system_capabilities_finalized() && is_protected_kvm_enabled())
return cpus_have_final_cap(ARM64_HAS_ICH_HCR_EL2_TDIR);
if (is_kernel_in_hyp_mode())
res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
else

@@ -1504,8 +1504,6 @@ int __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
fail = true;
}
isb();
if (!fail)
par = read_sysreg_par();

@@ -29,7 +29,7 @@
#include "trace.h"
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
@@ -42,7 +42,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, hvc_exit_stat),
STATS_DESC_COUNTER(VCPU, wfe_exit_stat),

@@ -518,7 +518,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
granule = kvm_granule_size(level);
cur.start = ALIGN_DOWN(addr, granule);
cur.end = cur.start + granule;
if (!range_included(&cur, range))
if (!range_included(&cur, range) && level < KVM_PGTABLE_LAST_LEVEL)
continue;
*range = cur;
return 0;

@@ -1751,6 +1751,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
force_pte = (max_map_size == PAGE_SIZE);
vma_pagesize = min_t(long, vma_pagesize, max_map_size);
vma_shift = __ffs(vma_pagesize);
}
/*
@@ -1837,10 +1838,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault && s2_force_noncacheable)
ret = -ENOEXEC;
if (ret) {
kvm_release_page_unused(page);
return ret;
}
if (ret)
goto out_put_page;
/*
* Guest performs atomic/exclusive operations on memory with unsupported
@@ -1850,7 +1849,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
*/
if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(vcpu))) {
kvm_inject_dabt_excl_atomic(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
ret = 1;
goto out_put_page;
}
if (nested)
@@ -1936,6 +1936,10 @@ out_unlock:
mark_page_dirty_in_slot(kvm, memslot, gfn);
return ret != -EAGAIN ? ret : 0;
out_put_page:
kvm_release_page_unused(page);
return ret;
}
/* Resolve the access fault by making the page young again. */

@@ -152,31 +152,31 @@ static int get_ia_size(struct s2_walk_info *wi)
return 64 - wi->t0sz;
}
static int check_base_s2_limits(struct s2_walk_info *wi,
static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
int level, int input_size, int stride)
{
int start_size, ia_size;
int start_size, pa_max;
ia_size = get_ia_size(wi);
pa_max = kvm_get_pa_bits(vcpu->kvm);
/* Check translation limits */
switch (BIT(wi->pgshift)) {
case SZ_64K:
if (level == 0 || (level == 1 && ia_size <= 42))
if (level == 0 || (level == 1 && pa_max <= 42))
return -EFAULT;
break;
case SZ_16K:
if (level == 0 || (level == 1 && ia_size <= 40))
if (level == 0 || (level == 1 && pa_max <= 40))
return -EFAULT;
break;
case SZ_4K:
if (level < 0 || (level == 0 && ia_size <= 42))
if (level < 0 || (level == 0 && pa_max <= 42))
return -EFAULT;
break;
}
/* Check input size limits */
if (input_size > ia_size)
if (input_size > pa_max)
return -EFAULT;
/* Check number of entries in starting level table */
@@ -269,16 +269,19 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
if (input_size > 48 || input_size < 25)
return -EFAULT;
ret = check_base_s2_limits(wi, level, input_size, stride);
if (WARN_ON(ret))
ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
if (WARN_ON(ret)) {
out->esr = compute_fsc(0, ESR_ELx_FSC_FAULT);
return ret;
}
base_lower_bound = 3 + input_size - ((3 - level) * stride +
wi->pgshift);
base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
if (check_output_size(wi, base_addr)) {
out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
/* R_BFHQH */
out->esr = compute_fsc(0, ESR_ELx_FSC_ADDRSZ);
return 1;
}
@@ -293,8 +296,10 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
paddr = base_addr | index;
ret = read_guest_s2_desc(vcpu, paddr, &desc, wi);
if (ret < 0)
if (ret < 0) {
out->esr = ESR_ELx_FSC_SEA_TTW(level);
return ret;
}
new_desc = desc;

@@ -143,23 +143,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
kvm_for_each_vcpu(i, vcpu, kvm) {
ret = vgic_allocate_private_irqs_locked(vcpu, type);
if (ret)
break;
}
if (ret) {
kvm_for_each_vcpu(i, vcpu, kvm) {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL;
}
goto out_unlock;
}
kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
@@ -176,6 +159,23 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
kvm_for_each_vcpu(i, vcpu, kvm) {
ret = vgic_allocate_private_irqs_locked(vcpu, type);
if (ret)
break;
}
if (ret) {
kvm_for_each_vcpu(i, vcpu, kvm) {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL;
}
kvm->arch.vgic.vgic_model = 0;
goto out_unlock;
}
if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
kvm->arch.vgic.nassgicap = system_supports_direct_sgis();

@@ -115,7 +115,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
u32 eoicount = FIELD_GET(GICH_HCR_EOICOUNT, cpuif->vgic_hcr);
struct vgic_irq *irq;
struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
@@ -123,7 +123,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
vgic_v2_fold_lr(vcpu, cpuif->vgic_lr[lr]);
/* See the GICv3 equivalent for the EOIcount handling rationale */
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
u32 lr;
if (!eoicount) {

@@ -148,7 +148,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
u32 eoicount = FIELD_GET(ICH_HCR_EL2_EOIcount, cpuif->vgic_hcr);
struct vgic_irq *irq;
struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
@@ -158,12 +158,12 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
/*
* EOIMode=0: use EOIcount to emulate deactivation. We are
* guaranteed to deactivate in reverse order of the activation, so
* just pick one active interrupt after the other in the ap_list,
* and replay the deactivation as if the CPU was doing it. We also
* rely on priority drop to have taken place, and the list to be
* sorted by priority.
* just pick one active interrupt after the other in the tail part
* of the ap_list, past the LRs, and replay the deactivation as if
* the CPU was doing it. We also rely on priority drop to have taken
* place, and the list to be sorted by priority.
*/
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
u64 lr;
/*

@@ -814,6 +814,9 @@ retry:
static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
{
if (!*host_data_ptr(last_lr_irq))
return;
if (kvm_vgic_global_state.type == VGIC_V2)
vgic_v2_fold_lr_state(vcpu);
else
@@ -960,10 +963,13 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
if (irqs_outside_lrs(&als))
vgic_sort_ap_list(vcpu);
*host_data_ptr(last_lr_irq) = NULL;
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
scoped_guard(raw_spinlock, &irq->irq_lock) {
if (likely(vgic_target_oracle(irq) == vcpu)) {
vgic_populate_lr(vcpu, irq, count++);
*host_data_ptr(last_lr_irq) = irq;
}
}

@@ -14,7 +14,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, int_exits),
STATS_DESC_COUNTER(VCPU, idle_exits),

@@ -10,7 +10,7 @@
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, pages),
STATS_DESC_ICOUNTER(VM, hugepages),

@@ -38,7 +38,7 @@
#define VECTORSPACING 0x100 /* for EI/VI mode */
#endif
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
@@ -51,7 +51,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, wait_exits),
STATS_DESC_COUNTER(VCPU, cache_exits),

@@ -38,7 +38,7 @@
/* #define EXIT_DEBUG */
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, num_2M_pages),
STATS_DESC_ICOUNTER(VM, num_1G_pages)
@@ -53,7 +53,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, sum_exits),
STATS_DESC_COUNTER(VCPU, mmio_exits),

@@ -36,7 +36,7 @@
unsigned long kvmppc_booke_handlers;
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, num_2M_pages),
STATS_DESC_ICOUNTER(VM, num_1G_pages)
@@ -51,7 +51,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, sum_exits),
STATS_DESC_COUNTER(VCPU, mmio_exits),

@@ -39,15 +39,11 @@ enum vcpu_ftr {
/* bits [6-5] MAS2_X1 and MAS2_X0 and [4-0] bits for WIMGE */
#define E500_TLB_MAS2_ATTR (0x7f)
struct tlbe_ref {
struct tlbe_priv {
kvm_pfn_t pfn; /* valid only for TLB0, except briefly */
unsigned int flags; /* E500_TLB_* */
};
struct tlbe_priv {
struct tlbe_ref ref;
};
#ifdef CONFIG_KVM_E500V2
struct vcpu_id_table;
#endif

@@ -920,12 +920,12 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
vcpu_e500->gtlb_offset[0] = 0;
vcpu_e500->gtlb_offset[1] = KVM_E500_TLB0_SIZE;
vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_ref,
vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_priv,
vcpu_e500->gtlb_params[0].entries);
if (!vcpu_e500->gtlb_priv[0])
goto free_vcpu;
vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_ref,
vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_priv,
vcpu_e500->gtlb_params[1].entries);
if (!vcpu_e500->gtlb_priv[1])
goto free_vcpu;

@@ -189,16 +189,16 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
{
struct kvm_book3e_206_tlb_entry *gtlbe =
get_entry(vcpu_e500, tlbsel, esel);
struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[tlbsel][esel].ref;
struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[tlbsel][esel];
/* Don't bother with unmapped entries */
if (!(ref->flags & E500_TLB_VALID)) {
WARN(ref->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
"%s: flags %x\n", __func__, ref->flags);
if (!(tlbe->flags & E500_TLB_VALID)) {
WARN(tlbe->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
"%s: flags %x\n", __func__, tlbe->flags);
WARN_ON(tlbsel == 1 && vcpu_e500->g2h_tlb1_map[esel]);
}
if (tlbsel == 1 && ref->flags & E500_TLB_BITMAP) {
if (tlbsel == 1 && tlbe->flags & E500_TLB_BITMAP) {
u64 tmp = vcpu_e500->g2h_tlb1_map[esel];
int hw_tlb_indx;
unsigned long flags;
@@ -216,28 +216,28 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
}
mb();
vcpu_e500->g2h_tlb1_map[esel] = 0;
ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
tlbe->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
local_irq_restore(flags);
}
if (tlbsel == 1 && ref->flags & E500_TLB_TLB0) {
if (tlbsel == 1 && tlbe->flags & E500_TLB_TLB0) {
/*
* TLB1 entry is backed by 4k pages. This should happen
* rarely and is not worth optimizing. Invalidate everything.
*/
kvmppc_e500_tlbil_all(vcpu_e500);
ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
tlbe->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
}
/*
* If TLB entry is still valid then it's a TLB0 entry, and thus
* backed by at most one host tlbe per shadow pid
*/
if (ref->flags & E500_TLB_VALID)
if (tlbe->flags & E500_TLB_VALID)
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
/* Mark the TLB as not backed by the host anymore */
ref->flags = 0;
tlbe->flags = 0;
}
static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
@@ -245,26 +245,26 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
}
static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
struct kvm_book3e_206_tlb_entry *gtlbe,
kvm_pfn_t pfn, unsigned int wimg,
bool writable)
static inline void kvmppc_e500_tlbe_setup(struct tlbe_priv *tlbe,
struct kvm_book3e_206_tlb_entry *gtlbe,
kvm_pfn_t pfn, unsigned int wimg,
bool writable)
{
ref->pfn = pfn;
ref->flags = E500_TLB_VALID;
tlbe->pfn = pfn;
tlbe->flags = E500_TLB_VALID;
if (writable)
ref->flags |= E500_TLB_WRITABLE;
tlbe->flags |= E500_TLB_WRITABLE;
/* Use guest supplied MAS2_G and MAS2_E */
ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
tlbe->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
}
static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
static inline void kvmppc_e500_tlbe_release(struct tlbe_priv *tlbe)
{
if (ref->flags & E500_TLB_VALID) {
if (tlbe->flags & E500_TLB_VALID) {
/* FIXME: don't log bogus pfn for TLB1 */
trace_kvm_booke206_ref_release(ref->pfn, ref->flags);
ref->flags = 0;
trace_kvm_booke206_ref_release(tlbe->pfn, tlbe->flags);
tlbe->flags = 0;
}
}
@@ -284,11 +284,8 @@ static void clear_tlb_privs(struct kvmppc_vcpu_e500 *vcpu_e500)
int i;
for (tlbsel = 0; tlbsel <= 1; tlbsel++) {
for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++) {
struct tlbe_ref *ref =
&vcpu_e500->gtlb_priv[tlbsel][i].ref;
kvmppc_e500_ref_release(ref);
}
for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++)
kvmppc_e500_tlbe_release(&vcpu_e500->gtlb_priv[tlbsel][i]);
}
}
@@ -304,18 +301,18 @@ void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu)
static void kvmppc_e500_setup_stlbe(
struct kvm_vcpu *vcpu,
struct kvm_book3e_206_tlb_entry *gtlbe,
int tsize, struct tlbe_ref *ref, u64 gvaddr,
int tsize, struct tlbe_priv *tlbe, u64 gvaddr,
struct kvm_book3e_206_tlb_entry *stlbe)
{
kvm_pfn_t pfn = ref->pfn;
kvm_pfn_t pfn = tlbe->pfn;
u32 pr = vcpu->arch.shared->msr & MSR_PR;
bool writable = !!(ref->flags & E500_TLB_WRITABLE);
bool writable = !!(tlbe->flags & E500_TLB_WRITABLE);
BUG_ON(!(ref->flags & E500_TLB_VALID));
BUG_ON(!(tlbe->flags & E500_TLB_VALID));
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
stlbe->mas2 = (gvaddr & MAS2_EPN) | (tlbe->flags & E500_TLB_MAS2_ATTR);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, writable, pr);
}
@@ -323,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
int tlbsel, struct kvm_book3e_206_tlb_entry *stlbe,
struct tlbe_ref *ref)
struct tlbe_priv *tlbe)
{
struct kvm_memory_slot *slot;
unsigned int psize;
@@ -455,9 +452,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
}
}
-kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg, writable);
+kvmppc_e500_tlbe_setup(tlbe, gtlbe, pfn, wimg, writable);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
-ref, gvaddr, stlbe);
+tlbe, gvaddr, stlbe);
writable = tlbe_is_writable(stlbe);
/* Clear i-cache for new pages */
@@ -474,17 +471,17 @@ static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500, int esel,
struct kvm_book3e_206_tlb_entry *stlbe)
{
struct kvm_book3e_206_tlb_entry *gtlbe;
-struct tlbe_ref *ref;
+struct tlbe_priv *tlbe;
int stlbsel = 0;
int sesel = 0;
int r;
gtlbe = get_entry(vcpu_e500, 0, esel);
-ref = &vcpu_e500->gtlb_priv[0][esel].ref;
+tlbe = &vcpu_e500->gtlb_priv[0][esel];
r = kvmppc_e500_shadow_map(vcpu_e500, get_tlb_eaddr(gtlbe),
get_tlb_raddr(gtlbe) >> PAGE_SHIFT,
-gtlbe, 0, stlbe, ref);
+gtlbe, 0, stlbe, tlbe);
if (r)
return r;
@@ -494,7 +491,7 @@ static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500, int esel,
}
static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
-struct tlbe_ref *ref,
+struct tlbe_priv *tlbe,
int esel)
{
unsigned int sesel = vcpu_e500->host_tlb1_nv++;
@@ -507,10 +504,10 @@ static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
}
-vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
+vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_BITMAP;
vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
vcpu_e500->h2g_tlb1_rmap[sesel] = esel + 1;
-WARN_ON(!(ref->flags & E500_TLB_VALID));
+WARN_ON(!(tlbe->flags & E500_TLB_VALID));
return sesel;
}
@@ -522,24 +519,24 @@ static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
struct kvm_book3e_206_tlb_entry *stlbe, int esel)
{
-struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[1][esel].ref;
+struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[1][esel];
int sesel;
int r;
r = kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe,
-ref);
+tlbe);
if (r)
return r;
/* Use TLB0 when we can only map a page with 4k */
if (get_tlb_tsize(stlbe) == BOOK3E_PAGESZ_4K) {
-vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_TLB0;
+vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_TLB0;
write_stlbe(vcpu_e500, gtlbe, stlbe, 0, 0);
return 0;
}
/* Otherwise map into TLB1 */
-sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, ref, esel);
+sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, tlbe, esel);
write_stlbe(vcpu_e500, gtlbe, stlbe, 1, sesel);
return 0;
@@ -561,11 +558,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
/* Triggers after clear_tlb_privs or on initial mapping */
-if (!(priv->ref.flags & E500_TLB_VALID)) {
+if (!(priv->flags & E500_TLB_VALID)) {
kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
} else {
kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
-&priv->ref, eaddr, &stlbe);
+priv, eaddr, &stlbe);
write_stlbe(vcpu_e500, gtlbe, &stlbe, 0, 0);
}
break;


@@ -13,6 +13,7 @@
#include <linux/irqchip/riscv-imsic.h>
#include <linux/irqdomain.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <asm/cpufeature.h>
@@ -182,9 +183,14 @@ int kvm_riscv_vcpu_aia_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
+return -ENOENT;
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
*out_val = 0;
if (kvm_riscv_aia_available())
@@ -198,9 +204,14 @@ int kvm_riscv_vcpu_aia_set_csr(struct kvm_vcpu *vcpu,
unsigned long val)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
+return -ENOENT;
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
if (kvm_riscv_aia_available()) {
((unsigned long *)csr)[reg_num] = val;


@@ -10,6 +10,7 @@
#include <linux/irqchip/riscv-aplic.h>
#include <linux/kvm_host.h>
#include <linux/math.h>
+#include <linux/nospec.h>
#include <linux/spinlock.h>
#include <linux/swab.h>
#include <kvm/iodev.h>
@@ -45,7 +46,7 @@ static u32 aplic_read_sourcecfg(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return 0;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = irqd->sourcecfg;
@@ -61,7 +62,7 @@ static void aplic_write_sourcecfg(struct aplic *aplic, u32 irq, u32 val)
if (!irq || aplic->nr_irqs <= irq)
return;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
if (val & APLIC_SOURCECFG_D)
val = 0;
@@ -81,7 +82,7 @@ static u32 aplic_read_target(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return 0;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = irqd->target;
@@ -97,7 +98,7 @@ static void aplic_write_target(struct aplic *aplic, u32 irq, u32 val)
if (!irq || aplic->nr_irqs <= irq)
return;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
val &= APLIC_TARGET_EIID_MASK |
(APLIC_TARGET_HART_IDX_MASK << APLIC_TARGET_HART_IDX_SHIFT) |
@@ -116,7 +117,7 @@ static bool aplic_read_pending(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = (irqd->state & APLIC_IRQ_STATE_PENDING) ? true : false;
@@ -132,7 +133,7 @@ static void aplic_write_pending(struct aplic *aplic, u32 irq, bool pending)
if (!irq || aplic->nr_irqs <= irq)
return;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -170,7 +171,7 @@ static bool aplic_read_enabled(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = (irqd->state & APLIC_IRQ_STATE_ENABLED) ? true : false;
@@ -186,7 +187,7 @@ static void aplic_write_enabled(struct aplic *aplic, u32 irq, bool enabled)
if (!irq || aplic->nr_irqs <= irq)
return;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
if (enabled)
@@ -205,7 +206,7 @@ static bool aplic_read_input(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -254,7 +255,7 @@ static void aplic_update_irq_range(struct kvm *kvm, u32 first, u32 last)
for (irq = first; irq <= last; irq++) {
if (!irq || aplic->nr_irqs <= irq)
continue;
-irqd = &aplic->irqs[irq];
+irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -283,7 +284,7 @@ int kvm_riscv_aia_aplic_inject(struct kvm *kvm, u32 source, bool level)
if (!aplic || !source || (aplic->nr_irqs <= source))
return -ENODEV;
-irqd = &aplic->irqs[source];
+irqd = &aplic->irqs[array_index_nospec(source, aplic->nr_irqs)];
ie = (aplic->domaincfg & APLIC_DOMAINCFG_IE) ? true : false;
raw_spin_lock_irqsave(&irqd->lock, flags);


@@ -11,6 +11,7 @@
#include <linux/irqchip/riscv-imsic.h>
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
+#include <linux/cpufeature.h>
static int aia_create(struct kvm_device *dev, u32 type)
{
@@ -22,6 +23,9 @@ static int aia_create(struct kvm_device *dev, u32 type)
if (irqchip_in_kernel(kvm))
return -EEXIST;
+if (!riscv_isa_extension_available(NULL, SSAIA))
+return -ENODEV;
ret = -EBUSY;
if (kvm_trylock_all_vcpus(kvm))
return ret;
@@ -437,7 +441,7 @@ static int aia_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
static int aia_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{
-int nr_vcpus;
+int nr_vcpus, r = -ENXIO;
switch (attr->group) {
case KVM_DEV_RISCV_AIA_GRP_CONFIG:
@@ -466,12 +470,18 @@ static int aia_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
}
break;
case KVM_DEV_RISCV_AIA_GRP_APLIC:
-return kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+mutex_lock(&dev->kvm->lock);
+r = kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+mutex_unlock(&dev->kvm->lock);
+break;
case KVM_DEV_RISCV_AIA_GRP_IMSIC:
-return kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+mutex_lock(&dev->kvm->lock);
+r = kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+mutex_unlock(&dev->kvm->lock);
+break;
}
-return -ENXIO;
+return r;
}
struct kvm_device_ops kvm_riscv_aia_device_ops = {


@@ -908,6 +908,10 @@ int kvm_riscv_vcpu_aia_imsic_rmw(struct kvm_vcpu *vcpu, unsigned long isel,
int r, rc = KVM_INSN_CONTINUE_NEXT_SEPC;
struct imsic *imsic = vcpu->arch.aia_context.imsic_state;
+/* If IMSIC vCPU state not initialized then forward to user space */
+if (!imsic)
+return KVM_INSN_EXIT_TO_USER_SPACE;
if (isel == KVM_RISCV_AIA_IMSIC_TOPEI) {
/* Read pending and enabled interrupt with highest priority */
topei = imsic_mrif_topei(imsic->swfile, imsic->nr_eix,


@@ -245,6 +245,7 @@ out:
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
struct kvm_gstage gstage;
+bool mmu_locked;
if (!kvm->arch.pgd)
return false;
@@ -253,9 +254,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
gstage.flags = 0;
gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
gstage.pgd = kvm->arch.pgd;
+mmu_locked = spin_trylock(&kvm->mmu_lock);
kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
range->may_block);
+if (mmu_locked)
+spin_unlock(&kvm->mmu_lock);
return false;
}
@@ -535,7 +539,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
goto out_unlock;
/* Check if we are backed by a THP and thus use block mapping if possible */
-if (vma_pagesize == PAGE_SIZE)
+if (!logging && (vma_pagesize == PAGE_SIZE))
vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &hfn, &gpa);
if (writable) {


@@ -24,7 +24,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, ecall_exit_stat),
STATS_DESC_COUNTER(VCPU, wfi_exit_stat),


@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/uaccess.h>
#include <asm/cpufeature.h>
@@ -93,9 +94,11 @@ int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
reg_val = &cntx->fp.f.fcsr;
else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
-reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+reg_num = array_index_nospec(reg_num,
+ARRAY_SIZE(cntx->fp.f.f));
reg_val = &cntx->fp.f.f[reg_num];
-else
+} else
return -ENOENT;
} else if ((rtype == KVM_REG_RISCV_FP_D) &&
riscv_isa_extension_available(vcpu->arch.isa, d)) {
@@ -107,6 +110,8 @@ int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
if (KVM_REG_SIZE(reg->id) != sizeof(u64))
return -EINVAL;
+reg_num = array_index_nospec(reg_num,
+ARRAY_SIZE(cntx->fp.d.f));
reg_val = &cntx->fp.d.f[reg_num];
} else
return -ENOENT;
@@ -138,9 +143,11 @@ int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
reg_val = &cntx->fp.f.fcsr;
else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
-reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+reg_num = array_index_nospec(reg_num,
+ARRAY_SIZE(cntx->fp.f.f));
reg_val = &cntx->fp.f.f[reg_num];
-else
+} else
return -ENOENT;
} else if ((rtype == KVM_REG_RISCV_FP_D) &&
riscv_isa_extension_available(vcpu->arch.isa, d)) {
@@ -152,6 +159,8 @@ int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
if (KVM_REG_SIZE(reg->id) != sizeof(u64))
return -EINVAL;
+reg_num = array_index_nospec(reg_num,
+ARRAY_SIZE(cntx->fp.d.f));
reg_val = &cntx->fp.d.f[reg_num];
} else
return -ENOENT;


@@ -10,6 +10,7 @@
#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
+#include <linux/nospec.h>
#include <linux/uaccess.h>
#include <linux/kvm_host.h>
#include <asm/cacheflush.h>
@@ -127,6 +128,7 @@ static int kvm_riscv_vcpu_isa_check_host(unsigned long kvm_ext, unsigned long *g
kvm_ext >= ARRAY_SIZE(kvm_isa_ext_arr))
return -ENOENT;
+kvm_ext = array_index_nospec(kvm_ext, ARRAY_SIZE(kvm_isa_ext_arr));
*guest_ext = kvm_isa_ext_arr[kvm_ext];
switch (*guest_ext) {
case RISCV_ISA_EXT_SMNPM:
@@ -443,13 +445,16 @@ static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK |
KVM_REG_RISCV_CORE);
+unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
unsigned long reg_val;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
return -EINVAL;
-if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
reg_val = cntx->sepc;
else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
@@ -476,13 +481,16 @@ static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK |
KVM_REG_RISCV_CORE);
+unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
unsigned long reg_val;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
return -EINVAL;
-if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
@@ -507,10 +515,13 @@ static int kvm_riscv_vcpu_general_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
kvm_riscv_vcpu_flush_interrupts(vcpu);
*out_val = (csr->hvip >> VSIP_TO_HVIP_SHIFT) & VSIP_VALID_MASK;
@@ -526,10 +537,13 @@ static int kvm_riscv_vcpu_general_set_csr(struct kvm_vcpu *vcpu,
unsigned long reg_val)
{
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+if (reg_num >= regs_max)
return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
reg_val &= VSIP_VALID_MASK;
reg_val <<= VSIP_TO_HVIP_SHIFT;
@@ -548,10 +562,15 @@ static inline int kvm_riscv_vcpu_smstateen_set_csr(struct kvm_vcpu *vcpu,
unsigned long reg_val)
{
struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
-sizeof(unsigned long))
-return -EINVAL;
+if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+return -ENOENT;
+if (reg_num >= regs_max)
+return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
((unsigned long *)csr)[reg_num] = reg_val;
return 0;
@@ -562,10 +581,15 @@ static int kvm_riscv_vcpu_smstateen_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+sizeof(unsigned long);
-if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
-sizeof(unsigned long))
-return -EINVAL;
+if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+return -ENOENT;
+if (reg_num >= regs_max)
+return -ENOENT;
+reg_num = array_index_nospec(reg_num, regs_max);
*out_val = ((unsigned long *)csr)[reg_num];
return 0;
@@ -595,10 +619,7 @@ static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
rc = kvm_riscv_vcpu_aia_get_csr(vcpu, reg_num, &reg_val);
break;
case KVM_REG_RISCV_CSR_SMSTATEEN:
-rc = -EINVAL;
-if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
-rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num,
-&reg_val);
+rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num, &reg_val);
break;
default:
rc = -ENOENT;
@@ -640,10 +661,7 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
rc = kvm_riscv_vcpu_aia_set_csr(vcpu, reg_num, reg_val);
break;
case KVM_REG_RISCV_CSR_SMSTATEEN:
-rc = -EINVAL;
-if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
-rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num,
-reg_val);
+rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num, reg_val);
break;
default:
rc = -ENOENT;


@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/perf/riscv_pmu.h>
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
@@ -87,7 +88,8 @@ static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc)
static u64 kvm_pmu_get_perf_event_hw_config(u32 sbi_event_code)
{
-return hw_event_perf_map[sbi_event_code];
+return hw_event_perf_map[array_index_nospec(sbi_event_code,
+SBI_PMU_HW_GENERAL_MAX)];
}
static u64 kvm_pmu_get_perf_event_cache_config(u32 sbi_event_code)
@@ -218,6 +220,7 @@ static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
return -EINVAL;
}
+cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
@@ -244,6 +247,7 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
return -EINVAL;
}
+cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
@@ -520,11 +524,12 @@ int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
-if (cidx > RISCV_KVM_MAX_COUNTERS || cidx == 1) {
+if (cidx >= RISCV_KVM_MAX_COUNTERS || cidx == 1) {
retdata->err_val = SBI_ERR_INVALID_PARAM;
return 0;
}
+cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
retdata->out_val = kvpmu->pmc[cidx].cinfo.value;
return 0;
@@ -559,7 +564,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
}
/* Start the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
-pmc_index = i + ctr_base;
+pmc_index = array_index_nospec(i + ctr_base,
+RISCV_KVM_MAX_COUNTERS);
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
/* The guest started the counter again. Reset the overflow status */
@@ -630,7 +636,8 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
/* Stop the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
-pmc_index = i + ctr_base;
+pmc_index = array_index_nospec(i + ctr_base,
+RISCV_KVM_MAX_COUNTERS);
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
pmc = &kvpmu->pmc[pmc_index];
@@ -761,6 +768,7 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
}
}
+ctr_idx = array_index_nospec(ctr_idx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[ctr_idx];
pmc->idx = ctr_idx;


@@ -13,7 +13,7 @@
#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==


@@ -65,7 +65,7 @@
#define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
(KVM_MAX_VCPUS + LOCAL_IRQS))
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, inject_io),
STATS_DESC_COUNTER(VM, inject_float_mchk),
@@ -91,7 +91,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, exit_userspace),
STATS_DESC_COUNTER(VCPU, exit_null),


@@ -2485,7 +2485,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
KVM_X86_QUIRK_SLOT_ZAP_ALL | \
KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
-KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
+KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM)
#define KVM_X86_CONDITIONAL_QUIRKS \
(KVM_X86_QUIRK_CD_NW_CLEARED | \


@@ -476,6 +476,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
+#define KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM (1 << 10)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1

View File

@@ -776,7 +776,10 @@ do { \
#define SYNTHESIZED_F(name) \
({ \
kvm_cpu_cap_synthesized |= feature_bit(name); \
-F(name); \
+\
+BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); \
+if (boot_cpu_has(X86_FEATURE_##name)) \
+F(name); \
})
/*


@@ -1981,16 +1981,17 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
goto out_flush_all;
-if (is_noncanonical_invlpg_address(entries[i], vcpu))
-continue;
/*
* Lower 12 bits of 'address' encode the number of additional
* pages to flush.
*/
gva = entries[i] & PAGE_MASK;
-for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
+for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++) {
+if (is_noncanonical_invlpg_address(gva + j * PAGE_SIZE, vcpu))
+continue;
kvm_x86_call(flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
+}
++vcpu->stat.tlb_flush;
}


@@ -321,7 +321,8 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
idx = srcu_read_lock(&kvm->irq_srcu);
gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
if (gsi != -1)
-hlist_for_each_entry_rcu(kimn, &ioapic->mask_notifier_list, link)
+hlist_for_each_entry_srcu(kimn, &ioapic->mask_notifier_list, link,
+srcu_read_lock_held(&kvm->irq_srcu))
if (kimn->irq == gsi)
kimn->func(kimn, mask);
srcu_read_unlock(&kvm->irq_srcu, idx);


@@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
/*
* Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
* accesses, while interrupt injection to a running vCPU can be
@@ -226,6 +226,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
+if (!sev_es_guest(svm->vcpu.kvm))
+svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
/*
* If running nested and the guest uses its own MSR bitmap, there
* is no need to update L0's msr bitmap
@@ -368,7 +371,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
-if (kvm_apicv_activated(svm->vcpu.kvm))
+if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_activate_vmcb(svm);
else
avic_deactivate_vmcb(svm);


@@ -418,6 +418,15 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
return __nested_vmcb_check_controls(vcpu, ctl);
}
+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu)
+{
+if (!nested_vmcb_check_save(vcpu) ||
+!nested_vmcb_check_controls(vcpu))
+return -EINVAL;
+return 0;
+}
/*
* If a feature is not advertised to L1, clear the corresponding vmcb12
* intercept.
@@ -1028,8 +1037,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-if (!nested_vmcb_check_save(vcpu) ||
-!nested_vmcb_check_controls(vcpu)) {
+if (nested_svm_check_cached_vmcb12(vcpu) < 0) {
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;


@@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
-if (!kvm_vcpu_apicv_active(vcpu))
-svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
set_dr_intercepts(svm);
@@ -1189,7 +1188,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
-if (kvm_vcpu_apicv_active(vcpu))
+if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
avic_init_vmcb(svm, vmcb);
if (vnmi)
@@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
static int cr8_write_interception(struct kvm_vcpu *vcpu)
{
-u8 cr8_prev = kvm_get_cr8(vcpu);
int r;
+u8 cr8_prev = kvm_get_cr8(vcpu);
+WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
/* instruction emulation calls kvm_set_cr8() */
r = cr_interception(vcpu);
if (lapic_in_kernel(vcpu))
@@ -4879,11 +4880,15 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, false);
-if (ret)
+if (nested_svm_check_cached_vmcb12(vcpu) < 0)
goto unmap_save;
+if (enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa,
+vmcb12, false) != 0)
+goto unmap_save;
+ret = 0;
svm->nested.nested_run_pending = 1;
unmap_save:


@@ -797,6 +797,7 @@ static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
int nested_svm_exit_handled(struct vcpu_svm *svm);
int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu);
int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
bool has_error_code, u32 error_code);
int nested_svm_exit_special(struct vcpu_svm *svm);


@@ -3300,10 +3300,24 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
return -EINVAL;
-if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
-(CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
-CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
-return -EINVAL;
+if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) {
+u64 debugctl = vmcs12->guest_ia32_debugctl;
+/*
+ * FREEZE_IN_SMM is not virtualized, but allow L1 to set it in
+ * vmcs12's DEBUGCTL under a quirk for backwards compatibility.
+ * Note that the quirk only relaxes the consistency check. The
+ * vmcs02 bit is still under the control of the host. In
+ * particular, if a host administrator decides to clear the bit,
+ * then L1 has no say in the matter.
+ */
+if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM))
+debugctl &= ~DEBUGCTLMSR_FREEZE_IN_SMM;
+if (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+CC(!vmx_is_valid_debugctl(vcpu, debugctl, false)))
+return -EINVAL;
+}
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
CC(!kvm_pat_valid(vmcs12->guest_ia32_pat)))
@@ -6842,13 +6856,34 @@ void vmx_leave_nested(struct kvm_vcpu *vcpu)
free_nested(vcpu);
}
+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu)
+{
+enum vm_entry_failure_code ignored;
+struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+if (nested_cpu_has_shadow_vmcs(vmcs12) &&
+vmcs12->vmcs_link_pointer != INVALID_GPA) {
+struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
+if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
+!shadow_vmcs12->hdr.shadow_vmcs)
+return -EINVAL;
+}
+if (nested_vmx_check_controls(vcpu, vmcs12) ||
+nested_vmx_check_host_state(vcpu, vmcs12) ||
+nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+return -EINVAL;
+return 0;
+}
static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
struct kvm_nested_state __user *user_kvm_nested_state,
struct kvm_nested_state *kvm_state)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12;
-enum vm_entry_failure_code ignored;
struct kvm_vmx_nested_state_data __user *user_vmx_nested_state =
&user_kvm_nested_state->data.vmx[0];
int ret;
@@ -6979,25 +7014,20 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
vmx->nested.mtf_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING);
+ret = -EINVAL;
if (nested_cpu_has_shadow_vmcs(vmcs12) &&
vmcs12->vmcs_link_pointer != INVALID_GPA) {
struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
-ret = -EINVAL;
if (kvm_state->size <
sizeof(*kvm_state) +
sizeof(user_vmx_nested_state->vmcs12) + sizeof(*shadow_vmcs12))
goto error_guest_mode;
+ret = -EFAULT;
if (copy_from_user(shadow_vmcs12,
user_vmx_nested_state->shadow_vmcs12,
-sizeof(*shadow_vmcs12))) {
-ret = -EFAULT;
-goto error_guest_mode;
-}
-if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
-!shadow_vmcs12->hdr.shadow_vmcs)
+sizeof(*shadow_vmcs12)))
goto error_guest_mode;
}
@@ -7008,9 +7038,8 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
kvm_state->hdr.vmx.preemption_timer_deadline;
}
-if (nested_vmx_check_controls(vcpu, vmcs12) ||
-nested_vmx_check_host_state(vcpu, vmcs12) ||
-nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+ret = nested_vmx_check_restored_vmcs12(vcpu);
+if (ret < 0)
goto error_guest_mode;
vmx->nested.dirty_vmcs12 = true;


@@ -22,6 +22,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu);
void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
bool from_vmentry);


@@ -1149,7 +1149,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
}
vmx_add_auto_msr(&m->guest, msr, guest_val, VM_ENTRY_MSR_LOAD_COUNT, kvm);
-vmx_add_auto_msr(&m->guest, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
+vmx_add_auto_msr(&m->host, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
}
static bool update_transition_efer(struct vcpu_vmx *vmx)
@@ -8528,9 +8528,13 @@ int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
}
if (vmx->nested.smm.guest_mode) {
+/* Triple fault if the state is invalid. */
+if (nested_vmx_check_restored_vmcs12(vcpu) < 0)
+return 1;
ret = nested_vmx_enter_non_root_mode(vcpu, false);
-if (ret)
-return ret;
+if (ret != NVMX_VMENTRY_SUCCESS)
+return 1;
vmx->nested.nested_run_pending = 1;
vmx->nested.smm.guest_mode = false;


@@ -243,7 +243,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_ipiv);
bool __read_mostly enable_device_posted_irqs = true;
EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_device_posted_irqs);
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
STATS_DESC_COUNTER(VM, mmu_pte_write),
@@ -269,7 +269,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, pf_taken),
STATS_DESC_COUNTER(VCPU, pf_fixed),

@@ -1940,56 +1940,43 @@ enum kvm_stat_kind {
struct kvm_stat_data {
	struct kvm *kvm;
-	const struct _kvm_stats_desc *desc;
+	const struct kvm_stats_desc *desc;
	enum kvm_stat_kind kind;
};

-struct _kvm_stats_desc {
-	struct kvm_stats_desc desc;
-	char name[KVM_STATS_NAME_SIZE];
-};
-#define STATS_DESC_COMMON(type, unit, base, exp, sz, bsz)		\
-	.flags = type | unit | base |					\
-		 BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |	\
-		 BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |	\
-		 BUILD_BUG_ON_ZERO(base & ~KVM_STATS_BASE_MASK),	\
-	.exponent = exp,						\
-	.size = sz,							\
+#define STATS_DESC_COMMON(type, unit, base, exp, sz, bsz)		\
+	.flags = type | unit | base |					\
+		 BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |	\
+		 BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |	\
+		 BUILD_BUG_ON_ZERO(base & ~KVM_STATS_BASE_MASK),	\
+	.exponent = exp,						\
+	.size = sz,							\
	.bucket_size = bsz

-#define VM_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)	\
-	{								\
-		{							\
-			STATS_DESC_COMMON(type, unit, base, exp, sz, bsz), \
-			.offset = offsetof(struct kvm_vm_stat, generic.stat) \
-		},							\
-		.name = #stat,						\
-	}
-#define VCPU_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)	\
-	{								\
-		{							\
-			STATS_DESC_COMMON(type, unit, base, exp, sz, bsz), \
-			.offset = offsetof(struct kvm_vcpu_stat, generic.stat) \
-		},							\
-		.name = #stat,						\
-	}
-#define VM_STATS_DESC(stat, type, unit, base, exp, sz, bsz)		\
-	{								\
-		{							\
-			STATS_DESC_COMMON(type, unit, base, exp, sz, bsz), \
-			.offset = offsetof(struct kvm_vm_stat, stat)	\
-		},							\
-		.name = #stat,						\
-	}
-#define VCPU_STATS_DESC(stat, type, unit, base, exp, sz, bsz)		\
-	{								\
-		{							\
-			STATS_DESC_COMMON(type, unit, base, exp, sz, bsz), \
-			.offset = offsetof(struct kvm_vcpu_stat, stat)	\
-		},							\
-		.name = #stat,						\
-	}
+#define VM_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)	\
+	{								\
+		STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),	\
+		.offset = offsetof(struct kvm_vm_stat, generic.stat),	\
+		.name = #stat,						\
+	}
+#define VCPU_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)	\
+	{								\
+		STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),	\
+		.offset = offsetof(struct kvm_vcpu_stat, generic.stat),	\
+		.name = #stat,						\
+	}
+#define VM_STATS_DESC(stat, type, unit, base, exp, sz, bsz)		\
+	{								\
+		STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),	\
+		.offset = offsetof(struct kvm_vm_stat, stat),		\
+		.name = #stat,						\
+	}
+#define VCPU_STATS_DESC(stat, type, unit, base, exp, sz, bsz)		\
+	{								\
+		STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),	\
+		.offset = offsetof(struct kvm_vcpu_stat, stat),		\
+		.name = #stat,						\
+	}
/* SCOPE: VM, VM_GENERIC, VCPU, VCPU_GENERIC */
#define STATS_DESC(SCOPE, stat, type, unit, base, exp, sz, bsz) \
SCOPE##_STATS_DESC(stat, type, unit, base, exp, sz, bsz)
@@ -2066,7 +2053,7 @@ struct _kvm_stats_desc {
STATS_DESC_IBOOLEAN(VCPU_GENERIC, blocking)
ssize_t kvm_stats_read(char *id, const struct kvm_stats_header *header,
-		       const struct _kvm_stats_desc *desc,
+		       const struct kvm_stats_desc *desc,
		       void *stats, size_t size_stats,
		       char __user *user_buffer, size_t size, loff_t *offset);
@@ -2111,9 +2098,9 @@ static inline void kvm_stats_log_hist_update(u64 *data, size_t size, u64 value)
extern const struct kvm_stats_header kvm_vm_stats_header;
-extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
+extern const struct kvm_stats_desc kvm_vm_stats_desc[];
extern const struct kvm_stats_header kvm_vcpu_stats_header;
-extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
+extern const struct kvm_stats_desc kvm_vcpu_stats_desc[];
static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
{

@@ -14,6 +14,10 @@
#include <linux/ioctl.h>
#include <asm/kvm.h>
+#ifdef __KERNEL__
+#include <linux/kvm_types.h>
+#endif
#define KVM_API_VERSION 12
/*
@@ -1601,7 +1605,11 @@ struct kvm_stats_desc {
__u16 size;
__u32 offset;
__u32 bucket_size;
+#ifdef __KERNEL__
+	char name[KVM_STATS_NAME_SIZE];
+#else
	char name[];
+#endif
};
#define KVM_GET_STATS_FD _IO(KVMIO, 0xce)

@@ -71,6 +71,7 @@ TEST_GEN_PROGS_x86 += x86/cpuid_test
TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
TEST_GEN_PROGS_x86 += x86/feature_msrs_test
+TEST_GEN_PROGS_x86 += x86/evmcs_smm_controls_test
TEST_GEN_PROGS_x86 += x86/exit_on_emulation_failure_test
TEST_GEN_PROGS_x86 += x86/fastops_test
TEST_GEN_PROGS_x86 += x86/fix_hypercall_test

@@ -80,7 +80,7 @@ static void test_mbind(int fd, size_t total_size)
{
const unsigned long nodemask_0 = 1; /* nid: 0 */
unsigned long nodemask = 0;
-	unsigned long maxnode = 8;
+	unsigned long maxnode = BITS_PER_TYPE(nodemask);
int policy;
char *mem;
int ret;

@@ -557,6 +557,11 @@ static inline uint64_t get_cr0(void)
return cr0;
}
+static inline void set_cr0(uint64_t val)
+{
+	__asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
+}
static inline uint64_t get_cr3(void)
{
uint64_t cr3;
@@ -566,6 +571,11 @@ static inline uint64_t get_cr3(void)
return cr3;
}
+static inline void set_cr3(uint64_t val)
+{
+	__asm__ __volatile__("mov %0, %%cr3" : : "r" (val) : "memory");
+}
static inline uint64_t get_cr4(void)
{
uint64_t cr4;
@@ -580,6 +590,19 @@ static inline void set_cr4(uint64_t val)
__asm__ __volatile__("mov %0, %%cr4" : : "r" (val) : "memory");
}
+static inline uint64_t get_cr8(void)
+{
+	uint64_t cr8;
+
+	__asm__ __volatile__("mov %%cr8, %[cr8]" : [cr8]"=r"(cr8));
+	return cr8;
+}
+
+static inline void set_cr8(uint64_t val)
+{
+	__asm__ __volatile__("mov %0, %%cr8" : : "r" (val) : "memory");
+}
static inline void set_idt(const struct desc_ptr *idt_desc)
{
__asm__ __volatile__("lidt %0"::"m"(*idt_desc));

@@ -0,0 +1,17 @@
// SPDX-License-Identifier: GPL-2.0-only
#ifndef SELFTEST_KVM_SMM_H
#define SELFTEST_KVM_SMM_H
#include "kvm_util.h"
#define SMRAM_SIZE 65536
#define SMRAM_MEMSLOT ((1 << 16) | 1)
#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
void setup_smram(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
uint64_t smram_gpa,
const void *smi_handler, size_t handler_size);
void inject_smi(struct kvm_vcpu *vcpu);
#endif /* SELFTEST_KVM_SMM_H */

@@ -8,6 +8,7 @@
#include "kvm_util.h"
#include "pmu.h"
#include "processor.h"
+#include "smm.h"
#include "svm_util.h"
#include "sev.h"
#include "vmx.h"
@@ -1444,3 +1445,28 @@ bool kvm_arch_has_default_irqchip(void)
{
return true;
}
+void setup_smram(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+		 uint64_t smram_gpa,
+		 const void *smi_handler, size_t handler_size)
+{
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, smram_gpa,
+				    SMRAM_MEMSLOT, SMRAM_PAGES, 0);
+	TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, smram_gpa,
+				       SMRAM_MEMSLOT) == smram_gpa,
+		    "Could not allocate guest physical addresses for SMRAM");
+
+	memset(addr_gpa2hva(vm, smram_gpa), 0x0, SMRAM_SIZE);
+	memcpy(addr_gpa2hva(vm, smram_gpa) + 0x8000, smi_handler, handler_size);
+
+	vcpu_set_msr(vcpu, MSR_IA32_SMBASE, smram_gpa);
+}
+
+void inject_smi(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_events events;
+
+	vcpu_events_get(vcpu, &events);
+
+	events.smi.pending = 1;
+
+	events.flags |= KVM_VCPUEVENT_VALID_SMM;
+	vcpu_events_set(vcpu, &events);
+}

@@ -0,0 +1,150 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2026, Red Hat, Inc.
*
* Test that vmx_leave_smm() validates vmcs12 controls before re-entering
* nested guest mode on RSM.
*/
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include "test_util.h"
#include "kvm_util.h"
#include "smm.h"
#include "hyperv.h"
#include "vmx.h"
#define SMRAM_GPA 0x1000000
#define SMRAM_STAGE 0xfe
#define SYNC_PORT 0xe
#define STR(x) #x
#define XSTR(s) STR(s)
/*
* SMI handler: runs in real-address mode.
* Reports SMRAM_STAGE via port IO, then does RSM.
*/
static uint8_t smi_handler[] = {
0xb0, SMRAM_STAGE, /* mov $SMRAM_STAGE, %al */
0xe4, SYNC_PORT, /* in $SYNC_PORT, %al */
0x0f, 0xaa, /* rsm */
};
static inline void sync_with_host(uint64_t phase)
{
asm volatile("in $" XSTR(SYNC_PORT) ", %%al \n"
: "+a" (phase));
}
static void l2_guest_code(void)
{
sync_with_host(1);
/* After SMI+RSM with invalid controls, we should not reach here. */
vmcall();
}
static void guest_code(struct vmx_pages *vmx_pages,
struct hyperv_test_pages *hv_pages)
{
#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
/* Set up Hyper-V enlightenments and eVMCS */
wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
enable_vp_assist(hv_pages->vp_assist_gpa, hv_pages->vp_assist);
evmcs_enable();
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_evmcs(hv_pages));
prepare_vmcs(vmx_pages, l2_guest_code,
&l2_guest_stack[L2_GUEST_STACK_SIZE]);
GUEST_ASSERT(!vmlaunch());
/* L2 exits via vmcall if test fails */
sync_with_host(2);
}
int main(int argc, char *argv[])
{
vm_vaddr_t vmx_pages_gva = 0, hv_pages_gva = 0;
struct hyperv_test_pages *hv;
struct hv_enlightened_vmcs *evmcs;
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
struct kvm_regs regs;
int stage_reported;
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_SMM));
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
setup_smram(vm, vcpu, SMRAM_GPA, smi_handler, sizeof(smi_handler));
vcpu_set_hv_cpuid(vcpu);
vcpu_enable_evmcs(vcpu);
vcpu_alloc_vmx(vm, &vmx_pages_gva);
hv = vcpu_alloc_hyperv_test_pages(vm, &hv_pages_gva);
vcpu_args_set(vcpu, 2, vmx_pages_gva, hv_pages_gva);
vcpu_run(vcpu);
/* L2 is running and syncs with host. */
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
vcpu_regs_get(vcpu, &regs);
stage_reported = regs.rax & 0xff;
TEST_ASSERT(stage_reported == 1,
"Expected stage 1, got %d", stage_reported);
/* Inject SMI while L2 is running. */
inject_smi(vcpu);
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
vcpu_regs_get(vcpu, &regs);
stage_reported = regs.rax & 0xff;
TEST_ASSERT(stage_reported == SMRAM_STAGE,
"Expected SMM handler stage %#x, got %#x",
SMRAM_STAGE, stage_reported);
/*
 * Guest is now paused in the SMI handler, about to execute RSM.
 * Hack the eVMCS page to set up an invalid pin-based execution
 * control (PIN_BASED_VIRTUAL_NMIS without PIN_BASED_NMI_EXITING).
 */
evmcs = hv->enlightened_vmcs_hva;
evmcs->pin_based_vm_exec_control |= PIN_BASED_VIRTUAL_NMIS;
evmcs->hv_clean_fields = 0;
/*
* Trigger copy_enlightened_to_vmcs12() via KVM_GET_NESTED_STATE,
* copying the invalid pin_based_vm_exec_control into cached_vmcs12.
*/
union {
struct kvm_nested_state state;
char state_[16384];
} nested_state_buf;
memset(&nested_state_buf, 0, sizeof(nested_state_buf));
nested_state_buf.state.size = sizeof(nested_state_buf);
vcpu_nested_state_get(vcpu, &nested_state_buf.state);
/*
* Resume the guest. The SMI handler executes RSM, which calls
* vmx_leave_smm(). nested_vmx_check_controls() should detect
* VIRTUAL_NMIS without NMI_EXITING and cause a triple fault.
*/
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
kvm_vm_free(vm);
return 0;
}

@@ -13,6 +13,30 @@
#include "linux/psp-sev.h"
#include "sev.h"
+static void guest_sev_test_msr(uint32_t msr)
+{
+	uint64_t val = rdmsr(msr);
+
+	wrmsr(msr, val);
+	GUEST_ASSERT(val == rdmsr(msr));
+}
+
+#define guest_sev_test_reg(reg)				\
+do {							\
+	uint64_t val = get_##reg();			\
+							\
+	set_##reg(val);					\
+	GUEST_ASSERT(val == get_##reg());		\
+} while (0)
+
+static void guest_sev_test_regs(void)
+{
+	guest_sev_test_msr(MSR_EFER);
+
+	guest_sev_test_reg(cr0);
+	guest_sev_test_reg(cr3);
+	guest_sev_test_reg(cr4);
+	guest_sev_test_reg(cr8);
+}
#define XFEATURE_MASK_X87_AVX (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)
@@ -24,6 +48,8 @@ static void guest_snp_code(void)
GUEST_ASSERT(sev_msr & MSR_AMD64_SEV_ES_ENABLED);
GUEST_ASSERT(sev_msr & MSR_AMD64_SEV_SNP_ENABLED);
+	guest_sev_test_regs();
wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_TERM_REQ);
vmgexit();
}
@@ -34,6 +60,8 @@ static void guest_sev_es_code(void)
GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ENABLED);
GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ES_ENABLED);
+	guest_sev_test_regs();
/*
* TODO: Add GHCB and ucall support for SEV-ES guests. For now, simply
* force "termination" to signal "done" via the GHCB MSR protocol.
@@ -47,6 +75,8 @@ static void guest_sev_code(void)
GUEST_ASSERT(this_cpu_has(X86_FEATURE_SEV));
GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ENABLED);
+	guest_sev_test_regs();
GUEST_DONE();
}

@@ -14,13 +14,11 @@
#include "test_util.h"
#include "kvm_util.h"
+#include "smm.h"
#include "vmx.h"
#include "svm_util.h"

-#define SMRAM_SIZE 65536
-#define SMRAM_MEMSLOT ((1 << 16) | 1)
-#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
#define SMRAM_GPA 0x1000000
#define SMRAM_STAGE 0xfe
@@ -113,18 +111,6 @@ static void guest_code(void *arg)
sync_with_host(DONE);
}
-void inject_smi(struct kvm_vcpu *vcpu)
-{
-	struct kvm_vcpu_events events;
-
-	vcpu_events_get(vcpu, &events);
-
-	events.smi.pending = 1;
-
-	events.flags |= KVM_VCPUEVENT_VALID_SMM;
-	vcpu_events_set(vcpu, &events);
-}
int main(int argc, char *argv[])
{
vm_vaddr_t nested_gva = 0;
@@ -140,16 +126,7 @@ int main(int argc, char *argv[])
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
-	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
-				    SMRAM_MEMSLOT, SMRAM_PAGES, 0);
-	TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)
-		    == SMRAM_GPA, "could not allocate guest physical addresses?");
-
-	memset(addr_gpa2hva(vm, SMRAM_GPA), 0x0, SMRAM_SIZE);
-	memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler,
-	       sizeof(smi_handler));
-
-	vcpu_set_msr(vcpu, MSR_IA32_SMBASE, SMRAM_GPA);
+	setup_smram(vm, vcpu, SMRAM_GPA, smi_handler, sizeof(smi_handler));
if (kvm_has_cap(KVM_CAP_NESTED_STATE)) {
if (kvm_cpu_has(X86_FEATURE_SVM))

@@ -50,7 +50,7 @@
* Return: the number of bytes that has been successfully read
*/
ssize_t kvm_stats_read(char *id, const struct kvm_stats_header *header,
-		       const struct _kvm_stats_desc *desc,
+		       const struct kvm_stats_desc *desc,
		       void *stats, size_t size_stats,
		       char __user *user_buffer, size_t size, loff_t *offset)
{

@@ -973,9 +973,9 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
kvm_free_memslot(kvm, memslot);
}
-static umode_t kvm_stats_debugfs_mode(const struct _kvm_stats_desc *pdesc)
+static umode_t kvm_stats_debugfs_mode(const struct kvm_stats_desc *desc)
{
-	switch (pdesc->desc.flags & KVM_STATS_TYPE_MASK) {
+	switch (desc->flags & KVM_STATS_TYPE_MASK) {
case KVM_STATS_TYPE_INSTANT:
return 0444;
case KVM_STATS_TYPE_CUMULATIVE:
@@ -1010,7 +1010,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
struct dentry *dent;
char dir_name[ITOA_MAX_LEN * 2];
struct kvm_stat_data *stat_data;
-	const struct _kvm_stats_desc *pdesc;
+	const struct kvm_stats_desc *pdesc;
int i, ret = -ENOMEM;
int kvm_debugfs_num_entries = kvm_vm_stats_header.num_desc +
kvm_vcpu_stats_header.num_desc;
@@ -6171,11 +6171,11 @@ static int kvm_stat_data_get(void *data, u64 *val)
	switch (stat_data->kind) {
	case KVM_STAT_VM:
		r = kvm_get_stat_per_vm(stat_data->kvm,
-					stat_data->desc->desc.offset, val);
+					stat_data->desc->offset, val);
		break;
	case KVM_STAT_VCPU:
		r = kvm_get_stat_per_vcpu(stat_data->kvm,
-					  stat_data->desc->desc.offset, val);
+					  stat_data->desc->offset, val);
		break;
	}
@@ -6193,11 +6193,11 @@ static int kvm_stat_data_clear(void *data, u64 val)
	switch (stat_data->kind) {
	case KVM_STAT_VM:
		r = kvm_clear_stat_per_vm(stat_data->kvm,
-					  stat_data->desc->desc.offset);
+					  stat_data->desc->offset);
		break;
	case KVM_STAT_VCPU:
		r = kvm_clear_stat_per_vcpu(stat_data->kvm,
-					   stat_data->desc->desc.offset);
+					   stat_data->desc->offset);
		break;
	}
@@ -6345,7 +6345,7 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
static void kvm_init_debug(void)
{
const struct file_operations *fops;
-	const struct _kvm_stats_desc *pdesc;
+	const struct kvm_stats_desc *pdesc;
int i;
kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
@@ -6358,7 +6358,7 @@ static void kvm_init_debug(void)
fops = &vm_stat_readonly_fops;
		debugfs_create_file(pdesc->name, kvm_stats_debugfs_mode(pdesc),
				    kvm_debugfs_dir,
-				    (void *)(long)pdesc->desc.offset, fops);
+				    (void *)(long)pdesc->offset, fops);
}
for (i = 0; i < kvm_vcpu_stats_header.num_desc; ++i) {
@@ -6369,7 +6369,7 @@ static void kvm_init_debug(void)
fops = &vcpu_stat_readonly_fops;
		debugfs_create_file(pdesc->name, kvm_stats_debugfs_mode(pdesc),
				    kvm_debugfs_dir,
-				    (void *)(long)pdesc->desc.offset, fops);
+				    (void *)(long)pdesc->offset, fops);
}
}