124 Commits

Author SHA1 Message Date
Oliver Upton
e0b5a7967d KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
Marc reports that the performance of running an L3 guest has regressed
by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture.
That's of course terrible for the single user of recursive NV ;-)

While there's nothing to be done on non-FGT systems, take advantage of
the precise write trap of MDSCR_EL1 and leave the rest of the debug
registers untrapped.

Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:44:37 +01:00
Paolo Bonzini
924ebaefce Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.18

- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
  allowing more registers to be used as part of the message payload.

- Change the way pKVM allocates its VM handles, making sure that the
  privileged hypervisor is never tricked into using uninitialised
  data.

- Speed up MMIO range registration by avoiding unnecessary RCU
  synchronisation, which results in VMs starting much quicker.

- Add the dump of the instruction stream when panicking in the EL2
  payload, just like the rest of the kernel has always done. This will
  hopefully help debugging non-VHE setups.

- Add 52bit PA support to the stage-1 page-table walker, and make use
  of it to populate the fault level reported to the guest on failing
  to translate a stage-1 walk.

- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
  feature parity for guests, irrespective of the host platform.

- Fix some really ugly architecture problems when dealing with debug
  in a nested VM. This has some bad performance impacts, but is at
  least correct.

- Add enough infrastructure to be able to disable EL2 features and
  give effective values to the EL2 control registers. This then allows
  a bunch of features to be turned off, which helps cross-host
  migration.

- Large rework of the selftest infrastructure to allow most tests to
  transparently run at EL2. This is the first step towards enabling
  NV testing.

- Various fixes and improvements all over the map, including one BE
  fix, just in time for the removal of the feature.
2025-09-30 13:23:28 -04:00
Paolo Bonzini
8cbb0df294 Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3

 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

 - Fixes to the TLB match process and TLB invalidation range for
   managing the VNCR pseudo-TLB

 - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

 - Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

 - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

 - Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

 - Call raw_spin_unlock() instead of the internal spinlock API

 - Fix parameter ordering when assigning VBAR_EL1

[Pull into kvm/master to fix conflicts. - Paolo]
2025-09-30 13:23:06 -04:00
Marc Zyngier
46bd74ef07 Merge branch kvm-arm64/el2-feature-control into kvmarm-master/next
* kvm-arm64/el2-feature-control: (23 commits)
  : .
  : General rework of EL2 features that can be disabled to satisfy
  : the requirement of migration between heterogeneous hosts:
  :
  : - Handle effective RES0 behaviour of undefined registers, making sure
  :   that disabling a feature affects full registers, and not just
  :   individual control bits. (20250918151402.1665315-1-maz@kernel.org)
  :
  : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace.
  :   (20250911114621.3724469-1-yangjinqian1@huawei.com)
  :
  : - Turn the NV feature management into a deny-list, and expose
  :   missing features to EL2 guests.
  :   (20250912212258.407350-1-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
  KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set
  KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
  KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
  KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
  KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
  KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
  KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace
  KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2
  KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2}
  KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2
  KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20 12:26:18 +01:00
Marc Zyngier
f01c7baa16 Merge branch kvm-arm64/nv-debug into kvmarm-master/next
* kvm-arm64/nv-debug:
  : .
  : Fix handling of MDSCR_EL1 in NV context, which is unfortunately
  : mishandled by the architecture. Patches courtesy of Oliver Upton
  : (20250917203125.283116-2-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Apply guest's MDCR traps in nested context
  KVM: arm64: nv: Trap debug registers when in hyp context

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20 12:26:11 +01:00
Marc Zyngier
23cf13def0 KVM: arm64: Account for 52bit when computing maximum OA
Adjust the computation of the max OA to account for 52bit PAs.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20 11:05:12 +01:00
Oliver Upton
b8b1d62f17 KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
The changes to the debug architecture up to v8.8 are concerned with
external debug, which of course has no direct impact on VMs. Raise the
feature limit and document what's preventing us from raising it further.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
6f2224ef07 KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an
architecturally-defined EL1 trap of a particular sysreg encoding range.
Furthermore, KVM already advertises this feature to non-NV VMs.

As there is no interaction with EL2 traps, expose the feature.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
fe2c9cd439 KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
FEAT_SpecSEI is an informational feature describing whether speculative
loads may generate SErrors. Since there are already cases where KVM
reinjects an SError into the VM it is already possible this may happen
due to a speculative load within the VM.

Stop hiding the feature from NV-enabled VMs.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
952387c9d3 KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the
effective HCR for a nested context. Advertise the feature.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
09dc6b42c6 KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
FEAT_AFP doesn't intersect with any EL2 trap behavior, so expose it to
NV-enabled VMs.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
7cbdb25bed KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
The exact wording of the restrictions on branch prediction due to
FEAT_ECBHB in DDI0487L.b is as follows:

  When FEAT_ECBHB is implemented, the branch history information created
  in a context before an exception to a higher Exception level using
  AArch64 cannot be used by code before that exception to exploitatively
  control the execution of any indirect branches in code in a different
  context after the exception.

While vEL2 and EL1 are multiplexed at EL1, they exist in different
hardware-described contexts as KVM uses different stage-2 MMUs to
represent the corresponding translation regimes. Additionally, exception
entries into vEL2 always imply a hardware exception entry into literal EL2
for the emulated regime change.

Given all of this, and the fact that FEAT_ECBHB places no limitation on
the EL of the protected context after the exception, we can claim
FEAT_ECBHB on supporting hardware.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
26785cf28b KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when
advertised through the canonical field. Stop masking the silly frac
field to expose the feature on systems without FEAT_DF.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
fac4ee7abe KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
The supporting infrastructure in KVM's abort injection code was merged a
while ago, but the author (me!) forgot to relax the NV limitation when
FEAT_DF2 got exposed to non-NV VMs. Fix it.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
49da9872a6 KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature
fields where a non-negative value implies that a feature is implemented
and a negative value implies that it is not. While the intention of
masking this field was likely to hide the feature, KVM actually
advertises it, even on unsupporting hardware.

Remove FEAT_DoubleLock from the mask, making the NI value visible to the
VM. Take care to accept the old, incorrect values for this field as
we've lied to userspace.
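
As a minimal sketch of why zeroing a signed ID field advertises the
feature instead of hiding it (the helper names are hypothetical and this
is not KVM's code; the field position, bits [39:36] of ID_AA64DFR0_EL1,
is taken from the architecture):

```c
#include <stdint.h>

/* ID_AA64DFR0_EL1.DoubleLock occupies bits [39:36]. */
#define DOUBLELOCK_SHIFT 36
#define DOUBLELOCK_MASK  (UINT64_C(0xf) << DOUBLELOCK_SHIFT)

/* Read a 4-bit ID register field as a signed value in [-8, 7]. */
static inline int id_field_signed(uint64_t reg, unsigned int shift)
{
	int v = (int)((reg >> shift) & 0xf);

	return v >= 8 ? v - 16 : v;
}

/* Signed field convention: non-negative means "implemented". */
static inline int doublelock_implemented(uint64_t dfr0)
{
	return id_field_signed(dfr0, DOUBLELOCK_SHIFT) >= 0;
}

/* Hypothetical "hide the feature" helper that just clears the field. */
static inline uint64_t hide_by_zeroing(uint64_t dfr0)
{
	return dfr0 & ~DOUBLELOCK_MASK;
}
```

With the hardware value 0b1111 (not implemented), clearing the field
yields 0b0000, which reads back as implemented.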

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
d3c35b7c57 KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
Consistently use denylisting of features such that the limitations of
KVM's nested implementation are explicitly documented (rather than
implied).
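
A rough illustration of the difference, using made-up field positions
(none of these masks correspond to real ID register fields, and the
function names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical 4-bit feature fields in an imaginary ID register. */
#define FIELD_A           (UINT64_C(0xf) << 0)
#define FIELD_B           (UINT64_C(0xf) << 4)
#define FIELD_NEW         (UINT64_C(0xf) << 8)   /* added to the architecture later */
#define FIELD_UNSUPPORTED (UINT64_C(0xf) << 12)  /* not handled for nested guests */

/* Allowlist: anything not named is hidden, without saying why. */
static inline uint64_t limit_allowlist(uint64_t reg)
{
	return reg & (FIELD_A | FIELD_B);
}

/* Denylist: only the explicitly unsupported fields are removed. */
static inline uint64_t limit_denylist(uint64_t reg)
{
	return reg & ~FIELD_UNSUPPORTED;
}
```

In the allowlist form FIELD_NEW silently disappears; in the denylist
form every hidden field has to be spelled out.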

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 14:01:35 +01:00
Oliver Upton
3af1105c4f KVM: arm64: nv: Apply guest's MDCR traps in nested context
KVM needs to ensure the guest hypervisor's traps take effect when the
vCPU is in a nested context. While supporting infrastructure is in place
for most of the EL2 trap registers, MDCR_EL2 is not.

Fold the guest's trap configuration into the effective MDCR_EL2. Apply
it directly to the in-memory representation as it gets recomputed on
every vcpu_load() anyway.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-18 16:46:20 +01:00
Oliver Upton
4a68408842 KVM: arm64: nv: Trap debug registers when in hyp context
In case you haven't realized it yet, the architecture is _slightly_
broken in the context of nested virt. Here we have another example of
FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually
affects execution at vEL2.

Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this
mess at the expense of unnecessarily trapping the breakpoint/watchpoint
registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for
obvious correctness to start.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-18 16:46:20 +01:00
Dongha Lee
ebb2d8fd81 KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
The code for invalidating VNCR entries in both kvm_invalidate_vncr_ipa()
and invalidate_vncr_va() incorrectly uses a bitwise AND with `(size - 1)`
instead of `~(size - 1)` to align the start address. This results
in masking the address bits instead of aligning them down to the start
of the block.

This bug may cause stale VNCR TLB entries to remain valid even after a
TLBI or MMU notifier, leading to incorrect memory translation and
unexpected guest behavior.
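
The difference between the two expressions can be sketched as follows
(helper names are illustrative only, and size is assumed to be a power
of two):

```c
#include <stdint.h>

/* Buggy form: keeps only the offset of addr within its block. */
static inline uint64_t block_offset(uint64_t addr, uint64_t size)
{
	return addr & (size - 1);
}

/* Fixed form: clears the offset bits, rounding addr down to the block start. */
static inline uint64_t block_start(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}
```

For addr = 0x12345678 and size = 0x1000, the buggy form yields 0x678
while the fixed form yields 0x12345000.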

Credit to Team 0xB6 in bob14: DongHa Lee, Gyujeong Jin, Daehyeon Ko,
Geonha Lee, Hyungyu Oh, and Jaewon Yang.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Dongha Lee <p@sswd.pw>
Link: https://lore.kernel.org/r/20250906040724.72960-1-p@sswd.pw
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-10 02:56:20 -07:00
Geonha Lee
860b21c31d KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR
TLB entry is valid for the current context. For non-Global entries, that
means the entry’s ASID must match the current ASID.

The current code returns true when the ASIDs do *not* match, which
inverts the logic. This is a potential vulnerability:

- Valid entries are ignored and we fall back to kvm_translate_vncr(),
  hurting performance.
- Mismatched entries are treated as permission faults (-EPERM) instead
  of triggering a fresh translation.
- This can also cause stale translations to be (wrongly) considered
  valid across address spaces.

Flip the predicate so non-Global entries only hit when ASIDs match.
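
A simplified sketch of the intended predicate (the structure and
function below are hypothetical stand-ins, not the layout used by
kvm_vncr_tlb_lookup()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of a cached VNCR translation. */
struct vncr_entry {
	bool valid;
	bool ng;        /* non-Global: the entry is tagged with an ASID */
	uint16_t asid;
};

static inline bool vncr_entry_hits(const struct vncr_entry *e, uint16_t cur_asid)
{
	if (!e->valid)
		return false;

	/* Global entries hit for any ASID; non-Global ones need a match. */
	return !e->ng || e->asid == cur_asid;
}
```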

Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin,
Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang.

Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-10 02:56:19 -07:00
Paolo Bonzini
42a0305ab1 Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2

 - Correctly handle 'invariant' system registers for protected VMs

 - Improved handling of VNCR data aborts, including external aborts

 - Fixes for handling of FEAT_RAS for NV guests, providing a sane
   fault context during SEA injection and preventing the use of
   RASv1p1 fault injection hardware

 - Ensure that page table destruction when a VM is destroyed gives an
   opportunity to reschedule

 - Large fix to KVM's infrastructure for managing guest context loaded
   on the CPU, addressing issues where the output of AT emulation
   doesn't get reflected to the guest

 - Fix AT S12 emulation to actually perform stage-2 translation when
   necessary

 - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
   disabled for a VM

 - Minor KVM + selftest fixes
2025-08-29 12:57:31 -04:00
Fuad Tabba
f4e740309e KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd
Handle faults for memslots backed by guest_memfd in arm64 nested
virtualization triggered by VNCR_EL2.

* Introduce is_gmem output parameter to kvm_translate_vncr(), indicating
  whether the faulted memory slot is backed by guest_memfd.

* Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn().

* Update kvm_handle_vncr_abort() to handle potential guest_memfd errors.
  Some of the guest_memfd errors need to be handled by userspace instead
  of attempting to (implicitly) retry by returning to the guest.

Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27 04:36:50 -04:00
Oliver Upton
69f8fe955d KVM: arm64: nv: Handle SEAs due to VNCR redirection
System register accesses redirected to the VNCR page can also generate
external aborts just like any other form of memory access. Route to
kvm_handle_guest_sea() for potential APEI handling, falling back to a
vSError if the kernel didn't handle the abort.

Take the opportunity to throw out the useless kvm_ras.h which provided a
helper with a single callsite...

Cc: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-04 22:16:10 -07:00
Marc Zyngier
07f557f60a KVM: arm64: nv: Properly check ESR_EL2.VNCR on taking a VNCR_EL2 related fault
Instead of checking for the ESR_EL2.VNCR bit being set (the only case
we should be here), we are actually testing random bits in ESR_EL2.DFSC.

13 obviously being a lucky number, it matches both permission and
translation fault status codes, which explains why we never saw it
failing. This was found by inspection, while reviewing a vaguely
related patch.
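
One plausible reading of the bug, sketched with hypothetical macro and
function names and assuming the VNCR flag is bit 13 of the syndrome, is
a check that ANDs with the bit position rather than with the bit mask:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the VNCR flag lives at bit 13 of the syndrome. */
#define ESR_VNCR_SHIFT 13
#define ESR_VNCR       (UINT64_C(1) << ESR_VNCR_SHIFT)

/* Buggy: 13 == 0b1101, so this probes fault status bits 0, 2 and 3. */
static inline bool vncr_check_buggy(uint64_t esr)
{
	return esr & ESR_VNCR_SHIFT;
}

/* Fixed: test the flag itself. */
static inline bool vncr_check_fixed(uint64_t esr)
{
	return esr & ESR_VNCR;
}
```

Translation faults (status codes 0x04 to 0x07) and permission faults
(0x0c to 0x0f) all have bit 2 set, so the buggy check passes for them
regardless of bit 13.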

Whilst we're at it, turn the BUG_ON() into a WARN_ON_ONCE(), as
exploding here is just silly.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250730101828.1168707-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-04 22:15:29 -07:00
Paolo Bonzini
314b40b3b6 Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Oliver Upton
d9b9fa2c32 Merge branch 'kvm-arm64/config-masks' into kvmarm/next
* kvm-arm64/config-masks:
  : More config-driven mask computation, courtesy of Marc Zyngier
  :
  : Converts more system registers to the config-driven computation of RESx
  : masks based on the advertised feature set
  KVM: arm64: Tighten the definition of FEAT_PMUv3p9
  KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
  KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
  KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
  arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:03:08 -07:00
Marc Zyngier
cd64587f10 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting MDCR_EL2 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
6bd4a274b0 KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
001e032c0f KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting TCR2_EL2 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Oliver Upton
abc693fef3 KVM: arm64: Describe SCTLR2_ELx RESx masks
External abort injection will soon rely on a sanitised view of
SCTLR2_ELx to determine exception routing. Compute the RESx masks.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:35 -07:00
Oliver Upton
a99456abd8 KVM: arm64: nv: Advertise support for FEAT_RAS
Now that the missing bits for vSError injection/deferral have been added
we can merrily claim support for FEAT_RAS.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:34 -07:00
Oliver Upton
77ee70a073 KVM: arm64: nv: Honor SError exception routing / masking
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.

This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:

 - Using a flag (instead of HCR_EL2.VSE) to track the pending SError
   state when SErrors are unconditionally masked for the current context

 - Resampling the routing / masking of a pending SError on every guest
   entry/exit

 - Emulating exception entry when SError routing implies a translation
   regime change

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:31 -07:00
Marc Zyngier
105485a182 KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
Booting an EL2 guest on a system only supporting a subset of the
possible page sizes leads to interesting situations.

For example, on a system that only supports 4kB and 64kB, and is
booted with a 4kB kernel, we end up advertising 16kB support at
stage-2, which is pretty weird.

That's because we consider that any S2 bigger than our base granule
is fair game, irrespective of what the HW actually supports. While this
is not impossible to support (KVM would happily handle it), it is likely
to be confusing for the guest.

Add new checks that will verify that this granule size is actually
supported before publishing it to the guest.

Fixes: e7ef6ed458 ("KVM: arm64: Enforce NV limits on a per-idregs basis")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-03 10:39:24 +01:00
Marc Zyngier
8800b7c4bb KVM: arm64: Add RMW specific sysreg accessor
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.

Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.
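
The idea can be sketched as follows. Everything here is a hypothetical
model (the struct, the helper names, and the counter that exists only
to make the number of mask applications visible), not the accessor
added by this patch:

```c
#include <stdint.h>

/* Hypothetical model of a sanitised system register. */
struct sysreg {
	uint64_t val;
	uint64_t res0;                /* bits forced to 0 */
	uint64_t res1;                /* bits forced to 1 */
	unsigned int sanitise_count;  /* illustration only */
};

static inline uint64_t sanitise(struct sysreg *r, uint64_t v)
{
	r->sanitise_count++;
	return (v & ~r->res0) | r->res1;
}

static inline uint64_t sysreg_read(struct sysreg *r)
{
	return sanitise(r, r->val);
}

static inline void sysreg_write(struct sysreg *r, uint64_t v)
{
	r->val = sanitise(r, v);
}

/* Read-modify-write in one step: the masks are applied a single time. */
static inline uint64_t sysreg_rmw(struct sysreg *r, uint64_t clr, uint64_t set)
{
	r->val = sanitise(r, (r->val & ~clr) | set);
	return r->val;
}
```

Composing sysreg_write(r, sysreg_read(r) | bits) runs the sanitisation
twice, whereas sysreg_rmw(r, 0, bits) runs it once.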

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
Marc Zyngier
6673047405 KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
When handling a TLBI VA* instruction that potentially targets a
VNCR page mapping, we fail to mask out the top bits that contain
the ASID and TTL fields, hence potentially failing the VA check
in the TLB code.

An additional wrinkle is that we fail to sign extend the VA,
again leading to failed VA checks.

Fix both in one go by sign-extending the VA from bit 48, making
it comparable to the way we interpret VNCR_EL2.BADDR.
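
A standalone sketch of the decoding, assuming the usual TLBI-by-VA
operand layout (ASID in bits [63:48], TTL in bits [47:44], VA[55:12] in
bits [43:0]); the function name is made up for the example:

```c
#include <stdint.h>

/* Operand bits [43:0] carry VA[55:12]; ASID and TTL sit above them. */
#define TLBI_VA_FIELD_MASK ((UINT64_C(1) << 44) - 1)

static inline uint64_t tlbi_operand_to_va(uint64_t operand)
{
	/* Drop the ASID/TTL fields and rebuild the page-aligned VA. */
	uint64_t va = (operand & TLBI_VA_FIELD_MASK) << 12;
	const uint64_t sign = UINT64_C(1) << 48;

	/* Sign-extend from bit 48 without relying on signed shifts. */
	if (va & sign)
		va |= ~(sign - 1);   /* set bits [63:48] */
	else
		va &= sign - 1;      /* clear bits [63:48] */

	return va;
}
```

An operand carrying ASID 0xabcd, TTL 0x5 and the page number of
0xffff800012345000 decodes back to that address, with the ASID and TTL
bits discarded.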

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:09:16 +01:00
Marc Zyngier
7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Marc Zyngier
538fbac740 KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier
beab7d0583 KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier
d43548f422 KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.

A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().

Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 09:53:08 +01:00
Marc Zyngier
4bc0fe0898 KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. This is a large update, but a fairly mechanical one.

The config dependencies are extracted from the 2025-03 JSON drop.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:36:10 +01:00
Marc Zyngier
b2a324ff01 KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

An additional complexity stems from the status of some bits such
as E2H and RW, which do not have a RESx status, but still take
a fixed value due to implementation choices in KVM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier
beed444841 KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCRX_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier
c6cbe6a4c1 KVM: arm64: Use FGT feature maps to drive RES0 bits
Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.

Let's apply this principle to the guest's view of the FGT registers.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:00 +01:00
Marc Zyngier
4ffa72ad8f KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.

For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs and this counter
is non-zero, we take the long road all the way back to the core code.

There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.

Yes, this is costly.
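
The gating on the counter can be sketched with C11 atomics. The struct
and function names are invented for the example, and the kernel uses
its own atomic primitives rather than <stdatomic.h>:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM state: how many VNCR pages are currently mapped. */
struct vm {
	atomic_int vncr_map_count;
};

static inline void vncr_mapped(struct vm *vm)
{
	atomic_fetch_add(&vm->vncr_map_count, 1);
}

static inline void vncr_unmapped(struct vm *vm)
{
	atomic_fetch_sub(&vm->vncr_map_count, 1);
}

/* A TLBI only needs the expensive cross-vcpu walk if something is mapped. */
static inline bool tlbi_needs_slow_path(struct vm *vm)
{
	return atomic_load(&vm->vncr_map_count) != 0;
}
```

The common case, where no VNCR page is mapped, then costs a single
atomic load per TLBI.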

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
7270cc9157 KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
During an invalidation triggered by an MMU notifier, we need to
make sure we can drop the *host* mapping that would have been
translated by the stage-2 mapping being invalidated.

For the moment, the invalidation is pretty brutal, as we nuke
the full IPA range, and therefore any VNCR_EL2 mapping.

At some point, we'll be more lightweight, and the code is able
to deal with something more targeted.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
2a359e0725 KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
Now that we can handle faults triggered through VNCR_EL2, we need
to map the corresponding page at EL2. But where, you'll ask?

Since each CPU in the system can run a vcpu, we need a per-CPU
mapping. For that, we carve a NR_CPUS range in the fixmap, giving
us a per-CPU VA at which to map the guest's VNCR page.

The mapping occurs both on vcpu load and on the back of a fault,
both generating a request that will take care of the mapping.
That mapping will also get dropped on vcpu put.

Yes, this is a bit heavy-handed, but it is simple. Eventually,
we may want to have a per-VM, per-CPU mapping, which would avoid
all the TLBI overhead.
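
To make the per-CPU carve-out concrete, here is a small sketch of how a CPU number could map to a fixed virtual address. It assumes 4KiB pages and a fixmap that grows downwards from a top address; the slot names, `NR_CPUS` value and `FIXMAP_TOP` address are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12				/* assumes 4KiB pages */
#define PAGE_SIZE	(UINT64_C(1) << PAGE_SHIFT)
#define NR_CPUS		8				/* illustrative value */
#define FIXMAP_TOP	UINT64_C(0xffff800000000000)	/* made-up address */

/* Hypothetical fixmap layout: NR_CPUS consecutive slots for VNCR. */
enum fixmap_slot {
	FIX_OTHER,
	FIX_VNCR_FIRST,
	FIX_VNCR_LAST = FIX_VNCR_FIRST + NR_CPUS - 1,
	NR_FIX_SLOTS,
};

/* Fixmap slots grow downwards from the top address. */
static uint64_t fix_to_virt(unsigned int slot)
{
	assert(slot < NR_FIX_SLOTS);
	return FIXMAP_TOP - ((uint64_t)slot << PAGE_SHIFT);
}

/* Each CPU owns exactly one page-sized slot in the VNCR range. */
static uint64_t vncr_va_for_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return fix_to_virt(FIX_VNCR_FIRST + cpu);
}
```

Because the address depends only on the CPU number, whichever vcpu is loaded on a CPU reuses the same VA, which is why the mapping has to be dropped on vcpu put.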

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
069a05e535 KVM: arm64: nv: Handle VNCR_EL2-triggered faults
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults.

These faults can have multiple sources:

- We haven't mapped anything on the host: we need to compute the
  resulting translation, populate a TLB, and eventually map
  the corresponding page

- The permissions are out of whack: we need to tell the guest about
  this state of affairs

Note that the kernel doesn't support S1POE for itself yet, so
the particular case of a VNCR page mapped with no permissions
or with write-only permissions is not correctly handled yet.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
6fb75733f1 KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits,
and make it accessible to userspace when the VM is configured to
support FEAT_NV2.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
ea8d3cf46d KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.

As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.

It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.

All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.

No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.

Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.
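
For readers who want a mental model, a single-entry structure of this kind might look like the sketch below, together with the range check an MMU-notifier-driven shootdown would need. The field names and helpers are assumptions made for illustration, not the layout used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry pseudo-TLB; field names are illustrative. */
struct vncr_tlb {
	bool valid;
	uint64_t gva;	/* guest EL2 virtual address from VNCR_EL2.BADDR */
	uint64_t ipa;	/* intermediate physical address it translates to */
	uint64_t hpa;	/* host physical address backing that IPA */
	uint64_t size;	/* size of the translation, in bytes */
};

/* Would an invalidation of IPA range [start, end) hit this entry? */
static bool vncr_tlb_ipa_overlaps(const struct vncr_tlb *t,
				  uint64_t start, uint64_t end)
{
	assert(start <= end);
	return t->valid && start < t->ipa + t->size && t->ipa < end;
}

/* Notifier-style shootdown: drop the entry if the range touches it. */
static bool vncr_tlb_shootdown(struct vncr_tlb *t, uint64_t start, uint64_t end)
{
	if (!vncr_tlb_ipa_overlaps(t, start, end))
		return false;
	t->valid = false;
	return true;
}
```

Keeping both the input address and the output addresses in the entry is what lets a TLBI (keyed on the VA) and an MMU notifier (keyed on the IPA) each find the mapping they affect.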

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
469c4713d4 KVM: arm64: nv: Allocate VNCR page when required
If running a NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00