linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-23 16:06:50 +08:00

Author	SHA1	Message	Date
Oliver Upton	e0b5a7967d	KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Paolo Bonzini	924ebaefce	Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.	2025-09-30 13:23:28 -04:00
Paolo Bonzini	8cbb0df294	Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #3 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VCNR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a raw spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1 [Pull into kvm/master to fix conflicts. - Paolo]	2025-09-30 13:23:06 -04:00
Marc Zyngier	46bd74ef07	Merge branch kvm-arm64/el2-feature-control into kvmarm-master/next * kvm-arm64/el2-feature-control: (23 commits) : . : General rework of EL2 features that can be disabled to satisfy : the requirement of migration between heterogeneous hosts: : : - Handle effective RES0 behaviour of undefined registers, making sure : that disabling a feature affects full registeres, and not just : individual control bits. (20250918151402.1665315-1-maz@kernel.org) : : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace. : (20250911114621.3724469-1-yangjinqian1@huawei.com) : : - Turn the NV feature management into a deny-list, and expose : missing features to EL2 guests. : (20250912212258.407350-1-oliver.upton@linux.dev) : . KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg() KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED} KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2 KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2} KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2 KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:18 +01:00
Marc Zyngier	f01c7baa16	Merge branch kvm-arm64/nv-debug into kvmarm-master/next * kvm-arm64/nv-debug: : . : Fix handling of MDSCR_EL1 in NV context, which is unfortunately : mishandled by the architecture. Patches courtesy of Oliver Upton : (20250917203125.283116-2-oliver.upton@linux.dev) : . KVM: arm64: nv: Apply guest's MDCR traps in nested context KVM: arm64: nv: Trap debug registers when in hyp context Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:11 +01:00
Marc Zyngier	23cf13def0	KVM: arm64: Account for 52bit when computing maximum OA Adjust the computation of the max OA to account for 52bit PAs. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 11:05:12 +01:00
Oliver Upton	b8b1d62f17	KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs The changes to the debug architecture up to v8.8 are concerned with external debug, which of course has no direct impact on VMs. Raise the feature limit and document what's preventing us from raising it further. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	6f2224ef07	KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an architecturally-defined EL1 trap of a particular sysreg encoding range. Furthermore, KVM already advertises this feature to non-NV VMs. As there is no interaction with EL2 traps, expose the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	fe2c9cd439	KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs FEAT_SpecSEI is an informational feature describing whether speculative loads may generate SErrors. Since there are already cases where KVM reinjects an SError into the VM it is already possible this may happen due to a speculative load within the VM. Stop hiding the feature from NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	952387c9d3	KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the effective HCR for a nested context. Advertise the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	09dc6b42c6	KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs FEAT_AFP doesn't intersect with any EL2 trap behavior, expose to NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	7cbdb25bed	KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs The exact wording of the restrictions on branch prediction due to FEAT_ECBHB in DDI0487L.b is as follows: When FEAT_ECBHB is implemented, the branch history information created in a context before an exception to a higher Exception level using AArch64 cannot be used by code before that exception to exploitatively control the execution of any indirect branches in code in a different context after the exception. While vEL2 and EL1 are multiplexed at EL1, they exist in different hardware-described contexts as KVM uses different stage-2 MMUs to represent the corresponding translation regimes. Additionally, exception entries into vEL2 always imply a hardware exception entry into literal EL2 for the emulated regime change. Given all of this, and the fact that FEAT_ECBHB places no limitation on the EL of the protected context after the exception, we can claim FEAT_ECBHB on supporting hardware. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	26785cf28b	KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when advertised through the canonical field. Stop masking the silly frac field to expose the feature on systems without FEAT_DF. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	fac4ee7abe	KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs The supporting infrastructure in KVM's abort injection code was merged a while ago, but the author (me!) forgot to relax the NV limitation when FEAT_DF2 got exposed to non-NV VMs. Fix it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	49da9872a6	KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature fields where a non-negative value implies that a feature is implemented and a negative value implies that it is not. While the intention of masking this field was likely to hide the feature, KVM actually advertises it, even on unsupporting hardware. Remove FEAT_DoubleLock from the mask, making the NI value visible to the VM. Take care to accept the old, incorrect values for this field as we've lied to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	d3c35b7c57	KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg() Consistently use denylisting of features such that the limitations of KVM's nested implementation are explicitly documented (rather than implied). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Oliver Upton	3af1105c4f	KVM: arm64: nv: Apply guest's MDCR traps in nested context KVM needs to ensure the guest hypervisor's traps take effect when the vCPU is in a nested context. While supporting infrastructure is in place for most of the EL2 trap registers, MDCR_EL2 is not. Fold the guest's trap configuration into the effective MDCR_EL2. Apply it directly to the in-memory representation as it gets recomputed on every vcpu_load() anyway. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-18 16:46:20 +01:00
Oliver Upton	4a68408842	KVM: arm64: nv: Trap debug registers when in hyp context In case you haven't realized it yet, the architecture is _slightly_ broken in the context of nested virt. Here we have another example of FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually affects execution at vEL2. Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this mess at the expense of unnecessarily trapping the breakpoint/watchpoint registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for obvious correctness to start. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-18 16:46:20 +01:00
Dongha Lee	ebb2d8fd81	KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation The code for invalidating VNCR entries in both kvm_invalidate_vncr_ipa() and invalidate_vncr_va() incorrectly uses a bitwise AND with `(size - 1)` instead of `~(size - 1)` to align the start address. This results in masking the address bits instead of aligning them down to the start of the block. This bug may cause stale VNCR TLB entries to remain valid even after a TLBI or MMU notifier, leading to incorrect memory translation and unexpected guest behavior. Credit to Team 0xB6 in bob14: DongHa Lee, Gyujeong Jin, Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Dongha Lee <p@sswd.pw> Link: https://lore.kernel.org/r/20250906040724.72960-1-p@sswd.pw Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:20 -07:00
Geonha Lee	860b21c31d	KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR TLB entry is valid for the current context. For non-Global entries, that means the entry’s ASID must match the current ASID. The current code returns true when the ASIDs do not match, which inverts the logic. This is a potential vulnerability: - Valid entries are ignored and we fall back to kvm_translate_vncr(), hurting performance. - Mismatched entries are treated as permission faults (-EPERM) instead of triggering a fresh translation. - This can also cause stale translations to be (wrongly) considered valid across address spaces. Flip the predicate so non-Global entries only hit when ASIDs match. Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin, Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang. Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:19 -07:00
Paolo Bonzini	42a0305ab1	Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes	2025-08-29 12:57:31 -04:00
Fuad Tabba	f4e740309e	KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Handle faults for memslots backed by guest_memfd in arm64 nested virtualization triggered by VNCR_EL2. * Introduce is_gmem output parameter to kvm_translate_vncr(), indicating whether the faulted memory slot is backed by guest_memfd. * Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn(). * Update kvm_handle_vncr_abort() to handle potential guest_memfd errors. Some of the guest_memfd errors need to be handled by userspace instead of attempting to (implicitly) retry by returning to the guest. Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-08-27 04:36:50 -04:00
Oliver Upton	69f8fe955d	KVM: arm64: nv: Handle SEAs due to VNCR redirection System register accesses redirected to the VNCR page can also generate external aborts just like any other form of memory access. Route to kvm_handle_guest_sea() for potential APEI handling, falling back to a vSError if the kernel didn't handle the abort. Take the opportunity to throw out the useless kvm_ras.h which provided a helper with a single callsite... Cc: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-04 22:16:10 -07:00
Marc Zyngier	07f557f60a	KVM: arm64: nv: Properly check ESR_EL2.VNCR on taking a VNCR_EL2 related fault Instead of checking for the ESR_EL2.VNCR bit being set (the only case we should be here), we are actually testing random bits in ESR_EL2.DFSC. 13 obviously being a lucky number, it matches both permission and translation fault status codes, which explains why we never saw it failing. This was found by inspection, while reviewing a vaguely related patch. Whilst we're at it, turn the BUG_ON() into a WARN_ON_ONCE(), as exploding here is just silly. Fixes: `069a05e535` ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250730101828.1168707-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-04 22:15:29 -07:00
Paolo Bonzini	314b40b3b6	Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #1 - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes.	2025-07-29 12:27:40 -04:00
Oliver Upton	d9b9fa2c32	Merge branch 'kvm-arm64/config-masks' into kvmarm/next * kvm-arm64/config-masks: : More config-driven mask computation, courtesy of Marc Zyngier : : Converts more system registers to the config-driven computation of RESx : masks based on the advertised feature set KVM: arm64: Tighten the definition of FEAT_PMUv3p9 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-28 08:03:08 -07:00
Marc Zyngier	cd64587f10	KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation As for other registers, convert the determination of the RES0 bits affecting MDCR_EL2 to be driven by a table extracted from the 2025-06 JSON drop Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-15 20:39:42 -07:00
Marc Zyngier	6bd4a274b0	KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation As for other registers, convert the determination of the RES0 bits affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06 JSON drop Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-15 20:39:42 -07:00
Marc Zyngier	001e032c0f	KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation As for other registers, convert the determination of the RES0 bits affecting TCR2_EL2 to be driven by a table extracted from the 2025-06 JSON drop. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-15 20:39:42 -07:00
Oliver Upton	abc693fef3	KVM: arm64: Describe SCTLR2_ELx RESx masks External abort injection will soon rely on a sanitised view of SCTLR2_ELx to determine exception routing. Compute the RESx masks. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 11:36:35 -07:00
Oliver Upton	a99456abd8	KVM: arm64: nv: Advertise support for FEAT_RAS Now that the missing bits for vSError injection/deferral have been added we can merrily claim support for FEAT_RAS. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 11:36:34 -07:00
Oliver Upton	77ee70a073	KVM: arm64: nv: Honor SError exception routing / masking To date KVM has used HCR_EL2.VSE to track the state of a pending SError for the guest. With this bit set, hardware respects the EL1 exception routing / masking rules and injects the vSError when appropriate. This isn't correct for NV guests as hardware is oblivious to vEL2's intentions for SErrors. Better yet, with FEAT_NV2 the guest can change the routing behind our back as HCR_EL2 is redirected to memory. Cope with this mess by: - Using a flag (instead of HCR_EL2.VSE) to track the pending SError state when SErrors are unconditionally masked for the current context - Resampling the routing / masking of a pending SError on every guest entry/exit - Emulating exception entry when SError routing implies a translation regime change Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 11:36:31 -07:00
Marc Zyngier	105485a182	KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes Booting an EL2 guest on a system only supporting a subset of the possible page sizes leads to interesting situations. For example, on a system that only supports 4kB and 64kB, and is booted with a 4kB kernel, we end-up advertising 16kB support at stage-2, which is pretty weird. That's because we consider that any S2 bigger than our base granule is fair game, irrespective of what the HW actually supports. While this is not impossible to support (KVM would happily handle it), it is likely to be confusing for the guest. Add new checks that will verify that this granule size is actually supported before publishing it to the guest. Fixes: `e7ef6ed458` ("KVM: arm64: Enforce NV limits on a per-idregs basis") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-07-03 10:39:24 +01:00
Marc Zyngier	8800b7c4bb	KVM: arm64: Add RMW specific sysreg accessor In a number of cases, we perform a Read-Modify-Write operation on a system register, meaning that we would apply the RESx masks twice. Instead, provide a new accessor that performs this RMW operation, allowing the masks to be applied exactly once per operation. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-06-05 14:18:01 +01:00
Marc Zyngier	6673047405	KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation When handling a TLBI VA* instruction that potentially targets a VNCR page mapping, we fail to mask out the top bits that contain the ASID and TTL fields, hence potentially failing the VA check in the TLB code. An additional wrinkle is that we fail to sign extend the VA, again leading to failed VA checks. Fix both in one go by sign-extending the VA from bit 48, making it comparable to the way we interpret VNCR_EL2.BADDR. Fixes: `4ffa72ad8f` ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-30 09:09:16 +01:00
Marc Zyngier	7f3225fe8b	Merge branch kvm-arm64/nv-nv into kvmarm-master/next * kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-23 10:58:57 +01:00
Marc Zyngier	538fbac740	KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section The conversion to kvm_release_faultin_page() missed the requirement for this to be called within a critical section with mmu_lock held for write. Move this call up to satisfy this requirement. Fixes: `069a05e535` ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 11:40:12 +01:00
Marc Zyngier	beab7d0583	KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held Calling invalidate_vncr_va() without the mmu_lock held for write is a bad idea, and lockdep tells you about that. Fixes: `4ffa72ad8f` ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 11:40:12 +01:00
Marc Zyngier	d43548f422	KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating When translating a VNCR translation fault, we start by marking the current SW-managed TLB as invalid, so that we can populate it in place. This is, however, done without the mmu_lock held. A consequence of this is that another CPU dealing with TLBI emulation can observe a translation still flagged as valid, but with invalid walk results (such as pgshift being 0). Bad things can result from this, such as a BUG() in pgshift_level_to_ttl(). Fix it by taking the mmu_lock for write to perform this local invalidation, and use invalidate_vncr() instead of open-coding the write to the 'valid' flag. Fixes: `069a05e535` ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-21 09:53:08 +01:00
Marc Zyngier	4bc0fe0898	KVM: arm64: Add sanitisation for FEAT_FGT2 registers Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:36:10 +01:00
Marc Zyngier	b2a324ff01	KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. An additional complexity stems from the status of some bits such as E2H and RW, which do not had a RESx status, but still take a fixed value due to implementation choices in KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:30 +01:00
Marc Zyngier	beed444841	KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:30 +01:00
Marc Zyngier	c6cbe6a4c1	KVM: arm64: Use FGT feature maps to drive RES0 bits Another benefit of mapping bits to features is that it becomes trivial to define which bits should be handled as RES0. Let's apply this principle to the guest's view of the FGT registers. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 11:35:00 +01:00
Marc Zyngier	4ffa72ad8f	KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR and potentially knock the fixmap mapping. Even worse, that TLBI must be able to work cross-vcpu. For that, we track on a per-VM basis if any VNCR is mapped, using an atomic counter. Whenever a TLBI S1E2 occurs and that this counter is non-zero, we take the long road all the way back to the core code. There, we iterate over all vcpus and check whether this particular invalidation has any damaging effect. If it does, we nuke the pseudo TLB and the corresponding fixmap. Yes, this is costly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	7270cc9157	KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers During an invalidation triggered by an MMU notifier, we need to make sure we can drop the host mapping that would have been translated by the stage-2 mapping being invalidated. For the moment, the invalidation is pretty brutal, as we nuke the full IPA range, and therefore any VNCR_EL2 mapping. At some point, we'll be more light-weight, and the code is able to deal with something more targetted. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	2a359e0725	KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 Now that we can handle faults triggered through VNCR_EL2, we need to map the corresponding page at EL2. But where, you'll ask? Since each CPU in the system can run a vcpu, we need a per-CPU mapping. For that, we carve a NR_CPUS range in the fixmap, giving us a per-CPU va at which to map the guest's VNCR's page. The mapping occurs both on vcpu load and on the back of a fault, both generating a request that will take care of the mapping. That mapping will also get dropped on vcpu put. Yes, this is a bit heavy handed, but it is simple. Eventually, we may want to have a per-VM, per-CPU mapping, which would avoid all the TLBI overhead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	069a05e535	KVM: arm64: nv: Handle VNCR_EL2-triggered faults As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults. These faults can have multiple source: - We haven't mapped anything on the host: we need to compute the resulting translation, populate a TLB, and eventually map the corresponding page - The permissions are out of whack: we need to tell the guest about this state of affairs Note that the kernel doesn't support S1POE for itself yet, so the particular case of a VNCR page mapped with no permissions or with write-only permissions is not correctly handled yet. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 08:01:19 +01:00
Marc Zyngier	6fb75733f1	KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	ea8d3cf46d	KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR is a virtual address in the EL2&0 (or EL2, but we thankfully ignore this) translation regime. As we need to replicate such mapping in the real EL2, it means that we need to remember that there is such a translation, and that any TLBI affecting EL2 can possibly affect this translation. It also means that any invalidation driven by an MMU notifier must be able to shoot down any such mapping. All in all, we need a data structure that represents this mapping, and that is extremely close to a TLB. Given that we can only use one of those per vcpu at any given time, we only allocate one. No effort is made to keep that structure small. If we need to start caching multiple of them, we may want to revisit that design point. But for now, it is kept simple so that we can reason about it. Oh, and add a braindump of how things are supposed to work, because I will definitely page this out at some point. Yes, pun intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00
Marc Zyngier	469c4713d4	KVM: arm64: nv: Allocate VNCR page when required If running a NV guest on an ARMv8.4-NV capable system, let's allocate an additional page that will be used by the hypervisor to fulfill system register accesses. Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-05-19 07:59:46 +01:00

1 2 3

124 Commits