linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-27 09:56:48 +08:00

Author	SHA1	Message	Date
Sebastian Ene	103e17aac0	KVM: arm64: Check the untrusted offset in FF-A memory share Verify the offset to prevent OOB access in the hypervisor FF-A buffer in case an untrusted large enough value [U32_MAX - sizeof(struct ffa_composite_mem_region) + 1, U32_MAX] is set from the host kernel. Signed-off-by: Sebastian Ene <sebastianene@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251017075710.2605118-1-sebastianene@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-30 16:14:58 +00:00
Vincent Donnefort	f71f7afd0a	KVM: arm64: Check range args for pKVM mem transitions There's currently no verification for host issued ranges in most of the pKVM memory transitions. The end boundary might therefore be subject to overflow and later checks could be evaded. Close this loophole with an additional pfn_range_is_valid() check on a per public function basis. Once this check has passed, it is safe to convert pfn and nr_pages into a phys_addr_t and a size. host_unshare_guest transition is already protected via __check_host_shared_guest(), while assert_host_shared_guest() callers are already ignoring host checks. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20251016164541.3771235-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-30 16:14:37 +00:00
Oliver Upton	fb10ddf35c	KVM: arm64: Compute per-vCPU FGTs at vcpu_load() To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Paolo Bonzini	924ebaefce	Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.	2025-09-30 13:23:28 -04:00
Marc Zyngier	181ce6b01a	Merge branch kvm-arm64/misc-6.18 into kvmarm-master/next * kvm-arm64/misc-6.18: : . : . : Misc improvements and bug fixes: : : - Fix XN handling in the S2 page table dumper : (20250809135356.1003520-1-r09922117@csie.ntu.edu.tw) : : - Fix sanitity checks for huge mapping with pKVM running np guests : (20250815162655.121108-1-ben.horgan@arm.com) : : - Fix use of TRBE when KVM is disabled, and Linux running under : a lesser hypervisor (20250902-etm_crash-v2-1-aa9713a7306b@oss.qualcomm.com) : : - Fix out of date MTE-related comments (20250915155234.196288-1-alexandru.elisei@arm.com) : : - Fix PSCI BE support when running a NV guest (20250916161103.1040727-1-maz@kernel.org) : : - Fix page reference leak when refusing to map a page due to mismatched attributes : (20250917130737.2139403-1-tabba@google.com) : : - Add trap handling for PMSDSFR_EL1 : (20250901-james-perf-feat_spe_eft-v8-7-2e2738f24559@linaro.org) : : - Add advertisement from FEAT_LSFE (Large System Float Extension) : (20250918-arm64-lsfe-v4-1-0abc712101c7@kernel.org) : . KVM: arm64: Expose FEAT_LSFE to guests KVM: arm64: Add trap configs for PMSDSFR_EL1 KVM: arm64: Fix page leak in user_mem_abort() KVM: arm64: Fix kvm_vcpu_{set,is}_be() to deal with EL2 state KVM: arm64: Update stale comment for sanitise_mte_tags() KVM: arm64: Return early from trace helpers when KVM isn't available KVM: arm64: Fix debug checking for np-guests using huge mappings KVM: arm64: ptdump: Don't test PTE_VALID alongside other attributes Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:29 +01:00
Marc Zyngier	46bd74ef07	Merge branch kvm-arm64/el2-feature-control into kvmarm-master/next * kvm-arm64/el2-feature-control: (23 commits) : . : General rework of EL2 features that can be disabled to satisfy : the requirement of migration between heterogeneous hosts: : : - Handle effective RES0 behaviour of undefined registers, making sure : that disabling a feature affects full registeres, and not just : individual control bits. (20250918151402.1665315-1-maz@kernel.org) : : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace. : (20250911114621.3724469-1-yangjinqian1@huawei.com) : : - Turn the NV feature management into a deny-list, and expose : missing features to EL2 guests. : (20250912212258.407350-1-oliver.upton@linux.dev) : . KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg() KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED} KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2 KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2} KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2 KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:18 +01:00
Marc Zyngier	d9476fd356	Merge branch kvm-arm64/gic-v5-nv into kvmarm-master/next * kvm-arm64/gic-v5-nv: : . : Add NV support to GICv5 in GICv3 emulation mode, ensuring that the v3 : guest support is identical to that of a pure v3 platform. : : Patches courtesy of Sascha Bischoff (20250828105925.3865158-1-sascha.bischoff@arm.com) : . irqchip/gic-v5: Drop has_gcie_v3_compat from gic_kvm_info KVM: arm64: Use ARM64_HAS_GICV5_LEGACY for GICv5 probing arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability KVM: arm64: Enable nested for GICv5 host with FEAT_GCIE_LEGACY KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-20 12:26:05 +01:00
Oliver Upton	05d9f34083	KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set Ignore the guest hypervisor's configured TWE delay if it hasn't actually requested WFE traps. Otherwise, OR'ing these fields into the effective HCR when the guest sets TWE is safe as KVM doesn't use FEAT_TWED and leaves the fields initialized to 0. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-19 14:01:35 +01:00
Marc Zyngier	9664d5810e	KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility We currently access ICC_SRE_EL2 at each load/put on VHE, and on each entry/exit on nVHE. Both are quite onerous on NV, as this register always traps. We do this to make sure the EL1 guest doesn't flip between v2 and v3 behind our back. But all modern implementations have dropped v2, and this is just overhead. At the same time, the GICv5 spec has been fixed to allow access to ICC_SRE_EL2 in legacy mode. Use this opportunity to replace the GICv5 checks for v2 compat checks, with an ad-hoc static key. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-17 17:40:42 +01:00
Marc Zyngier	32314d940e	Merge branch kvm-arm64/dump-instr into kvmarm-master/next * kvm-arm64/dump-instr: : . : Dump the isntruction stream on panic, just like the rest of the kernel : already does. : : Patches courtesy of Mostafa Saleh (20250909133631.3844423-1-smostafa@google.com) : . KVM: arm64: Map hyp text as RO and dump instr on panic KVM: arm64: Dump instruction on hyp panic Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-17 17:30:44 +01:00
Mostafa Saleh	6f1ece1e86	KVM: arm64: Map hyp text as RO and dump instr on panic Map the hyp text section as RO, there are no secrets there and that allows the kernel extract info for debugging. As in case of panic we can now dump the faulting instructions similar to the kernel. Signed-off-by: Mostafa Saleh <smostafa@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 13:04:23 +01:00
Marc Zyngier	3064cee8c3	Merge branch kvm-arm64/pkvm_vm_handle into kvmarm-master/next * kvm-arm64/pkvm_vm_handle: : pKVM VM handle allocation fixes, courtesy of Fuad Tabba. : : From the cover letter (20250909072437.4110547-1-tabba@google.com): : : "In pKVM, this handle is allocated when the VM is initialized at the : hypervisor, which is on the first vCPU run. However, the host starts : initializing the VM and setting up its data structures earlier. MMU : notifiers for the VMs are also registered before VM initialization at : the hypervisor, and rely on the handle to identify the VM. : : Therefore, there is a potential gap between when the VM is (partially) : setup at the host, but still without a valid pKVM handle to identify it : when communicating with the hypervisor." KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm() KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization KVM: arm64: Consolidate pKVM hypervisor VM initialization logic KVM: arm64: Separate allocation and insertion of pKVM VM table entries KVM: arm64: Decouple hyp VM creation state from its handle KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code KVM: arm64: Rename pkvm.enabled to pkvm.is_protected KVM: arm64: Add build-time check for duplicate DECLARE_REG use Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:49:04 +01:00
Fuad Tabba	256b4668cd	KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization The existing __pkvm_init_vm hypercall performs both the reservation of a VM table entry and the initialization of the hypervisor VM state in a single operation. This design prevents the host from obtaining a VM handle from the hypervisor until all preparation for the creation and the initialization of the VM is done, which is on the first vCPU run operation. To support more flexible VM lifecycle management, the host needs the ability to reserve a handle early, before the first vCPU run. Refactor the hypercall interface to enable this, splitting the single hypercall into a two-stage process: - __pkvm_reserve_vm: A new hypercall that allocates a slot in the hypervisor's vm_table, marks it as reserved, and returns a unique handle to the host. - __pkvm_unreserve_vm: A corresponding cleanup hypercall to safely release the reservation if the host fails to proceed with full initialization. - __pkvm_init_vm: The existing hypercall is modified to no longer allocate a slot. It now expects a pre-reserved handle and commits the donated VM memory to that slot. For now, the host-side code in __pkvm_create_hyp_vm calls the new reserve and init hypercalls back-to-back to maintain existing behavior. This paves the way for subsequent patches to separate the reservation and initialization steps in the VM's lifecycle. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	814fd6beac	KVM: arm64: Consolidate pKVM hypervisor VM initialization logic The insert_vm_table_entry() function was performing tasks beyond its primary responsibility. In addition to inserting a VM pointer into the vm_table, it was also initializing several fields within 'struct pkvm_hyp_vm', such as the VMID and stage-2 MMU pointers. This mixing of concerns made the code harder to follow. As another preparatory step towards allowing a VM table entry to be reserved before the VM is fully created, this logic must be cleaned up. By separating table insertion from state initialization, we can control the timing of the initialization step more precisely in subsequent patches. Refactor the code to consolidate all initialization logic into init_pkvm_hyp_vm(): - Move the initialization of the handle, VMID, and MMU fields from insert_vm_table_entry() to init_pkvm_hyp_vm(). - Simplify insert_vm_table_entry() to perform only one action: placing the provided pkvm_hyp_vm pointer into the vm_table. - Update the calling sequence in __pkvm_init_vm() to first allocate an entry in the VM table, initialize the VM, and then insert the VM into the VM table. This is all protected by the vm_table_lock for now. Subsequent patches will adjust the sequence and not hold the vm_table_lock while initializing the VM at the hypervisor (init_pkvm_hyp_vm()). Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	1abc1ad529	KVM: arm64: Separate allocation and insertion of pKVM VM table entries The current insert_vm_table_entry() function performs two actions at once: it finds a free slot in the pKVM VM table and populates it with the pkvm_hyp_vm pointer. Refactor this function as a preparatory step for future work that will require reserving a VM slot and its corresponding handle earlier in the VM lifecycle, before the pkvm_hyp_vm structure is initialized and ready to be inserted. Split the function into a two-phase process: - A new allocate_vm_table_entry() function finds an empty slot, marks it as reserved with a RESERVED_ENTRY placeholder, and returns a handle derived from the slot's index. - The insert_vm_table_entry() function is repurposed to take the handle, validate that the corresponding slot is in the reserved state, and then populate it with the pkvm_hyp_vm pointer. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	3c45b67625	KVM: arm64: Decouple hyp VM creation state from its handle Currently, the presence of a pKVM handle (pkvm.handle != 0) is used to determine if the corresponding hypervisor (EL2) VM has been created and initialized. This couples the handle's lifecycle with the VM's creation state. This coupling will become problematic with upcoming changes that will allocate the pKVM handle earlier in the VM's life, before the VM is instantiated at the hypervisor. To prepare for this and make the state tracking explicit, decouple the two concepts. Introduce a new boolean flag, 'pkvm.is_created', to track whether the hypervisor-side VM has been created and initialized. A new helper, pkvm_hyp_vm_is_created(), is added to check this flag. All call sites that previously checked for the handle's existence are converted to use the new, explicit check. The 'is_created' flag is set to true upon successful creation in the hypervisor (EL2) and cleared upon destruction. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	070362648f	KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs The hypervisor code for protected KVM contains comments that are imprecise and at times flat-out wrong. They often refer to a "protected VM" in contexts where the code or data structure applies to _any_ VM managed by the hypervisor when pKVM is enabled. For instance, the 'vm_table' holds handles for all VMs known to the hypervisor, not exclusively for those that are configured as protected. This inaccurate terminology can make the code scope harder to understand for future (and current) developers. Clarify the comments throughout the pKVM hypervisor code to make a clear distinction between the pKVM feature itself (i.e., "protected mode") and the VMs that are specifically configured to be protected. This involves replacing ambiguous uses of "protected VM" with more accurate phrasing. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	58dfb66b1e	KVM: arm64: Rename pkvm.enabled to pkvm.is_protected The 'pkvm.enabled' field in struct kvm_protected_vm is confusingly named. Its purpose is to indicate whether a VM is a _protected_ VM under pKVM, and not whether the VM itself is enabled or running. For a non-protected VM, the VM can be fully active and running, yet this field would be false. This ambiguity can lead to incorrect assumptions about the VM's operational state and makes the code harder to reason about. Rename the field to 'is_protected' to make it unambiguous that the flag tracks the protected status of the VM. No functional change intended. Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Kunwu Chan <chentao@kylinos.cn> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Fuad Tabba	f9ac33e45d	KVM: arm64: Add build-time check for duplicate DECLARE_REG use The DECLARE_REG() macro provides a convenient way to create a local variable initialized from a cpu context in the hyp trap handlers. However, a common error is to use the macro multiple times in the same scope with the same register index, but for different logical purposes. This results in valid C code that compiles without error, but introduces subtle bugs where a developer expects two different variables to hold values from two different registers, when in fact they are both sourced from the same one. To prevent this entire class of bugs, modify the DECLARE_REG() macro to declare a dummy variable whose name is derived from the register index. If the macro is used again with the same index in the same scope, the compiler will fail with a "redeclaration of variable" error, turning a subtle runtime bug into an obvious build-time failure. Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:46:55 +01:00
Ben Horgan	2ba972bf71	KVM: arm64: Fix debug checking for np-guests using huge mappings When running with transparent huge pages and CONFIG_NVHE_EL2_DEBUG then the debug checking in assert_host_shared_guest() fails on the launch of an np-guest. This WARN_ON() causes a panic and generates the stack below. In __pkvm_host_relax_perms_guest() the debug checking assumes the mapping is a single page but it may be a block map. Update the checking so that the size is not checked and just assumes the correct size. While we're here make the same fix in __pkvm_host_mkyoung_guest(). Info: # lkvm run -k /share/arch/arm64/boot/Image -m 704 -c 8 --name guest-128 Info: Removed ghost socket file "/.lkvm//guest-128.sock". [ 1406.521757] kvm [141]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:1088! [ 1406.521804] kvm [141]: nVHE call trace: [ 1406.521828] kvm [141]: [<ffff8000811676b4>] __kvm_nvhe_hyp_panic+0xb4/0xe8 [ 1406.521946] kvm [141]: [<ffff80008116d12c>] __kvm_nvhe_assert_host_shared_guest+0xb0/0x10c [ 1406.522049] kvm [141]: [<ffff80008116f068>] __kvm_nvhe___pkvm_host_relax_perms_guest+0x48/0x104 [ 1406.522157] kvm [141]: [<ffff800081169df8>] __kvm_nvhe_handle___pkvm_host_relax_perms_guest+0x64/0x7c [ 1406.522250] kvm [141]: [<ffff800081169f0c>] __kvm_nvhe_handle_trap+0x8c/0x1a8 [ 1406.522333] kvm [141]: [<ffff8000811680fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 [ 1406.522454] kvm [141]: ---[ end nVHE call trace ]--- [ 1406.522477] kvm [141]: Hyp Offset: 0xfffece8013600000 [ 1406.522554] Kernel panic - not syncing: HYP panic: [ 1406.522554] PS:834003c9 PC:0000b1806db6d170 ESR:00000000f2000800 [ 1406.522554] FAR:ffff8000804be420 HPFAR:0000000000804be0 PAR:0000000000000000 [ 1406.522554] VCPU:0000000000000000 [ 1406.523337] CPU: 3 UID: 0 PID: 141 Comm: kvm-vcpu-0 Not tainted 6.16.0-rc7 #97 PREEMPT [ 1406.523485] Hardware name: FVP Base RevC (DT) [ 1406.523566] Call trace: [ 1406.523629] show_stack+0x18/0x24 (C) [ 1406.523753] dump_stack_lvl+0xd4/0x108 [ 1406.523899] dump_stack+0x18/0x24 [ 1406.524040] panic+0x3d8/0x448 [ 1406.524184] nvhe_hyp_panic_handler+0x10c/0x23c [ 1406.524325] kvm_handle_guest_abort+0x68c/0x109c [ 1406.524500] handle_exit+0x60/0x17c [ 1406.524630] kvm_arch_vcpu_ioctl_run+0x2e0/0x8c0 [ 1406.524794] kvm_vcpu_ioctl+0x1a8/0x9cc [ 1406.524919] __arm64_sys_ioctl+0xac/0x104 [ 1406.525067] invoke_syscall+0x48/0x10c [ 1406.525189] el0_svc_common.constprop.0+0x40/0xe0 [ 1406.525322] do_el0_svc+0x1c/0x28 [ 1406.525441] el0_svc+0x38/0x120 [ 1406.525588] el0t_64_sync_handler+0x10c/0x138 [ 1406.525750] el0t_64_sync+0x1ac/0x1b0 [ 1406.525876] SMP: stopping secondary CPUs [ 1406.525965] Kernel Offset: disabled [ 1406.526032] CPU features: 0x0000,00000080,8e134ca1,9446773f [ 1406.526130] Memory Limit: none [ 1406.959099] ---[ end Kernel panic - not syncing: HYP panic: [ 1406.959099] PS:834003c9 PC:0000b1806db6d170 ESR:00000000f2000800 [ 1406.959099] FAR:ffff8000804be420 HPFAR:0000000000804be0 PAR:0000000000000000 [ 1406.959099] VCPU:0000000000000000 ] Signed-off-by: Ben Horgan <ben.horgan@arm.com> Fixes: `f28f1d02f4` ("KVM: arm64: Add a range to __pkvm_host_unshare_guest()") Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Quentin Perret <qperret@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: stable@vger.kernel.org Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:29:14 +01:00
Marc Zyngier	ebdda44a2b	Merge branch kvm-arm64/ffa-1.2 into kvmarm-master/next * kvm-arm64/ffa-1.2: : . : FFA 1.2 support for pKVM, courtesy of Per Larsen. : : From the cover letter at [1]: : : "The FF-A 1.2 specification introduces a new SEND_DIRECT2 ABI which : allows registers x4-x17 to be used for the message payload. This patch : set prevents the host from using a lower FF-A version than what has : already been negotiated with the hypervisor. This is necessary because : the hypervisor does not have the necessary compatibility paths to : translate from the hypervisor FF-A version to a previous version." : : [1] https://lore.kernel.org/r/20250820-virtio-msg-ffa-v11-0-497ef43550a3@google.com : . KVM: arm64: Bump the supported version of FF-A to 1.2 KVM: arm64: Mask response to FFA_FEATURE call KVM: arm64: Mark optional FF-A 1.2 interfaces as unsupported KVM: arm64: Mark FFA_NOTIFICATION_* calls as unsupported KVM: arm64: Use SMCCC 1.2 for FF-A initialization and in host handler KVM: arm64: Correct return value on host version downgrade attempt Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-15 10:27:28 +01:00
Oliver Upton	e6157256ee	Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()" This reverts commit `0e89ca13ee`. The functional change that depended on this refactoring has been found to be quite problematic. Reverting the whole pile to start fresh when new fixes are available. Message-ID: <20250910180930.3679473-3-oliver.upton@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 11:11:22 -07:00
Fuad Tabba	2dc720e606	KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment The __vcpu_assign_sys_reg() helper expects the register ID as the second argument and the value to be assigned as the third. However, the existing code was passing these parameters in the incorrect order. Fix the function call to properly read the live value of VBAR_EL1 from the guest and update the vCPU value immediately before pending the exception. This ensures the vCPU's value is the same as the guest's and that the exception will be handled at the correct address upon resuming the guest. Fixes: `798eb59787` ("KVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exception") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250908163557.2419780-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:20 -07:00
Alexandru Elisei	da2e743419	KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly Prior to commit `75a5fbaf66` ("KVM: arm64: Compute MDCR_EL2 at vcpu_load()"), host MDCR_EL2 was saved correctly: kvm_arch_vcpu_load() kvm_vcpu_load_debug() /* Doesn't touch hardware MDCR_EL2. / kvm_vcpu_load_vhe() __activate_traps_common() / Saves host MDCR_EL2. / host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2) /* Writes VCPU MDCR_EL2. / write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2) The MDCR_EL2 value saved previously was restored in kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe(). After the aforementioned commit, host MDCR_EL2 is never saved: kvm_arch_vcpu_load() kvm_vcpu_load_debug() / Writes VCPU MDCR_EL2 / kvm_vcpu_load_vhe() __activate_traps_common() / Saves VCPU MDCR_EL2. / host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2) /* Writes VCPU MDCR_EL2 a second time. */ write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2) kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe() then restores the VCPU MDCR_EL2 value. Also VCPU's MDCR_EL2 value gets written to hardware twice now. Fix this by saving the host MDCR_EL2 in kvm_arch_vcpu_load() before it gets overwritten by the VCPU's MDCR_EL2 value, and restore it on VCPU put. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250902130833.338216-3-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-09-10 02:56:19 -07:00
Per Larsen	162190f2cc	KVM: arm64: Bump the supported version of FF-A to 1.2 FF-A version 1.2 introduces the DIRECT_REQ2 ABI. Bump the FF-A version preferred by the hypervisor to enable implementation of the 1.2-only FFA_MSG_SEND_DIRECT_REQ2 and FFA_MSG_SEND_RESP2 messaging interfaces. Co-developed-by: Ayrton Munoz <ayrton@google.com> Signed-off-by: Ayrton Munoz <ayrton@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Per Larsen <perlarsen@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Per Larsen	3f59529174	KVM: arm64: Mask response to FFA_FEATURE call The minimum size and alignment boundary for FFA_RXTX_MAP is returned in bit[1:0]. Mask off any other bits in w2 when reading the minimum buffer size in hyp_ffa_post_init. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Per Larsen <perlarsen@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Per Larsen	8d24683e3e	KVM: arm64: Mark optional FF-A 1.2 interfaces as unsupported Mark FF-A 1.2 interfaces as unsupported lest they get proxied. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Per Larsen <perlarsen@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Per Larsen	79195f3424	KVM: arm64: Mark FFA_NOTIFICATION_* calls as unsupported Prevent FFA_NOTIFICATION_* interfaces from being passed through to TZ. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Per Larsen <perlarsen@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Per Larsen	6f4c348b1d	KVM: arm64: Use SMCCC 1.2 for FF-A initialization and in host handler SMCCC 1.1 and prior allows four registers to be sent back as a result of an FF-A interface. SMCCC 1.2 increases the number of results that can be sent back to 8 and 16 for 32-bit and 64-bit SMC/HVCs respectively. FF-A 1.0 references SMCCC 1.2 (reference [4] on page xi) and FF-A 1.2 explicitly requires SMCCC 1.2 so it should be safe to use this version unconditionally. Moreover, it is simpler to implement FF-A features without having to worry about compatibility with SMCCC 1.1 and older. SMCCC 1.2 requires that SMC32/HVC32 from aarch64 mode preserves x8-x30 but given that there is no reliable way to distinguish 32-bit/64-bit calls, we assume SMC64 unconditionally. This has the benefit of being consistent with the handling of calls that are passed through, i.e., not proxied. (A cleaner solution will become available in FF-A 1.3.) Update the FF-A initialization and host handler code to use SMCCC 1.2. Signed-off-by: Per Larsen <perlarsen@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Per Larsen	f414269392	KVM: arm64: Correct return value on host version downgrade attempt Once the hypervisor negotiates the FF-A version with the host, it should remain locked-in. However, it is possible to load FF-A as a module first supporting version 1.1 and then 1.0. Without this patch, the FF-A 1.0 driver will use 1.0 data structures to make calls which the hypervisor will incorrectly interpret as 1.1 data structures. With this patch, negotiation will fail. This patch does not change existing functionality in the case where a FF-A 1.2 driver is loaded after a 1.1 driver; the 1.2 driver will need to use 1.1 in order to proceed. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Per Larsen <perlarsen@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-08 19:30:59 +01:00
Paolo Bonzini	42a0305ab1	Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes	2025-08-29 12:57:31 -04:00
Marc Zyngier	e3f6836a63	KVM: arm64: Simplify sysreg access on exception delivery Distinguishing between NV and VHE is slightly pointless, and only serves as an extra complication, or a way to introduce bugs, such as the way SPSR_EL1 gets written without checking for the state being resident. Get rid if this silly distinction, and fix the bug in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-28 11:39:48 -07:00
Marc Zyngier	b720269334	KVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit state Just like `c6e35dff58` ("KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state") fixed the 64bit state access, add a check for the 32bit state actually being on the CPU before writing it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-28 11:39:48 -07:00
Marc Zyngier	9049fb1227	KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2 An EL2 guest can set HCR_EL2.FIEN, which gives access to the RASv1p1 fault injection mechanism. This would allow an EL1 guest to inject error records into the system, which does sound like a terrible idea. Prevent this situation by added FIEN to the list of bits we silently exclude from being inserted into the host configuration. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250817202158.395078-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-21 16:28:46 -07:00
Raghavendra Rao Ananta	0e89ca13ee	KVM: arm64: Split kvm_pgtable_stage2_destroy() Split kvm_pgtable_stage2_destroy() into two: - kvm_pgtable_stage2_destroy_range(), that performs the page-table walk and free the entries over a range of addresses. - kvm_pgtable_stage2_destroy_pgd(), that frees the PGD. This refactoring enables subsequent patches to free large page-tables in chunks, calling cond_resched() between each chunk, to yield the CPU as necessary. Existing callers of kvm_pgtable_stage2_destroy(), that probably cannot take advantage of this (such as nVMHE), will continue to function as is. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250820162242.2624752-2-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-21 16:27:03 -07:00
Fuad Tabba	f1edb15920	arm64: vgic-v2: Fix guest endianness check in hVHE mode In hVHE when running at the hypervisor, SCTLR_EL1 refers to the hypervisor's System Control Register rather than the guest's. Make sure to access the guest's register to determine its endianness. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-4-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-08 01:29:32 -07:00
Fuad Tabba	798eb59787	KVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exception In pKVM, a race condition can occur if a guest updates its VBAR_EL1 register and, before a vCPU exit synchronizes this change, the hypervisor needs to inject an undefined exception into a protected guest. In this scenario, the vCPU still holds the stale VBAR_EL1 value from before the guest's update. When pKVM injects the exception, it ends up using the stale value. Explicitly read the live value of VBAR_EL1 from the guest and update the vCPU value immediately before pending the exception. This ensures the vCPU's value is the same as the guest's and that the exception will be handled at the correct address upon resuming the guest. Reported-by: Keir Fraser <keirf@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-08 01:29:32 -07:00
Fuad Tabba	eaa43934b4	KVM: arm64: Handle AIDR_EL1 and REVIDR_EL1 in host for protected VMs Since commit `17efc1acee` ("arm64: Expose AIDR_EL1 via sysfs"), AIDR_EL1 is read early during boot. Therefore, a guest running as a protected VM will fail to boot because when it attempts to access AIDR_EL1, access to that register is restricted in pKVM for protected guests. Similar to how MIDR_EL1 is handled by the host for protected VMs, let the host handle accesses to AIDR_EL1 as well as REVIDR_EL1. However note that, unlike MIDR_EL1, AIDR_EL1 and REVIDR_EL1 are trapped by HCR_EL2.TID1. Therefore, explicitly mark them as handled by the host for protected VMs. TID1 is always set in pKVM, because it needs to restrict access to SMIDR_EL1, which is also trapped by that bit. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-08 01:29:31 -07:00
Arnd Bergmann	700d6868fe	kvm: arm64: use BUG() instead of BUG_ON(1) The BUG_ON() macro adds a little bit of complexity over BUG(), and in some cases this ends up confusing the compiler's control flow analysis in a way that results in a warning. This one now shows up with clang-21: arch/arm64/kvm/vgic/vgic-mmio.c:1094:3: error: variable 'len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] 1094 \| BUG_ON(1); Change both instances of BUG_ON(1) to a plain BUG() in the arm64 kvm code, to avoid the false-positive warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250807072132.4170088-1-arnd@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-08 01:28:57 -07:00
Linus Torvalds	63eb28bb14	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx " instead of a "void ", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...	2025-07-30 17:14:01 -07:00
Linus Torvalds	6fb44438a5	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "A quick summary: perf support for Branch Record Buffer Extensions (BRBE), typical PMU hardware updates, small additions to MTE for store-only tag checking and exposing non-address bits to signal handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on. There is also a TLBI optimisation on hardware that does not require break-before-make when changing the user PTEs between contiguous and non-contiguous. More details: Perf and PMU updates: - Add support for new (v3) Hisilicon SLLC and DDRC PMUs - Add support for Arm-NI PMU integrations that share interrupts between clock domains within a given instance - Allow SPE to be configured with a lower sample period than the minimum recommendation advertised by PMSIDR_EL1.Interval - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE) - Adjust the perf watchdog period according to cpu frequency changes - Minor driver fixes and cleanups Hardware features: - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY) - Support for reporting the non-address bits during a synchronous MTE tag check fault (FEAT_MTE_TAGGED_FAR) - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts Software features: - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable() and using the text-poke API for late module relocations - Force VMAP_STACK always on and change arm64_efi_rt_init() to use arch_alloc_vmap_stack() in order to avoid KASAN false positives ACPI: - Improve SPCR handling and messaging on systems lacking an SPCR table Debug: - Simplify the debug exception entry path - Drop redundant DBG_MDSCR_* macros Kselftests: - Cleanups and improvements for SME, SVE and FPSIMD tests Miscellaneous: - Optimise loop to reduce redundant operations in contpte_ptep_get() - Remove ISB when resetting POR_EL0 during signal handling - Mark the kernel as tainted on SEA and SError panic - Remove redundant gcs_free() call" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64/gcs: task_gcs_el0_enable() should use passed task arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: signal: Remove ISB when resetting POR_EL0 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems arm64/mm: Drop redundant addr increment in set_huge_pte_at() kselftest/arm4: Provide local defines for AT_HWCAP3 arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling ...	2025-07-29 20:21:54 -07:00
Paolo Bonzini	314b40b3b6	Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #1 - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes.	2025-07-29 12:27:40 -04:00
Linus Torvalds	8e736a2eea	Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO * tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) sched/task_stack: Add missing const qualifier to end_of_stack() kstack_erase: Support Clang stack depth tracking kstack_erase: Add -mgeneral-regs-only to silence Clang warnings init.h: Disable sanitizer coverage for __init and __head kstack_erase: Disable kstack_erase for all of arm compressed boot code x86: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatch powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON configs/hardening: Enable CONFIG_KSTACK_ERASE stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Rename STACKLEAK to KSTACK_ERASE seq_buf: Introduce KUnit tests string: Group str_has_prefix() and strstarts() kunit/fortify: Add back "volatile" for sizeof() constants acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings ...	2025-07-28 17:16:12 -07:00
Oliver Upton	1f315e99bd	Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next * kvm-arm64/gcie-legacy: : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff : : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to : support the legacy GICv3 for VMs, including a backwards-compatible VGIC : implementation that we all know and love. : : As a starting point for GICv5 enablement in KVM, enable + use the : GICv3-compatible feature when running VMs on GICv5 hardware. KVM: arm64: gic-v5: Probe for GICv5 KVM: arm64: gic-v5: Support GICv3 compat arm64/sysreg: Add ICH_VCTLR_EL2 irqchip/gic-v5: Populate struct gic_kvm_info irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-26 08:50:06 -07:00
Kees Cook	57fbad15c2	stackleak: Rename STACKLEAK to KSTACK_ERASE In preparation for adding Clang sanitizer coverage stack depth tracking that can support stack depth callbacks: - Add the new top-level CONFIG_KSTACK_ERASE option which will be implemented either with the stackleak GCC plugin, or with the Clang stack depth callback support. - Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE, but keep it for anything specific to the GCC plugin itself. - Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named for what it does rather than what it protects against), but leave as many of the internals alone as possible to avoid even more churn. While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS, since that's the only place it is referenced from. Suggested-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2025-07-21 21:35:01 -07:00
Marc Zyngier	303084ad12	KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context Most HCR_EL2 bits are not supposed to affect EL2 at all, but only the guest. However, we gladly merge these bits with the host's HCR_EL2 configuration, irrespective of entering L1 or L2. This leads to some funky behaviour, such as L1 trying to inject a virtual SError for L2, and getting a taste of its own medecine. Not quite what the architecture anticipated. In the end, the only bits that matter are those we have defined as invariants, either because we've made them RESx (E2H, HCD...), or that we actively refuse to merge because the mess with KVM's own logic. Use the sanitisation infrastructure to get the RES1 bits, and let things rip in a safer way. Fixes: `04ab519bb8` ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-21 09:35:57 -07:00
Marc Zyngier	c6e35dff58	KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state Mark Brown reports that since we commit to making exceptions visible without the vcpu being loaded, the external abort selftest fails. Upon investigation, it turns out that the code that makes registers affected by an exception visible to the guest is completely broken on VHE, as we don't check whether the system registers are loaded on the CPU at this point. We managed to get away with this so far, but that's obviously as bad as it gets, Add the required checksm and document the absolute need to check for the SYSREGS_ON_CPU flag before calling into any of the __vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers. Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-21 09:34:57 -07:00
Sascha Bischoff	c017e49ed1	KVM: arm64: gic-v5: Support GICv3 compat Add support for GICv3 compat mode (FEAT_GCIE_LEGACY) which allows a GICv5 host to run GICv3-based VMs. This change enables the VHE/nVHE/hVHE/protected modes, but does not support nested virtualization. A lazy-disable approach is taken for compat mode; it is enabled on the vgic_v3_load path but not disabled on the vgic_v3_put path. A non-GICv3 VM, i.e., one based on GICv5, is responsible for disabling compat mode on the corresponding vgic_v5_load path. Currently, GICv5 is not supported, and hence compat mode is not disabled again once it is enabled, and this function is intentionally omitted from the code. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20250627100847.1022515-5-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 14:41:06 -07:00
Oliver Upton	fff97df2a0	KVM: arm64: Route SEAs to the SError vector when EASE is set One of the finest additions of FEAT_DoubleFault2 is the ability for software to request synchronous external aborts be taken to the SError vector, which of coure are asynchronous in nature. Opinions be damned, implement the architecture and send SEAs to the SError vector if EASE is set for the target context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-18-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 11:36:35 -07:00
Oliver Upton	02dd33ec88	KVM: arm64: Context switch SCTLR2_ELx when advertised to the guest Restore SCTLR2_EL1 with the correct value for the given context when FEAT_SCTLR2 is advertised to the guest. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-13-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-08 11:36:35 -07:00

1 2 3 4 5 ...

1265 Commits