linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Brijesh Singh	d3d1af85e2	KVM: SVM: Add KVM_SEND_UPDATE_DATA command The command is used for encrypting the guest memory region using the encryption context created with KVM_SEV_SEND_START. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by : Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-21 12:20:03 -04:00
Brijesh Singh	4cfdd47d6d	KVM: SVM: Add KVM_SEV SEND_START command The command is used to create an outgoing SEV guest encryption context. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-21 12:20:03 -04:00
Nathan Tempelman	54526d1fd5	KVM: x86: Support KVM VMs sharing SEV context Add a capability for userspace to mirror SEV encryption context from one vm to another. On our side, this is intended to support a Migration Helper vCPU, but it can also be used generically to support other in-guest workloads scheduled by the host. The intention is for the primary guest and the mirror to have nearly identical memslots. The primary benefits of this are that: 1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they can't accidentally clobber each other. 2) The VMs can have different memory-views, which is necessary for post-copy migration (the migration vCPUs on the target need to read and write to pages, when the primary guest would VMEXIT). This does not change the threat model for AMD SEV. Any memory involved is still owned by the primary guest and its initial state is still attested to through the normal SEV_LAUNCH_* flows. If userspace wanted to circumvent SEV, they could achieve the same effect by simply attaching a vCPU to the primary VM. This patch deliberately leaves userspace in charge of the memslots for the mirror, as it already has the power to mess with them in the primary guest. This patch does not support SEV-ES (much less SNP), as it does not handle handing off attested VMSAs to the mirror. For additional context, we need a Migration Helper because SEV PSP migration is far too slow for our live migration on its own. Using an in-guest migrator lets us speed this up significantly. Signed-off-by: Nathan Tempelman <natet@google.com> Message-Id: <20210408223214.2582277-1-natet@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-21 12:20:02 -04:00
Krish Sadhukhan	ee695f22b5	nSVM: Check addresses of MSR and IO permission maps According to section "Canonicalization and Consistency Checks" in APM vol 2, the following guest state is illegal: "The MSR or IOIO intercept tables extend to a physical address that is greater than or equal to the maximum supported physical address." Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-21 12:20:01 -04:00
Krish Sadhukhan	47903dc10e	KVM: SVM: Define actual size of IOPM and MSRPM tables Define the actual size of the IOPM and MSRPM tables so that the actual size can be used when initializing them and when checking the consistency of their physical address. These #defines are placed in svm.h so that they can be shared. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210412215611.110095-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:56 -04:00
Sean Christopherson	44f1b5586d	KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run() Explicitly document why a vmcb must be marked dirty and assigned a new asid when it will be run on a different cpu. The "what" is relatively obvious, whereas the "why" requires reading the APM and/or KVM code. Opportunistically remove a spurious period and several unnecessary newlines in the comment. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:50 -04:00
Sean Christopherson	554cf31474	KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points at Add a comment above the declaration of vcpu_svm.vmcb to call out that it is simply a shorthand for current_vmcb->ptr. The myriad accesses to svm->vmcb are quite confusing without this crucial detail. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:49 -04:00
Sean Christopherson	d1788191fd	KVM: SVM: Drop vcpu_svm.vmcb_pa Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in the one path where it is consumed. Unlike svm->vmcb, use of the current vmcb's address is very limited, as evidenced by the fact that its use can be trimmed to a single dereference. Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, at first glance using vmcb01 instead of vmcb_pa looks wrong. No functional change intended. Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:49 -04:00
Sean Christopherson	17e5e964ee	KVM: SVM: Don't set current_vmcb->cpu when switching vmcb Do not update the new vmcb's last-run cpu when switching to a different vmcb. If the vCPU is migrated between its last run and a vmcb switch, e.g. for nested VM-Exit, then setting the cpu without marking the vmcb dirty will lead to KVM running the vCPU on a different physical cpu with stale clean bit settings. vcpu->cpu current_vmcb->cpu hardware pre_svm_run() cpu0 cpu0 cpu0,clean kvm_arch_vcpu_load() cpu1 cpu0 cpu0,clean svm_switch_vmcb() cpu1 cpu1 cpu0,clean pre_svm_run() cpu1 cpu1 kaboom Simply delete the offending code; unlike VMX, which needs to update the cpu at switch time due to the need to do VMPTRLD, SVM only cares about which cpu last ran the vCPU. Fixes: `af18fa775d` ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb") Cc: Cathy Avery <cavery@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:49 -04:00
Tom Lendacky	a3ba26ecfb	KVM: SVM: Make sure GHCB is mapped before updating Access to the GHCB is mainly in the VMGEXIT path and it is known that the GHCB will be mapped. But there are two paths where it is possible the GHCB might not be mapped. The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform the caller of the AP Reset Hold NAE event that a SIPI has been delivered. However, if a SIPI is performed without a corresponding AP Reset Hold, then the GHCB might not be mapped (depending on the previous VMEXIT), which will result in a NULL pointer dereference. The svm_complete_emulated_msr() routine will update the GHCB to inform the caller of a RDMSR/WRMSR operation about any errors. While it is likely that the GHCB will be mapped in this situation, add a safe guard in this path to be certain a NULL pointer dereference is not encountered. Fixes: `f1c6366e30` ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Fixes: `647daca25d` ("KVM: SVM: Add support for booting APs in an SEV-ES guest") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 18:04:47 -04:00
Maxim Levitsky	4020da3b9f	KVM: x86: pending exceptions must not be blocked by an injected event Injected interrupts/nmi should not block a pending exception, but rather be either lost if nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when nested run is pending (and that can't really happen currently but still worth checking for). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:02 -04:00
Maxim Levitsky	232f75d3b4	KVM: nSVM: call nested_svm_load_cr3 on nested state load While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4 by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore only root_mmu is reset. On regular nested entries we call nested_svm_load_cr3 which both updates the guest's CR3 in the MMU when it is needed, and it also initializes the mmu again which makes it initialize the walk_mmu as well when nested paging is enabled in both host and guest. Since we don't call nested_svm_load_cr3 on nested state load, the walk_mmu can be left uninitialized, which can lead to a NULL pointer dereference while accessing it if we happen to get a nested page fault right after entering the nested guest first time after the migration and we decide to emulate it, which leads to the emulator trying to access walk_mmu->gva_to_gpa which is NULL. Therefore we should call this function on nested state load as well. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:01 -04:00
Maxim Levitsky	adc2a23734	KVM: nSVM: improve SYSENTER emulation on AMD Currently to support Intel->AMD migration, if CPU vendor is GenuineIntel, we emulate the full 64 value for MSR_IA32_SYSENTER_{EIP\|ESP} msrs, and we also emulate the sysenter/sysexit instruction in long mode. (Emulator does still refuse to emulate sysenter in 64 bit mode, on the ground that the code for that wasn't tested and likely has no users) However when virtual vmload/vmsave is enabled, the vmload instruction will update these 32 bit msrs without triggering their msr intercept, which will lead to having stale values in kvm's shadow copy of these msrs, which relies on the intercept to be up to date. Fix/optimize this by doing the following: 1. Enable the MSR intercepts for SYSENTER MSRs iff vendor=GenuineIntel (This is both a tiny optimization and also ensures that in case the guest cpu vendor is AMD, the msrs will be 32 bit wide as AMD defined). 2. Store only high 32 bit part of these msrs on interception and combine it with hardware msr value on intercepted read/writes iff vendor=GenuineIntel. 3. Disable vmload/vmsave virtualization if vendor=GenuineIntel. (It is somewhat insane to set vendor=GenuineIntel and still enable SVM for the guest but well whatever). Then zero the high 32 bit parts when kvm intercepts and emulates vmload. Thanks a lot to Paulo Bonzini for helping me with fixing this in the most correct way. This patch fixes nested migration of 32 bit nested guests, that was broken because incorrect cached values of SYSENTER msrs were stored in the migration stream if L1 changed these msrs with vmload prior to L2 entry. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:59 -04:00
Sean Christopherson	eba04b20e4	KVM: x86: Account a variety of miscellaneous allocations Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are a several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:58 -04:00
Sean Christopherson	8727906fde	KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV enabled, and svm_create_vcpu() needs to allocate the VMSA. At best, creating vCPUs before SEV/SEV-ES init will lead to unexpected errors and/or behavior, and at worst it will crash the host, e.g. sev_launch_update_vmsa() will dereference a null svm->vmsa pointer. Fixes: `1654efcbc4` ("KVM: SVM: Add KVM_SEV_INIT command") Fixes: `ad73109ae7` ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:58 -04:00
Sean Christopherson	9fa1521daa	KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completes Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds. If the command fails, e.g. because SEV is already active or there are no available ASIDs, then es_active will be left set even though the VM is not fully SEV-ES capable. Refactor the code so that "es_active" is passed on the stack instead of being prematurely shoved into sev_info, both to avoid having to unwind sev_info and so that it's more obvious what actually consumes es_active in sev_guest_init() and its helpers. Fixes: `ad73109ae7` ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:57 -04:00
Sean Christopherson	c36b16d29f	KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUs Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting VMSAs for SEV, which effectively switches to use online_vcpus instead of created_vcpus. This fixes a possible null-pointer dereference as created_vcpus does not guarantee a vCPU exists, since it is updated at the very beginning of KVM_CREATE_VCPU. created_vcpus exists to allow the bulk of vCPU creation to run in parallel, while still correctly restricting the max number of max vCPUs. Fixes: `ad73109ae7` ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:57 -04:00
Krish Sadhukhan	9a7de6ecc3	KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit() According to APM, the #DB intercept for a single-stepped VMRUN must happen after the completion of that instruction, when the guest does #VMEXIT to the host. However, in the current implementation of KVM, the #DB intercept for a single-stepped VMRUN happens after the completion of the instruction that follows the VMRUN instruction. When the #DB intercept handler is invoked, it shows the RIP of the instruction that follows VMRUN, instead of of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN is concerned. This patch fixes the problem by checking, in nested_svm_vmexit(), for the condition that the VMRUN instruction is being single-stepped and if so, queues the pending #DB intercept so that the #DB is accounted for before we execute L1's next instruction. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com> Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:52 -04:00
Vipin Sharma	7aef27f0b2	svm/sev: Register SEV and SEV-ES ASIDs to the misc controller Secure Encrypted Virtualization (SEV) and Secure Encrypted Virtualization - Encrypted State (SEV-ES) ASIDs are used to encrypt KVMs on AMD platform. These ASIDs are available in the limited quantities on a host. Register their capacity and usage to the misc controller for tracking via cgroups. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-04-04 13:34:46 -04:00
Paolo Bonzini	cb9b6a1b19	Merge branch 'kvm-fix-svm-races' into HEAD	2021-04-01 05:19:48 -04:00
Paolo Bonzini	6ebae23c07	Merge branch 'kvm-fix-svm-races' into kvm-master	2021-04-01 05:14:05 -04:00
Paolo Bonzini	3c346c0c60	KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: stable@vger.kernel.org Fixes: `2fcf4876ad` ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:11:35 -04:00
Paolo Bonzini	a58d9166a7	KVM: SVM: load control fields from VMCB12 before checking them Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: `2fcf4876ad` ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:09:31 -04:00
Vitaly Kuznetsov	1973cadd4c	KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however, allows these MSRs unconditionally because kvm_pmu_is_valid_msr() -> amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() -> amd_pmu_set_msr() doesn't fail. In case of a counter (CTRn), no big harm is done as we only increase internal PMC's value but in case of an eventsel (CTLn), we go deep into perf internals with a non-existing counter. Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist and this also seems to contradict architectural behavior which is #GP (I did check one old Opteron host) but changing this status quo is a bit scarier. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:10 -04:00
Ingo Molnar	163b099146	x86: Fix various typos in comments, take #2 Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org	2021-03-21 23:50:28 +01:00
Ingo Molnar	d9f6e12fb0	x86: Fix various typos in comments Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org	2021-03-18 15:31:53 +01:00
Sean Christopherson	4a98623d5d	KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Set the PAE roots used as decrypted to play nice with SME when KVM is using shadow paging. Explicitly skip setting the C-bit when loading CR3 for PAE shadow paging, even though it's completely ignored by the CPU. The extra documentation is nice to have. Note, there are several subtleties at play with NPT. In addition to legacy shadow paging, the PAE roots are used for SVM's NPT when either KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And 64-bit KVM can happily set the C-bit in CR3. This also means that keeping __sme_set(root) for 32-bit KVM when NPT is enabled is conceptually wrong, but functionally ok since SME is 64-bit only. Leave it as is to avoid unnecessary pollution. Fixes: `d0ec49d4de` ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210309224207.1218275-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:08 -04:00
Sean Christopherson	e83bc09caf	KVM: x86: Get active PCID only when writing a CR3 value Retrieve the active PCID only when writing a guest CR3 value, i.e. don't get the PCID when using EPT or NPT. The PCID is especially problematic for EPT as the bits have different meaning, and so the PCID and must be manually stripped, which is annoying and unnecessary. And on VMX, getting the active PCID also involves reading the guest's CR3 and CR4.PCIDE, i.e. may add pointless VMREADs. Opportunistically rename the pgd/pgd_level params to root_hpa and root_level to better reflect their new roles. Keep the function names, as "load the guest PGD" is still accurate/correct. Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that doing so would require changing the MMU page allocators and reworking the MMU to use kmap(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:56 -04:00
Uros Bizjak	7531b47c8a	KVM/SVM: Move vmenter.S exception fixups out of line Avoid jump by moving exception fixups out of line. Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20210226125621.111723-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:56 -04:00
Sean Christopherson	8120337a4c	KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs Stop tagging MMIO SPTEs with specific available bits and instead detect MMIO SPTEs by checking for their unique SPTE value. The value is guaranteed to be unique on shadow paging and NPT as setting reserved physical address bits on any other type of SPTE would consistute a KVM bug. Ditto for EPT, as creating a WX non-MMIO would also be a bug. Note, this approach is also future-compatibile with TDX, which will need to reflect MMIO EPT violations as #VEs into the guest. To create an EPT violation instead of a misconfig, TDX EPTs will need to have RWX=0, But, MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so MMIO SPTEs will still be guaranteed to have a unique value within a given MMU context. The main motivation is to make it easier to reason about which types of SPTEs use which available bits. As a happy side effect, this frees up two more bits for storing the MMIO generation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:41 -04:00
Cathy Avery	8173396e94	KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies Use the vmcb12 control clean field to determine which vmcb12.save registers were marked dirty in order to minimize register copies when switching from L1 to L2. Those vmcb12 registers marked as dirty need to be copied to L0's vmcb02 as they will be used to update the vmcb state cache for the L2 VMRUN. In the case where we have a different vmcb12 from the last L2 VMRUN all vmcb12.save registers must be copied over to L2.save. Tested: kvm-unit-tests kvm selftests Fedora L1 L2 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20210301200844.2000-1-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:26 -04:00
Babu Moger	d00b99c514	KVM: SVM: Add support for Virtual SPEC_CTRL Newer AMD processors have a feature to virtualize the use of the SPEC_CTRL MSR. Presence of this feature is indicated via CPUID function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not required to enable this feature since it is automatically enabled on processors that support it. A hypervisor may wish to impose speculation controls on guest execution or a guest may want to impose its own speculation controls. Therefore, the processor implements both host and guest versions of SPEC_CTRL. When in host mode, the host SPEC_CTRL value is in effect and writes update only the host version of SPEC_CTRL. On a VMRUN, the processor loads the guest version of SPEC_CTRL from the VMCB. When the guest writes SPEC_CTRL, only the guest version is updated. On a VMEXIT, the guest version is saved into the VMCB and the processor returns to only using the host SPEC_CTRL for speculation control. The guest SPEC_CTRL is located at offset 0x2E0 in the VMCB. The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This allows the hypervisor to ensure a minimum SPEC_CTRL if desired. This support also fixes an issue where a guest may sometimes see an inconsistent value for the SPEC_CTRL MSR on processors that support this feature. With the current SPEC_CTRL support, the first write to SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it will be 0x0, instead of the actual expected value. There isn’t a security concern here, because the host SPEC_CTRL value is or’ed with the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value. KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL MSR just before the VMRUN, so it will always have the actual value even though it doesn’t appear that way in the guest. The guest will only see the proper value for the SPEC_CTRL register if the guest was to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL support, the save area spec_ctrl is properly saved and restored. So, the guest will always see the proper value when it is read back. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:25 -04:00
Maxim Levitsky	cc3ed80ae6	KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state This allows to avoid copying of these fields between vmcb01 and vmcb02 on nested guest entry/exit. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:24 -04:00
Paolo Bonzini	fb0c4a4fee	KVM: SVM: move VMLOAD/VMSAVE to C code Thanks to the new macros that handle exception handling for SVM instructions, it is easier to just do the VMLOAD/VMSAVE in C. This is safe, as shown by the fact that the host reload is already done outside the assembly source. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:23 -04:00
Sean Christopherson	c8781feaf1	KVM: SVM: Skip intercepted PAUSE instructions after emulation Skip PAUSE after interception to avoid unnecessarily re-executing the instruction in the guest, e.g. after regaining control post-yield. This is a benign bug as KVM disables PAUSE interception if filtering is off, including the case where pause_filter_count is set to zero. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:22 -04:00
Sean Christopherson	32c23c7d52	KVM: SVM: Don't manually emulate RDPMC if nrips=0 Remove bizarre code that causes KVM to run RDPMC through the emulator when nrips is disabled. Accelerated emulation of RDPMC doesn't rely on any additional data from the VMCB, and SVM has generic handling for updating RIP to skip instructions when nrips is disabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:21 -04:00
Sean Christopherson	c483c45471	KVM: x86: Move RDPMC emulation to common code Move the entirety of the accelerated RDPMC emulation to x86.c, and assign the common handler directly to the exit handler array for VMX. SVM has bizarre nrips behavior that prevents it from directly invoking the common handler. The nrips goofiness will be addressed in a future patch. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:20 -04:00
Sean Christopherson	5ff3a351f6	KVM: x86: Move trivial instruction-based exit handlers to common code Move the trivial exit handlers, e.g. for instructions that KVM "emulates" as nops, to common x86 code. Assign the common handlers directly to the exit handler arrays and drop the vendor trampolines. Opportunistically use pr_warn_once() where appropriate. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:19 -04:00
Sean Christopherson	92f9895c14	KVM: x86: Move XSETBV emulation to common code Move the entirety of XSETBV emulation to x86.c, and assign the function directly to both VMX's and SVM's exit handlers, i.e. drop the unnecessary trampolines. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:18 -04:00
Sean Christopherson	2ac636a6ea	KVM: nSVM: Add VMLOAD/VMSAVE helper to deduplicate code Add another helper layer for VMLOAD+VMSAVE, the code is identical except for the one line that determines which VMCB is the source and which is the destination. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:17 -04:00
Sean Christopherson	3a87c7e0d1	KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral Add a helper to consolidate boilerplate for nested VM-Exits that don't provide any data in exit_info_*. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:16 -04:00
Sean Christopherson	cb6a32c2b8	KVM: x86: Handle triple fault in L2 without killing L1 Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:15 -04:00
Paolo Bonzini	6312975417	KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places) Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to allow directly invoking common x86 exit handlers (in a future patch). Opportunistically convert an absurd number of instances of 'svm->vcpu' to direct uses of 'vcpu' to avoid pointless casting. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:09 -04:00
Paolo Bonzini	2a32a77cef	KVM: SVM: merge update_cr0_intercept into svm_set_cr0 The logic of update_cr0_intercept is pointlessly complicated. All svm_set_cr0 is compute the effective cr0 and compare it with the guest value. Inlining the function and simplifying the condition clarifies what it is doing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:38 -04:00
Sean Christopherson	11f0cbf0c6	KVM: nSVM: Trace VM-Enter consistency check failures Use trace_kvm_nested_vmenter_failed() and its macro magic to trace consistency check failures on nested VMRUN. Tracing such failures by running the buggy VMM as a KVM guest is often the only way to get a precise explanation of why VMRUN failed. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:37 -04:00
Krish Sadhukhan	6906e06db9	KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state() The path for SVM_SET_NESTED_STATE needs to have the same checks for the CPU registers, as we have in the VMRUN path for a nested guest. This patch adds those missing checks to svm_set_nested_state(). Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:35 -04:00
Paolo Bonzini	c08f390a75	KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state() The VMLOAD/VMSAVE data is not taken from userspace, since it will not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01). For clarity, replace the wholesale copy of the VMCB save area with a copy of that state only. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:34 -04:00
Paolo Bonzini	4bb170a543	KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit Since L1 and L2 now use different VMCBs, most of the fields remain the same in VMCB02 from one L2 run to the next. Since KVM itself is not looking at VMCB12's clean field, for now not much can be optimized. However, in the future we could avoid more copies if the VMCB12's SEG and DT sections are clean. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:33 -04:00
Paolo Bonzini	7ca62d1322	KVM: nSVM: do not mark all VMCB01 fields dirty on nested vmexit Since L1 and L2 now use different VMCBs, most of the fields remain the same from one L1 run to the next. svm_set_cr0 and other functions called by nested_svm_vmexit already take care of clearing the corresponding clean bits; only the TSC offset is special. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:32 -04:00
Paolo Bonzini	7c3ecfcd31	KVM: nSVM: do not copy vmcb01->control blindly to vmcb02->control Most fields were going to be overwritten by vmcb12 control fields, or do not matter at all because they are filled by the processor on vmexit. Therefore, we need not copy them from vmcb01 to vmcb02 on vmentry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:32 -04:00
Paolo Bonzini	9e8f0fbfff	KVM: nSVM: rename functions and variables according to vmcbXY nomenclature Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12 naming) we can give clearer names to functions that write to and read from those VMCBs. Likewise, variables and parameters can be renamed from nested_vmcb to vmcb12. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:31 -04:00
Cathy Avery	193015adf4	KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb This patch moves the asid_generation from the vcpu to the vmcb in order to track the ASID generation that was active the last time the vmcb was run. If sd->asid_generation changes between two runs, the old ASID is invalid and must be changed. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20210112164313.4204-3-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:30 -04:00
Cathy Avery	af18fa775d	KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb This patch moves the physical cpu tracking from the vcpu to the vmcb in svm_switch_vmcb. If either vmcb01 or vmcb02 change physical cpus from one vmrun to the next the vmcb's previous cpu is preserved for comparison with the current cpu and the vmcb is marked dirty if different. This prevents the processor from using old cached data for a vmcb that may have been updated on a prior run on a different processor. It also moves the physical cpu check from svm_vcpu_load to pre_svm_run as the check only needs to be done at run. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20210112164313.4204-2-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:29 -04:00
Cathy Avery	4995a3685f	KVM: SVM: Use a separate vmcb for the nested L2 guest svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2 (nested). The main advantages are removing get_host_vmcb and hsave, in favor of concepts that are shared with VMX. We don't need anymore to stash the L1 registers in hsave while L2 runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. This more or less has the same cost, but code-wise nested_svm_vmloadsave can be reused. This patch omits several optimizations that are possible: - for simplicity there is some wholesale copying of vmcb.control areas which can go away. - we should be able to better use the VMCB01 and VMCB02 clean bits. - another possibility is to always use VMCB01 for VMLOAD and VMSAVE, thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. Tested: kvm-unit-tests kvm self tests Loaded fedora nested guest on fedora Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20201011184818.3609-3-cavery@redhat.com> [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add a few more comments. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:28 -04:00
Sean Christopherson	6d1b867d04	KVM: SVM: Don't strip the C-bit from CR2 on #PF interception Don't strip the C-bit from the faulting address on an intercepted #PF, the address is a virtual address, not a physical address. Fixes: `0ede79e132` ("KVM: SVM: Clear C-bit from the page fault address") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:26 -04:00
Dongli Zhang	43c11d91fb	KVM: x86: to track if L1 is running L2 VM The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM is running or used to run L2 VM. An example of the usage of 'nested_run' is to help the host administrator to easily track if any L1 VM is used to run L2 VM. Suppose there is issue that may happen with nested virtualization, the administrator will be able to easily narrow down and confirm if the issue is due to nested virtualization via 'nested_run'. For example, whether the fix like commit `88dddc11a8` ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:28:02 -04:00
Sean Christopherson	99840a7545	KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled' Directly connect the 'npt' param to the 'npt_enabled' variable so that runtime adjustments to npt_enabled are reflected in sysfs. Move the !PAE restriction to a runtime check to ensure NPT is forced off if the host is using 2-level paging, and add a comment explicitly stating why NPT requires a 64-bit kernel or a kernel with PAE enabled. Opportunistically switch the param to octal permissions. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305021637.3768573-1-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-05 08:33:15 -05:00
Babu Moger	9e46f6c6c9	KVM: SVM: Clear the CR4 register on reset This problem was reported on a SVM guest while executing kexec. Kexec fails to load the new kernel when the PCID feature is enabled. When kexec starts loading the new kernel, it starts the process by resetting the vCPU's and then bringing each vCPU online one by one. The vCPU reset is supposed to reset all the register states before the vCPUs are brought online. However, the CR4 register is not reset during this process. If this register is already setup during the last boot, all the flags can remain intact. The X86_CR4_PCIDE bit can only be enabled in long mode. So, it must be enabled much later in SMP initialization. Having the X86_CR4_PCIDE bit set during SMP boot can cause a boot failures. Fix the issue by resetting the CR4 register in init_vmcb(). Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-02 14:39:11 -05:00
Sean Christopherson	2df8d3807c	KVM: SVM: Fix nested VM-Exit on #GP interception handling Fix the interpreation of nested_svm_vmexit()'s return value when synthesizing a nested VM-Exit after intercepting an SVM instruction while L2 was running. The helper returns '0' on success, whereas a return value of '0' in the exit handler path means "exit to userspace". The incorrect return value causes KVM to exit to userspace without filling the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0". Fixes: `14c2bf81fc` ("KVM: SVM: Fix #GP handling for doubly-nested virtualization") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210224005627.657028-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-25 05:13:05 -05:00
Paolo Bonzini	d2df592fd8	KVM: nSVM: prepare guest save area while is_guest_mode is true Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control. This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save. Moving is_guest_mode to the end is done since commit `06fc777269` ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25). However, back then KVM didn't grab a different VMCB when updating the intercepts, it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later recalc_intercepts was introduced in commit `384c636843` ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb. [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/ Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-22 13:11:56 -05:00
Paolo Bonzini	a04aead144	KVM: nSVM: fix running nested guests when npt=0 In case of npt=0 on host, nSVM needs the same .inject_page_fault tweak as VMX has, to make sure that shadow mmu faults are injected as vmexits. It is not clear why this is needed at all, but for now keep the same code as VMX and we'll fix it for both. Based on a patch by Maxim Levitsky <mlevitsk@redhat.com>. Fixes: `7c86663b68` ("KVM: nSVM: inject exceptions via svm_check_nested_events") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-18 07:33:30 -05:00
Maxim Levitsky	954f419ba8	KVM: nSVM: move nested vmrun tracepoint to enter_svm_guest_mode This way trace will capture all the nested mode entries (including entries after migration, and from smm) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210217145718.1217358-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-18 07:33:30 -05:00
Sean Christopherson	e420333422	KVM: x86: Advertise INVPCID by default Advertise INVPCID by default (if supported by the host kernel) instead of having both SVM and VMX opt in. INVPCID was opt in when it was a VMX only feature so that KVM wouldn't prematurely advertise support if/when it showed up in the kernel on AMD hardware. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210212003411.1102677-3-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-18 07:33:29 -05:00
Sean Christopherson	0a8ed2eaac	KVM: SVM: Intercept INVPCID when it's disabled to inject #UD Intercept INVPCID if it's disabled in the guest, even when using NPT, as KVM needs to inject #UD in this case. Fixes: `4407a797e9` ("KVM: SVM: Enable INVPCID feature on AMD") Cc: Babu Moger <babu.moger@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210212003411.1102677-2-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-18 07:33:28 -05:00
Wei Yongjun	2e215216d6	KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static The sparse tool complains as follows: arch/x86/kvm/svm/svm.c:204:6: warning: symbol 'svm_gp_erratum_intercept' was not declared. Should it be static? This symbol is not used outside of svm.c, so this commit marks it static. Fixes: `82a11e9c6f` ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Message-Id: <20210210075958.1096317-1-weiyongjun1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-11 08:02:08 -05:00
Paolo Bonzini	996ff5429e	KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop the old kvm_set_dr wrapper. This also allows nested VMX code, which really wanted to use __kvm_set_dr, to use the right function. While at it, remove the kvm_require_dr() check from the SVM interception. The APM states: All normal exception checks take precedence over the SVM intercepts. which includes the CR4.DE=1 #UD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-09 08:17:07 -05:00
Sean Christopherson	6f7a343987	KVM: SVM: Remove an unnecessary forward declaration Drop a defunct forward declaration of svm_complete_interrupts(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-09 08:17:06 -05:00
Sean Christopherson	e6c804a848	KVM: SVM: Move AVIC vCPU kicking snippet to helper function Add a helper function to handle kicking non-running vCPUs when sending virtual IPIs. A future patch will change SVM's interception functions to take @vcpu instead of @svm, at which piont declaring and modifying 'vcpu' in a case statement is confusing, and potentially dangerous. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-09 08:17:06 -05:00
Sean Christopherson	ca29e14506	KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA legality checks. AMD's APM states: If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Thus, any access that can conceivably be run through NPT should ignore the C-bit when checking for validity. For features that KVM emulates in software, e.g. MTRRs, there is no clear direction in the APM for how the C-bit should be handled. For such cases, follow the SME behavior inasmuch as possible, since SEV is is essentially a VM-specific variant of SME. For SME, the APM states: In this case the upper physical address bits are treated as reserved when the feature is enabled except where otherwise indicated. Collecting the various relavant SME snippets in the APM and cross- referencing the omissions with Linux kernel code, this leaves MTTRs and APIC_BASE as the only flows that KVM emulates that should _not_ ignore the C-bit. Note, this means the reserved bit checks in the page tables are technically broken. This will be remedied in a future patch. Although the page table checks are technically broken, in practice, it's all but guaranteed to be irrelevant. NPT is required for SEV, i.e. shadowing page tables isn't needed in the common case. Theoretically, the checks could be in play for nested NPT, but it's extremely unlikely that anyone is running nested VMs on SEV, as doing so would require L1 to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1 would need to put its NPT in shared memory, in which case the C-bit will never be set. Or, L1 could use shadow paging, but again, if L0 needs to read page tables, e.g. to load PDPTRs, the memory can't be encrypted if L1 has any expectation of L0 doing the right thing. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:29 -05:00
Sean Christopherson	bbc2c63ddd	KVM: nSVM: Use common GPA helper to check for illegal CR3 Replace an open coded check for an invalid CR3 with its equivalent helper. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:28 -05:00
Sean Christopherson	2732be9023	KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is in the guest domain. Barring a bizarre paravirtual use case, this is likely a benign bug. SME is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't use the correct key to read guest memory, and setting guest MAXPHYADDR higher than the host, i.e. overlapping the C-bit, would cause faults in the guest. Note, for SEV guests, stripping the C-bit is technically aligned with CPU behavior, but for KVM it's the greater of two evils. Because KVM doesn't have access to the guest's encryption key, ignoring the C-bit would at best result in KVM reading garbage. By keeping the C-bit, KVM will fail its read (unless userspace creates a memslot with the C-bit set). The guest will still undoubtedly die, as KVM will use '0' for the PDPTR value, but that's preferable to interpreting encrypted data as a PDPTR. Fixes: `d0ec49d4de` ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:27 -05:00
Paolo Bonzini	bbefd4fc8f	KVM: x86: move kvm_inject_gp up from kvm_set_xcr to callers Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:37 -05:00
Krish Sadhukhan	04548ed020	KVM: SVM: Replace hard-coded value with #define Replace the hard-coded value for bit# 1 in EFLAGS, with the available #define. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210203012842.101447-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:35 -05:00
Michael Roth	a7fc06dd2f	KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup Currently we save host state like user-visible host MSRs, and do some initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO in svm_vcpu_load(). Defer this until just before we enter the guest by moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to how it is done for the VMX implementation. Additionally, since handling of saving/restoring host user MSRs is the same both with/without SEV-ES enabled, move that handling to common code. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-4-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:35 -05:00
Michael Roth	553cc15f6e	KVM: SVM: remove uneeded fields from host_save_users_msrs Now that the set of host user MSRs that need to be individually saved/restored are the same with/without SEV-ES, we can drop the .sev_es_restored flag and just iterate through the list unconditionally for both cases. A subsequent patch can then move these loops to a common path. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-3-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:34 -05:00
Michael Roth	e79b91bb3c	KVM: SVM: use vmsave/vmload for saving/restoring additional host state Using a guest workload which simply issues 'hlt' in a tight loop to generate VMEXITs, it was observed (on a recent EPYC processor) that a significant amount of the VMEXIT overhead measured on the host was the result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to perf: 67.49%--kvm_arch_vcpu_ioctl_run \| \|--23.13%--vcpu_put \| kvm_arch_vcpu_put \| \| \| \|--21.31%--native_write_msr \| \| \| --1.27%--svm_set_cr4 \| \|--16.11%--vcpu_load \| \| \| --15.58%--kvm_arch_vcpu_load \| \| \| \|--13.97%--svm_set_cr4 \| \| \| \| \| \|--12.64%--native_read_msr Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and can be saved/restored using 'vmsave'/'vmload' instructions rather than explicit MSR reads/writes. In doing so there is a significant reduction in the svm_vcpu_load/svm_vcpu_put overhead measured for the above workload: 50.92%--kvm_arch_vcpu_ioctl_run \| \|--19.28%--disable_nmi_singlestep \| \|--13.68%--vcpu_load \| kvm_arch_vcpu_load \| \| \| \|--9.19%--svm_set_cr4 \| \| \| \| \| --6.44%--native_read_msr \| \| \| --3.55%--native_write_msr \| \|--6.05%--kvm_inject_nmi \|--2.80%--kvm_sev_es_mmio_read \|--2.19%--vcpu_put \| \| \| --1.25%--kvm_arch_vcpu_put \| native_write_msr Quantifying this further, if we look at the raw cycle counts for a normal iteration of the above workload (according to 'rdtscp'), kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with the current behavior. Using 'vmsave'/'vmload', this is reduced to ~2800 cycles, a savings of 39%. While this approach doesn't seem to manifest in any noticeable improvement for more realistic workloads like UnixBench, netperf, and kernel builds, likely due to their exit paths generally involving IO with comparatively high latencies, it does improve overall overhead of KVM_RUN significantly, which may still be noticeable for certain situations. It also simplifies some aspects of the code. With this change, explicit save/restore is no longer needed for the following host MSRs, since they are documented[1] as being part of the VMCB State Save Area: MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, MSR_FS_BASE, MSR_GS_BASE and only the following MSR needs individual handling in svm_vcpu_put/svm_vcpu_load: MSR_TSC_AUX We could drop the host_save_user_msrs array/loop and instead handle MSR read/write of MSR_TSC_AUX directly, but we leave that for now as a potential follow-up. Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment registers (and associated hidden state)[2], some of the code previously used to handle this is no longer needed, so we drop it as well. The first public release of the SVM spec[3] also documents the same handling for the host state in question, so we make these changes unconditionally. Also worth noting is that we 'vmsave' to the same page that is subsequently used by 'vmrun' to record some host additional state. This is okay, since, in accordance with the spec[2], the additional state written to the page by 'vmrun' does not overwrite any fields written by 'vmsave'. This has also been confirmed through testing (for the above CPU, at least). [1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2 [2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD [3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01 Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-2-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:34 -05:00
Sean Christopherson	35a7831912	KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions Add svm_asm() macros, a la the existing vmx_asm() macros, to handle faults on SVM instructions instead of using the generic __ex(), a.k.a. __kvm_handle_fault_on_reboot(). Using asm goto generates slightly better code as it eliminates the in-line JMP+CALL sequences that are needed by __kvm_handle_fault_on_reboot() to avoid triggering BUG() from fixup (which generates bad stack traces). Using SVM specific macros also drops the last user of __ex() and the the last asm linkage to kvm_spurious_fault(), and adds a helper for VMSAVE, which may gain an addition call site in the future (as part of optimizing the SVM context switching). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:33 -05:00
Jason Baron	b6a7cc3544	KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functions A subsequent patch introduces macros in preparation for simplifying the definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform expands the coverage of the macros. Add vmx/svm prefix to the following functions: update_exception_bitmap(), enable_nmi_window(), enable_irq_window(), update_cr8_intercept and enable_smi_window(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:29 -05:00
Wei Huang	14c2bf81fc	KVM: SVM: Fix #GP handling for doubly-nested virtualization Under the case of nested on nested (L0, L1, L2 are all hypervisors), we do not support emulation of the vVMLOAD/VMSAVE feature, the L0 hypervisor can inject the proper #VMEXIT to inform L1 of what is happening and L1 can avoid invoking the #GP workaround. For this reason we turns on guest VM's X86_FEATURE_SVME_ADDR_CHK bit for KVM running inside VM to receive the notification and change behavior. Similarly we check if vcpu is under guest mode before emulating the vmware-backdoor instructions. For the case of nested on nested, we let the guest handle it. Co-developed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210126081831.570253-5-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:28 -05:00
Wei Huang	3b9c723ed7	KVM: SVM: Add support for SVM instruction address check change New AMD CPUs have a change that checks #VMEXIT intercept on special SVM instructions before checking their EAX against reserved memory region. This change is indicated by CPUID_0x8000000A_EDX[28]. If it is 1, #VMEXIT is triggered before #GP. KVM doesn't need to intercept and emulate #GP faults as #GP is supposed to be triggered. Co-developed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210126081831.570253-4-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:28 -05:00
Bandan Das	82a11e9c6f	KVM: SVM: Add emulation support for #GP triggered by SVM instructions While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD CPUs check EAX against reserved memory regions (e.g. SMM memory on host) before checking VMCB's instruction intercept. If EAX falls into such memory areas, #GP is triggered before VMEXIT. This causes problem under nested virtualization. To solve this problem, KVM needs to trap #GP and check the instructions triggering #GP. For VM execution instructions, KVM emulates these instructions. Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Bandan Das <bsd@redhat.com> Message-Id: <20210126081831.570253-3-wei.huang2@amd.com> [Conditionally enable #GP intercept. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:28 -05:00
Chenyi Qiang	9a3ecd5e2a	KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW DR6_INIT contains the 1-reserved bits as well as the bit that is cleared to 0 when the condition (e.g. RTM) happens. The value can be used to initialize dr6 and also be the XOR mask between the #DB exit qualification (or payload) and DR6. Concerning that DR6_INIT is used as initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which would make the incoming changes for bus lock debug exception more simple. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:27 -05:00
Brijesh Singh	2c07ded064	KVM/SVM: add support for SEV attestation command The SEV FW version >= 0.23 added a new command that can be used to query the attestation report containing the SHA-256 digest of the guest memory encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and sign the report with the Platform Endorsement Key (PEK). See the SEV FW API spec section 6.8 for more details. Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be used to get the SHA-256 digest. The main difference between the KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter can be called while the guest is running and the measurement value is signed with PEK. Cc: James Bottomley <jejb@linux.ibm.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: David Rientjes <rientjes@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: John Allen <john.allen@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: James Bottomley <jejb@linux.ibm.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:20 -05:00
Paolo Bonzini	c1c35cf78b	KVM: x86: cleanup CR3 reserved bits checks If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: stable@vger.kernel.org Fixes: `761e416934` ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: `a780a3ea62` ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-03 04:30:38 -05:00
Sean Christopherson	ccd85d90ce	KVM: SVM: Treat SVM as unsupported when running as an SEV guest Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-03 04:30:37 -05:00
Peter Gonda	19a23da539	Fix unsynchronized access to sev members through svm_register_enc_region Grab kvm->lock before pinning memory when registering an encrypted region; sev_pin_memory() relies on kvm->lock being held to ensure correctness when checking and updating the number of pinned pages. Add a lockdep assertion to help prevent future regressions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: `1e80fdc09d` ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda <pgonda@google.com> V2 - Fix up patch description - Correct file paths svm.c -> sev.c - Add unlock of kvm->lock on sev_pin_memory error V1 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/ Message-Id: <20210127161524.2832400-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-28 13:03:14 -05:00
Paolo Bonzini	9a78e15802	KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable@vger.kernel.org> # 5.10.x: `f2c7ef3ba`: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-25 18:54:09 -05:00
Sean Christopherson	250091409a	KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the GRPs' dirty bits are set from time zero and never cleared, i.e. will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs. Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided. Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Fixes: `291bd20d5d` ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-25 18:52:09 -05:00
Lorenzo Brescia	d95df95106	kvm: tracing: Fix unmatched kvm_entry and kvm_exit events On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here: CPU-1979 [002] 89.871187: kvm_entry: vcpu 1 CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace. Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this: CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too. Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-25 18:52:08 -05:00
Tom Lendacky	647daca25d	KVM: SVM: Add support for booting APs in an SEV-ES guest Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence, where the guest vCPU register state is updated and then the vCPU is VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because the guest register state is encrypted. Following the GHCB specification, the hypervisor must not alter the guest register state, so KVM must track an AP/vCPU boot. Should the guest want to park the AP, it must use the AP Reset Hold exit event in place of, for example, a HLT loop. First AP boot (first INIT-SIPI-SIPI sequence): Execute the AP (vCPU) as it was initialized and measured by the SEV-ES support. It is up to the guest to transfer control of the AP to the proper location. Subsequent AP boot: KVM will expect to receive an AP Reset Hold exit event indicating that the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to awaken it. When the AP Reset Hold exit event is received, KVM will place the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI sequence, KVM will make the vCPU runnable. It is again up to the guest to then transfer control of the AP to the proper location. To differentiate between an actual HLT and an AP Reset Hold, a new MP state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is placed in upon receiving the AP Reset Hold exit event. Additionally, to communicate the AP Reset Hold exit event up to userspace (if needed), a new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD. A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for non SEV-ES guests, invokes the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests, implements the logic above. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:11:37 -05:00
Maxim Levitsky	f2c7ef3ba9	KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit It is possible to exit the nested guest mode, entered by svm_set_nested_state prior to first vm entry to it (e.g due to pending event) if the nested run was not pending during the migration. In this case we must not switch to the nested msr permission bitmap. Also add a warning to catch similar cases in the future. Fixes: `a7d5c7ce41` ("KVM: nSVM: delay MSR permission processing to first nested VM run") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:11:35 -05:00
Maxim Levitsky	56fe28de8c	KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode We overwrite most of vmcb fields while doing so, so we must mark it as dirty. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:11:34 -05:00
Maxim Levitsky	81f76adad5	KVM: nSVM: correctly restore nested_run_pending on migration The code to store it on the migration exists, but no code was restoring it. One of the side effects of fixing this is that L1->L2 injected events are no longer lost when migration happens with nested run pending. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:11:33 -05:00
Uros Bizjak	52782d5b63	KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c Commit `16809ecdc1` moved __svm_vcpu_run the prototype to svm.h, but forgot to remove the original from svm.c. Fixes: `16809ecdc1` ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201220200339.65115-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:07:28 -05:00
Nathan Chancellor	f65cf84ee7	KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load When using LLVM's integrated assembler (LLVM_IAS=1) while building x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build error occurs: $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory"); ^ arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex' #define __ex(x) __kvm_handle_fault_on_reboot(x) ^ ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot' "666: \n\t" \ ^ <inline asm>:2:2: note: instantiated into assembly here vmsave ^ 1 error generated. This happens because LLVM currently does not support calling vmsave without the fixed register operand (%rax for 64-bit and %eax for 32-bit). This will be fixed in LLVM 12 but the kernel currently supports LLVM 10.0.1 and newer so this needs to be handled. Add the proper register using the _ASM_AX macro, which matches the vmsave call in vmenter.S. Fixes: `861377730a` ("KVM: SVM: Provide support for SEV-ES vCPU loading") Link: https://reviews.llvm.org/D93524 Link: https://github.com/ClangBuiltLinux/linux/issues/1216 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:07:27 -05:00
Paolo Bonzini	bc351f0726	Merge branch 'kvm-master' into kvm-next Fixes to get_mmio_spte, destined to 5.10 stable branch.	2021-01-07 18:06:52 -05:00
Paolo Bonzini	d45f89f743	KVM: SVM: fix 32-bit compilation VCPU_REGS_R8...VCPU_REGS_R15 are not defined on 32-bit x86, so cull them from the synchronization of the VMSA. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-16 13:08:21 -05:00
Tom Lendacky	8640ca588b	KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting The GHCB specification requires the hypervisor to save the address of an AP Jump Table so that, for example, vCPUs that have been parked by UEFI can be started by the OS. Provide support for the AP Jump Table set/get exit code. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-15 12:47:53 -05:00
Tom Lendacky	ad73109ae7	KVM: SVM: Provide support to launch and run an SEV-ES guest An SEV-ES guest is started by invoking a new SEV initialization ioctl, KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is used to drive the appropriate ASID allocation, VMSA encryption, etc. Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted and measured. This is done using the LAUNCH_UPDATE_VMSA command after all calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE has been performed. In order to establish the encrypted VMSA, the current (traditional) VMSA and the GPRs are synced to the page that will hold the encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then marked as having protected guest state. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e9643245adb809caf3a87c09997926d2f3d6ff41.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-15 05:21:00 -05:00
Tom Lendacky	16809ecdc1	KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests The run sequence is different for an SEV-ES guest compared to a legacy or even an SEV guest. The guest vCPU register state of an SEV-ES guest will be restored on VMRUN and saved on VMEXIT. There is no need to restore the guest registers directly and through VMLOAD before VMRUN and no need to save the guest registers directly and through VMSAVE on VMEXIT. Update the svm_vcpu_run() function to skip register state saving and restoring and provide an alternative function for running an SEV-ES guest in vmenter.S Additionally, certain host state is restored across an SEV-ES VMRUN. As a result certain register states are not required to be restored upon VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES guest. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-15 05:20:59 -05:00

1 2 3 4 5 ...

374 Commits