2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

747 Commits

Author SHA1 Message Date
Paolo Bonzini
79367a6574 KVM: x86: introduce linear_{read,write}_system
Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so
as to have a smaller patch when the functions grow another argument.

Fixes: 129a72a0d3 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-12 15:06:15 +02:00
Wanpeng Li
a780a3ea62 KVM: X86: Fix reserved bits check for MOV to CR3
MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
It should be checked when PCIDE bit is not set, however commit
'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
its physical address width")' removes the bit 63 checking
unconditionally. This patch fixes it by checking bit 63 of CR3
when PCIDE bit is not set in CR4.

Fixes: d1cd3ce900 (KVM: MMU: check guest CR3 reserved bits based on its physical address width)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-14 18:14:16 +02:00
Stefan Fritsch
29916968c4 kvm: Add emulation for movups/movupd
This is very similar to the aligned versions movaps/movapd.

We have seen the corresponding emulation failures with openbsd as guest
and with Windows 10 with intel HD graphics pass through.

Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de>
Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 17:52:46 +02:00
Arbel Moshe
2d7921c499 KVM: x86: Add support for VMware backdoor Pseudo-PMCs
VMware exposes the following Pseudo PMCs:
0x10000: Physical host TSC
0x10001: Elapsed real time in ns
0x10002: Elapsed apparent time in ns

For more info refer to:
https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

VMware allows access to these Pseduo-PMCs even when read via RDPMC
in Ring3 and CR4.PCE=0. Therefore, commit modifies x86 emulator
to allow access to these PMCs in this situation. In addition,
emulation of these PMCs were added to kvm_pmu_rdpmc().

Signed-off-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-16 22:02:01 +01:00
Liran Alon
9a29d449e3 KVM: x86: Always allow access to VMware backdoor I/O ports
VMware allows access to these ports even if denied
by TSS I/O permission bitmap. Mimic behavior.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-16 22:01:40 +01:00
Radim Krčmář
7bf14c28ee Merge branch 'x86/hyperv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Topic branch for stable KVM clockource under Hyper-V.

Thanks to Christoffer Dall for resolving the ARM conflict.
2018-02-01 15:04:17 +01:00
Linus Torvalds
6304672b7f Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Another set of melted spectrum related changes:

   - Code simplifications and cleanups for RSB and retpolines.

   - Make the indirect calls in KVM speculation safe.

   - Whitelist CPUs which are known not to speculate from Meltdown and
     prepare for the new CPUID flag which tells the kernel that a CPU is
     not affected.

   - A less rigorous variant of the module retpoline check which merily
     warns when a non-retpoline protected module is loaded and reflects
     that fact in the sysfs file.

   - Prepare for Indirect Branch Prediction Barrier support.

   - Prepare for exposure of the Speculation Control MSRs to guests, so
     guest OSes which depend on those "features" can use them. Includes
     a blacklist of the broken microcodes. The actual exposure of the
     MSRs through KVM is still being worked on"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Simplify indirect_branch_prediction_barrier()
  x86/retpoline: Simplify vmexit_fill_RSB()
  x86/cpufeatures: Clean up Spectre v2 related CPUID flags
  x86/cpu/bugs: Make retpoline module warning conditional
  x86/bugs: Drop one "mitigation" from dmesg
  x86/nospec: Fix header guards names
  x86/alternative: Print unadorned pointers
  x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
  x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
  x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
  x86/msr: Add definitions for new speculation control MSRs
  x86/cpufeatures: Add AMD feature bits for Speculation Control
  x86/cpufeatures: Add Intel feature bits for Speculation Control
  x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
  module/retpoline: Warn about missing retpoline in module
  KVM: VMX: Make indirect call speculation safe
  KVM: x86: Make indirect calls in emulator speculation safe
2018-01-29 19:08:02 -08:00
Peter Zijlstra
1a29b5b7f3 KVM: x86: Make indirect calls in emulator speculation safe
Replace the indirect calls with CALL_NOSPEC.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rga@amazon.de
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lkml.kernel.org/r/20180125095843.595615683@infradead.org
2018-01-25 11:30:07 +01:00
Paolo Bonzini
fae1a3e775 kvm: x86: fix RSM when PCID is non-zero
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then
CR4 & ~PCIDE, then CR0, then CR4.

However, setting CR4.PCIDE fails if CR3[11:0] != 0.  It's probably easier
in the long run to replace rsm_enter_protected_mode() with an emulator
callback that sets all the special registers (like KVM_SET_SREGS would
do).  For now, set the PCID field of CR3 only after CR4.PCIDE is 1.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21 12:59:54 +01:00
Paolo Bonzini
fb6d4d340e KVM: x86: emulate RDPID
This is encoded as F3 0F C7 /7 with a register argument.  The register
argument is the second array in the group9 GroupDual, while F3 is the
fourth element of a Prefix.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:40 +01:00
Paolo Bonzini
dd307d017b KVM: x86: emulate sldt and str
These are needed to handle the descriptor table vmexits when emulating
UMIP.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:38 +01:00
Paolo Bonzini
ae3e61e1c2 KVM: x86: add support for UMIP
Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and
add UMIP support for instructions that are already emulated by KVM.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:38 +01:00
Rik van Riel
6ab0b9feb8 x86,kvm: remove KVM emulator get_fpu / put_fpu
Now that get_fpu and put_fpu do nothing, because the scheduler will
automatically load and restore the guest FPU context for us while we
are in this code (deep inside the vcpu_run main loop), we can get rid
of the get_fpu and put_fpu hooks.

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-05 21:20:24 +01:00
David Hildenbrand
4d772cb85f KVM: x86: fix em_fxstor() sleeping while in atomic
Commit 9d643f6312 ("KVM: x86: avoid large stack allocations in
em_fxrstor") optimize the stack size, but introduced a guest memory access
which might sleep while in atomic.

Fix it by introducing, again, a second fxregs_state. Try to avoid
large stacks by using noinline. Add some helpful comments.

Reported by syzbot:

in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
2 locks held by syzkaller879109/2909:
  #0:  (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70
arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
arch/x86/kvm/x86.c:6983 [inline]
  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
arch/x86/kvm/x86.c:7061 [inline]
  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>]
kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
  __might_sleep+0x95/0x190 kernel/sched/core.c:5967
  __might_fault+0xab/0x1d0 mm/memory.c:4383
  __copy_from_user include/linux/uaccess.h:71 [inline]
  __kvm_read_guest_page+0x58/0xa0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
  kvm_vcpu_read_guest_page+0x44/0x60
arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
  kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
  kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
  segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
  em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
  x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
  x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
  kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
  handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
  vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
  vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
  vcpu_run arch/x86/kvm/x86.c:7061 [inline]
  kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
  kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
  vfs_ioctl fs/ioctl.c:45 [inline]
  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
  SYSC_ioctl fs/ioctl.c:700 [inline]
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
  entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x437fc9
RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000

Fixes: 9d643f6312 ("KVM: x86: avoid large stack allocations in em_fxrstor")
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:15 +01:00
Wanpeng Li
3853be2603 KVM: X86: Fix operand/address-size during instruction decoding
Pedro reported:
  During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
  instruction under KVM produces different results on both memory and the SP
  register depending on whether EPT support is enabled. With EPT the SP is
  reduced by 4 bytes (and the written value is 0-padded) but without EPT support
  it is only reduced by 2 bytes. The difference can be observed when the CS.DB
  field is 1 (32-bit) but not when it's 0 (16-bit).

The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D
also should be respected instead of just default operand/address-size/66H
prefix/67H prefix during instruction decoding. This patch fixes it by also
adjusting operand/address-size according to CS.D.

Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:12 +01:00
Ladi Prosek
0234bf8852 KVM: x86: introduce ISA specific SMM entry/exit callbacks
Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* pre_enter_smm() is to be called when injecting an SMM, before any
  SMM related vcpu state has been changed
* pre_leave_smm() is to be called when emulating the RSM instruction,
  when the vcpu is in real mode and before any SMM related vcpu state
  has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Josh Poimboeuf
f26e60167d x86/kvm: Move kvm_fastop_exception to .fixup section
When compiling the kernel with the '-frecord-gcc-switches' flag, objtool
complains:

  arch/x86/kvm/emulate.o: warning: objtool: .GCC.command.line+0x0: special: can't find new instruction

And also the kernel fails to link.

The problem is that the 'kvm_fastop_exception' code gets placed into the
throwaway '.GCC.command.line' section instead of '.text'.

Exception fixup code is conventionally placed in the '.fixup' section,
so put it there where it belongs.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-05 15:06:30 +02:00
Linus Torvalds
a141fd55f2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Another round of CR3/PCID related fixes (I think this addresses all
  but one of the known problems with PCID support), an objtool fix plus
  a Clang fix that (finally) solves all Clang quirks to build a bootable
  x86 kernel as-is"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Fix inline asm call constraints for Clang
  objtool: Handle another GCC stack pointer adjustment bug
  x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
  x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
  x86/mm: Factor out CR3-building code
2017-09-24 12:33:58 -07:00
Josh Poimboeuf
f5caf621ee x86/asm: Fix inline asm call constraints for Clang
For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp))
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.

Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-23 15:06:20 +02:00
Yu Zhang
d6500149bc KVM: x86: Fix the NULL pointer parameter in check_cr_write()
Routine check_cr_write() will trigger emulator_get_cpuid()->
kvm_cpuid() to get maxphyaddr, and NULL is passed as values
for ebx/ecx/edx. This is problematic because kvm_cpuid() will
dereference these pointers.

Fixes: d1cd3ce900 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19 14:28:58 +02:00
Yu Zhang
fd8cb43373 KVM: MMU: Expose the LA57 feature to VM.
This patch exposes 5 level page table feature to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bits and 57-bits address width.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:17 +02:00
Yu Zhang
d1cd3ce900 KVM: MMU: check guest CR3 reserved bits based on its physical address width.
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
reserved bits in CR3. Yet the length of reserved bits in
guest CR3 should be based on the physical address width
exposed to the VM. This patch changes CR3 check logic to
calculate the reserved bits at runtime.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:16 +02:00
Yu Zhang
e911eb3b34 KVM: x86: Add return value to kvm_cpuid().
Return false in kvm_cpuid() when it fails to find the cpuid
entry. Also, this routine(and its caller) is optimized with
a new argument - check_limit, so that the check_cpuid_limit()
fall back can be avoided.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:15 +02:00
Linus Torvalds
c136b84393 PPC:
- Better machine check handling for HV KVM
 - Ability to support guests with threads=2, 4 or 8 on POWER9
 - Fix for a race that could cause delayed recognition of signals
 - Fix for a bug where POWER9 guests could sleep with interrupts pending.
 
 ARM:
 - VCPU request overhaul
 - allow timer and PMU to have their interrupt number selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 
 s390:
 - initial machine check forwarding
 - migration support for the CMMA page hinting information
 - cleanups and fixes
 
 x86:
 - nested VMX bugfixes and improvements
 - more reliable NMI window detection on AMD
 - APIC timer optimizations
 
 Generic:
 - VCPU request overhaul + documentation of common code patterns
 - kvm_stat improvements
 
 There is a small conflict in arch/s390 due to an arch-wide field rename.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
 BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
 A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
 dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
 g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
 NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
 =OD7H
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
2017-07-06 18:38:31 -07:00
Nick Desaulniers
8616abc253 KVM: x86: remove ignored type attribute
The macro insn_fetch marks the 'type' argument as having a specified
alignment.  Type attributes can only be applied to structs, unions, or
enums, but insn_fetch is only ever invoked with integral types, so Clang
produces 19 -Wignored-attributes warnings for this source file.

Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-30 12:45:55 +02:00
Paolo Bonzini
c8401dda2f KVM: x86: fix singlestepping over syscall
TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.

KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice.  This does not affect Linux guests, as they use an IST or task gate
for #DB.

This fixes CVE-2017-7518.

Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-22 16:13:29 +02:00
Nick Desaulniers
9d643f6312 KVM: x86: avoid large stack allocations in em_fxrstor
em_fxstor previously called fxstor_fixup.  Both created instances of
struct fxregs_state on the stack, which triggered the warning:

arch/x86/kvm/emulate.c:4018:12: warning: stack frame size of 1080 bytes
in function
      'em_fxrstor' [-Wframe-larger-than=]
static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
           ^
with CONFIG_FRAME_WARN set to 1024.

This patch does the fixup in em_fxstor now, avoiding one additional
struct fxregs_state, and now fxstor_fixup can be removed as it has no
other call sites.

Further, the calculation for offsets into xmm_space can be shared
between em_fxstor and em_fxsave.

Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
[Clean up calculation of offsets and fix it for 64-bit mode. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-01 11:23:12 +02:00
Radim Krčmář
92ceb7679a KVM: x86: prevent uninitialized variable warning in check_svme()
get_msr() of MSR_EFER is currently always going to succeed, but static
checker doesn't see that far.

Don't complicate stuff and just use 0 for the fallback -- it means that
the feature is not present.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19 19:59:28 +02:00
Ladi Prosek
6ed071f051 KVM: x86: fix emulation of RSM and IRET instructions
On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
on hflags is reverted later on in x86_emulate_instruction where hflags are
overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.

Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
an instruction is emulated, this commit deletes emul_flags altogether and
makes the emulator access vcpu->arch.hflags using two new accessors. This
way all changes, on the emulator side as well as in functions called from
the emulator and accessing vcpu state with emul_to_vcpu, are preserved.

More details on the bug and its manifestation with Windows and OVMF:

  It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
  I believe that the SMM part explains why we started seeing this only with
  OVMF.

  KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
  the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
  later on in x86_emulate_instruction we overwrite arch.hflags with
  ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
  The AMD-specific hflag of interest here is HF_NMI_MASK.

  When rebooting the system, Windows sends an NMI IPI to all but the current
  cpu to shut them down. Only after all of them are parked in HLT will the
  initiating cpu finish the restart. If NMI is masked, other cpus never get
  the memo and the initiating cpu spins forever, waiting for
  hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.

Fixes: a584539b24 ("KVM: x86: pass the whole hflags field to emulator and back")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-27 16:54:09 +02:00
Kyle Huey
db2336a804 KVM: x86: virtualize cpuid faulting
Hardware support for faulting on the cpuid instruction is not required to
emulate it, because cpuid triggers a VM exit anyways. KVM handles the relevant
MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLE) and upon a
cpuid-induced VM exit checks the cpuid faulting state and the CPL.
kvm_require_cpl is even kind enough to inject the GP fault for us.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: David Matlack <dmatlack@google.com>
[Return "1" from kvm_emulate_cpuid, it's not void. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-21 12:50:06 +02:00
Radim Krčmář
a9ff720e0f Merge branch 'x86/cpufeature' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
For AVX512_VPOPCNTDQ.
2017-01-17 17:53:01 +01:00
Paolo Bonzini
33ab91103b KVM: x86: fix emulation of "MOV SS, null selector"
This is CVE-2017-2583.  On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
Fixes: 79d5b4c3cd
Cc: stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 15:17:13 +01:00
Steve Rutherford
129a72a0d3 KVM: x86: Introduce segmented_write_std
Introduces segemented_write_std.

Switches from emulated reads/writes to standard read/writes in fxsave,
fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
kernel memory leak.

Since commit 283c95d0e3 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 96051572c8
Fixes: 283c95d0e3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 14:34:58 +01:00
Tom Lendacky
0f89b207b0 kvm: svm: Use the hardware provided GPA instead of page walk
When a guest causes a NPF which requires emulation, KVM sometimes walks
the guest page tables to translate the GVA to a GPA. This is unnecessary
most of the time on AMD hardware since the hardware provides the GPA in
EXITINFO2.

The only exception cases involve string operations involving rep or
operations that use two memory locations. With rep, the GPA will only be
the value of the initial NPF and with dual memory locations we won't know
which memory address was translated into EXITINFO2.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09 14:47:58 +01:00
Linus Torvalds
93173b5bf2 Small release, the most interesting stuff is x86 nested virt improvements.
x86: userspace can now hide nested VMX features from guests; nested
 VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and
 AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs.
 
 PPC: support for KVM guests on POWER9; improved support for interrupt
 polling; optimizations and cleanups.
 
 s390: two small optimizations, more stuff is in flight and will be
 in 4.11.
 
 ARM: support for the GICv3 ITS on 32bit platforms.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO
 nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf
 u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3
 gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry
 qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG
 D0j4tRSyGFIdx6lukZm7HmiSHZ0=
 =mkoB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Small release, the most interesting stuff is x86 nested virt
  improvements.

  x86:
   - userspace can now hide nested VMX features from guests
   - nested VMX can now run Hyper-V in a guest
   - support for AVX512_4VNNIW and AVX512_FMAPS in KVM
   - infrastructure support for virtual Intel GPUs.

  PPC:
   - support for KVM guests on POWER9
   - improved support for interrupt polling
   - optimizations and cleanups.

  s390:
   - two small optimizations, more stuff is in flight and will be in
     4.11.

  ARM:
   - support for the GICv3 ITS on 32bit platforms"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
  KVM: arm/arm64: timer: Check for properly initialized timer on init
  KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
  KVM: x86: Handle the kthread worker using the new API
  KVM: nVMX: invvpid handling improvements
  KVM: nVMX: check host CR3 on vmentry and vmexit
  KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
  KVM: nVMX: propagate errors from prepare_vmcs02
  KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
  KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
  KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
  KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
  KVM: nVMX: support restore of VMX capability MSRs
  KVM: nVMX: generate non-true VMX MSRs based on true versions
  KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
  KVM: x86: Add kvm_skip_emulated_instruction and use it.
  KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
  KVM: VMX: Reorder some skip_emulated_instruction calls
  KVM: x86: Add a return value to kvm_emulate_cpuid
  KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
  ...
2016-12-13 15:47:02 -08:00
Radim Krčmář
2117d5398c KVM: x86: drop error recovery in em_jmp_far and em_ret_far
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.

Found by syzkaller:

  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   [...]
  Call Trace:
   [...] __dump_stack lib/dump_stack.c:15
   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
   [...] vfs_ioctl fs/ioctl.c:43
   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
   [...] SYSC_ioctl fs/ioctl.c:694
   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24 18:36:54 +01:00
Radim Krčmář
283c95d0e3 KVM: x86: emulate FXSAVE and FXRSTOR
Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
Old Intels don't have unrestricted_guest, so we have to emulate them.

The patch takes advantage of the hardware implementation.

AMD and Intel differ in saving and restoring other fields in first 32
bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
and executed fxsave:

  Intel (Nehalem):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
    ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
  Intel (Haswell -- deprecated FPU CS and FPU DS):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
    ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
  AMD (Opteron 2300-series):
    7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
    ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00

fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
much to improve the situation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-16 22:09:46 +01:00
Radim Krčmář
aabba3c6ab KVM: x86: add asm_safe wrapper
Move the existing exception handling for inline assembly into a macro
and switch its return values to X86EMUL type.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Radim Krčmář
4852018789 KVM: x86: save one bit in ctxt->d
Alignments are exclusive, so 5 modes can be expressed in 3 bits.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Radim Krčmář
d3fe959f81 KVM: x86: add Align16 instruction flag
Needed for FXSAVE and FXRSTOR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Owen Hofmann
d9092f52d7 kvm: x86: Check memopp before dereference (CVE-2016-8630)
Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.

Fixes: 41061cdb98
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 21:31:53 +01:00
Paul Gortmaker
1767e931e3 x86/kvm: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  In the case of
kvm where it is modular, we can extend that to also include files
that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.

Several instances got replaced with moduleparam.h since that was
really all that was required for those particular files.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:07:00 +02:00
Josh Poimboeuf
4548f63e65 x86/kvm: Add stack frame dependency to fastop() inline asm
The kbuild test robot reported this objtool warning [1]:

  arch/x86/kvm/emulate.o: warning: objtool: fastop()+0x69: call without frame pointer save/setup

The issue seems to be caused by CONFIG_PROFILE_ALL_BRANCHES.  With that
option, for some reason gcc decides not to create a stack frame in
fastop() before doing the inline asm call, which can result in a bad
stack trace.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statement.

This change has no effect for !CONFIG_PROFILE_ALL_BRANCHES.

[1] https://lists.01.org/pipermail/kbuild-all/2016-March/018249.html

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-10 18:16:50 +02:00
Linus Torvalds
26660a4046 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'objtool' stack frame validation from Ingo Molnar:
 "This tree adds a new kernel build-time object file validation feature
  (ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
  It was written by and is maintained by Josh Poimboeuf.

  The motivation: there's a category of hard to find kernel bugs, most
  of them in assembly code (but also occasionally in C code), that
  degrades the quality of kernel stack dumps/backtraces.  These bugs are
  hard to detect at the source code level.  Such bugs result in
  incorrect/incomplete backtraces most of time - but can also in some
  rare cases result in crashes or other undefined behavior.

  The build time correctness checking is done via the new 'objtool'
  user-space utility that was written for this purpose and which is
  hosted in the kernel repository in tools/objtool/.  The tool's (very
  simple) UI and source code design is shaped after Git and perf and
  shares quite a bit of infrastructure with tools/perf (which tooling
  infrastructure sharing effort got merged via perf and is already
  upstream).  Objtool follows the well-known kernel coding style.

  Objtool does not try to check .c or .S files, it instead analyzes the
  resulting .o generated machine code from first principles: it decodes
  the instruction stream and interprets it.  (Right now objtool supports
  the x86-64 architecture.)

  From tools/objtool/Documentation/stack-validation.txt:

   "The kernel CONFIG_STACK_VALIDATION option enables a host tool named
    objtool which runs at compile time.  It has a "check" subcommand
    which analyzes every .o file and ensures the validity of its stack
    metadata.  It enforces a set of rules on asm code and C inline
    assembly code so that stack traces can be reliable.

    Currently it only checks frame pointer usage, but there are plans to
    add CFI validation for C files and CFI generation for asm files.

    For each function, it recursively follows all possible code paths
    and validates the correct frame pointer state at each instruction.

    It also follows code paths involving special sections, like
    .altinstructions, __jump_table, and __ex_table, which can add
    alternative execution paths to a given instruction (or set of
    instructions).  Similarly, it knows how to follow switch statements,
    for which gcc sometimes uses jump tables."

  When this new kernel option is enabled (it's disabled by default), the
  tool, if it finds any suspicious assembly code pattern, outputs
  warnings in compiler warning format:

    warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
    warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
    warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
    warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer

  ... so that scripts that pick up compiler warnings will notice them.
  All known warnings triggered by the tool are fixed by the tree, most
  of the commits in fact prepare the kernel to be warning-free.  Most of
  them are bugfixes or cleanups that stand on their own, but there are
  also some annotations of 'special' stack frames for justified cases
  such entries to JIT-ed code (BPF) or really special boot time code.

  There are two other long-term motivations behind this tool as well:

   - To improve the quality and reliability of kernel stack frames, so
     that they can be used for optimized live patching.

   - To create independent infrastructure to check the correctness of
     CFI stack frames at build time.  CFI debuginfo is notoriously
     unreliable and we cannot use it in the kernel as-is without extra
     checking done both on the kernel side and on the build side.

  The quality of kernel stack frames matters to debuggability as well,
  so IMO we can merge this without having to consider the live patching
  or CFI debuginfo angle"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  objtool: Only print one warning per function
  objtool: Add several performance improvements
  tools: Copy hashtable.h into tools directory
  objtool: Fix false positive warnings for functions with multiple switch statements
  objtool: Rename some variables and functions
  objtool: Remove superflous INIT_LIST_HEAD
  objtool: Add helper macros for traversing instructions
  objtool: Fix false positive warnings related to sibling calls
  objtool: Compile with debugging symbols
  objtool: Detect infinite recursion
  objtool: Prevent infinite recursion in noreturn detection
  objtool: Detect and warn if libelf is missing and don't break the build
  tools: Support relative directory path for 'O='
  objtool: Support CROSS_COMPILE
  x86/asm/decoder: Use explicitly signed chars
  objtool: Enable stack metadata validation on 64-bit x86
  objtool: Add CONFIG_STACK_VALIDATION option
  objtool: Add tool to perform compile-time stack metadata validation
  x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
  sched: Always inline context_switch()
  ...
2016-03-20 18:23:21 -07:00
Paolo Bonzini
0c1d77f4ba KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
Commit e8dd2d2d64 ("Silence compiler warning in arch/x86/kvm/emulate.c",
2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
case actually could modify "la", but after the patch this change is
not reflected in *linear.

The bug is visible whenever a non-zero segment base causes the linear
address to wrap around the 4GB mark.

Fixes: e8dd2d2d64
Cc: stable@vger.kernel.org
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-24 14:47:45 +01:00
Josh Poimboeuf
cb7390fed4 x86/kvm: Make test_cc() always inline
With some configs (including allyesconfig), gcc doesn't inline
test_cc().  When that happens, test_cc() doesn't create a stack frame
before inserting the inline asm call instruction.  This breaks frame
pointer convention if CONFIG_FRAME_POINTER is enabled and can result in
a bad stack trace.

Force it to always be inlined so that its containing function's stack
frame can be used.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/20160122161612.GE20502@treble.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:44 +01:00
Josh Poimboeuf
1482a0825b x86/kvm: Set ELF function type for fastop functions
The callable functions created with the FOP* and FASTOP* macros are
missing ELF function annotations, which confuses tools like stacktool.
Properly annotate them.

This adds some additional labels to the assembly, but the generated
binary code is unchanged (with the exception of instructions which have
embedded references to __LINE__).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/e399651c89ace54906c203c0557f66ed6ea3ce8d.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:44 +01:00
Paolo Bonzini
89651a3dec KVM: x86: allow RSM from 64-bit mode
The SDM says that exiting system management mode from 64-bit mode
is invalid, but that would be too good to be true.  But actually,
most of the code is already there to support exiting from compat
mode (EFER.LME=1, EFER.LMA=0).  Getting all the way from 64-bit
mode to real mode only requires clearing CS.L and CR4.PCIDE.

Cc: stable@vger.kernel.org
Fixes: 660a5d517a
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:38 +01:00
Radim Krčmář
f40606b147 KVM: x86: handle SMBASE as physical address in RSM
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code.  Enforce physical addressing.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:32 +01:00
Paolo Bonzini
b10d92a54d KVM: x86: fix RSM into 64-bit protected mode
In order to get into 64-bit protected mode, you need to enable
paging while EFER.LMA=1.  For this to work, CS.L must be 0.
Currently, we load the segments before CR0 and CR4, which means
that if RSM returns into 64-bit protected mode CS.L is already 1
and everything breaks.

Luckily, CS.L=0 is always the case when executing RSM, because it
is forbidden to execute RSM from 64-bit protected mode.  Hence it
is enough to load CR0 and CR4 first, and only then the segments.

Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:52 +02:00
Valdis Kletnieks
e8dd2d2d64 Silence compiler warning in arch/x86/kvm/emulate.c
Compiler warning:

 CC [M]  arch/x86/kvm/emulate.o
arch/x86/kvm/emulate.c: In function "__do_insn_fetch_bytes":
arch/x86/kvm/emulate.c:814:9: warning: "linear" may be used uninitialized in this function [-Wmaybe-uninitialized]

GCC is smart enough to realize that the inlined __linearize may return before
setting the value of linear, but not smart enough to realize the same
X86EMU_CONTINUE blocks actual use of the value.  However, the value of
'linear' can only be set to one value, so hoisting the one line of code
upwards makes GCC happy with the code.

Reported-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Tested-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06 16:26:23 +02:00
Paolo Bonzini
660a5d517a KVM: x86: save/load state on SMM switch
The big ugly one.  This patch adds support for switching in and out of
system management mode, respectively upon receiving KVM_REQ_SMI and upon
executing a RSM instruction.  Both 32- and 64-bit formats are supported
for the SMM state save area.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:17:46 +02:00
Paolo Bonzini
64d6067057 KVM: x86: stubs for SMM support
This patch adds the interface between x86.c and the emulator: the
SMBASE register, a new emulator flag, the RSM instruction.  It also
adds a new request bit that will be used by the KVM_SMI ioctl.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:45 +02:00
Paolo Bonzini
a584539b24 KVM: x86: pass the whole hflags field to emulator and back
The hflags field will contain information about system management mode
and will be useful for the emulator.  Pass the entire field rather than
just the guest-mode information.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:05 +02:00
Nadav Amit
428e3d0857 KVM: x86: Fix zero iterations REP-string
When a REP-string is executed in 64-bit mode with an address-size prefix,
ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
of the pointers in MOVS/STOS. This behavior is specific to Intel according to
few experiments.

As one may guess, this is an undocumented behavior. Yet, it is observable in
the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
VMware appears to get it right.  The behavior can be observed using the
following code:

 #include <stdio.h>

 #define LOW_MASK	(0xffffffff00000000ull)
 #define ALL_MASK	(0xffffffffffffffffull)
 #define TEST(opcode)							\
	do {								\
	asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \
			: "=S"(s), "=c"(c), "=D"(d) 			\
			: "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK));	\
	printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n",		\
		opcode, c, s, d);					\
	} while(0)

void main()
{
	unsigned long long s, d, c;
	iopl(3);
	TEST("0x6c");
	TEST("0x6d");
	TEST("0x6e");
	TEST("0x6f");
	TEST("0xa4");
	TEST("0xa5");
	TEST("0xa6");
	TEST("0xa7");
	TEST("0xaa");
	TEST("0xab");
	TEST("0xae");
	TEST("0xaf");
}

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:36 +02:00
Nadav Amit
ee122a7109 KVM: x86: Fix update RCX/RDI/RSI on REP-string
When REP-string instruction is preceded with an address-size prefix,
ECX/EDI/ESI are used as the operation counter and pointers.  When they are
updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
are updated on every 32-bit register operation.  Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:35 +02:00
Nadav Amit
3db176d5b4 KVM: x86: Fix DR7 mask on task-switch while debugging
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.

This fix puts the DR7 update during task-switch emulation, so it now uses the
standard DR setting mechanism instead of the one that was previously used. As a
bonus, the update of DR7 will now be effective for AMD as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:35 +02:00
Nadav Amit
acac6f8957 KVM: x86: Call-far should not be emulated as stack op
Far call in 64-bit has a 32-bit operand size. Remove the marking of this
operation as Stack so it can be emulated correctly in 64-bit.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08 10:51:44 +02:00
Wanpeng Li
35fd68a38d kvm: x86: fix x86 eflags fixed bit
Guest can't be booted w/ ept=0, there is a message dumped as below:

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956
ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68
EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA]
CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA]
SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6a80 00000037
IDT=     000f6abe 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8X

X86 eflags bit 1 is fixed set, which means that 1 << 1 is set instead of 1,
this patch fix it.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Message-Id: <1428473294-6633-1-git-send-email-wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08 10:46:52 +02:00
Eugene Korenevsky
2f729b10bb KVM: remove useless check of "ret" variable prior to returning the same value
A trivial code cleanup. This `if` is redundant.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150328222717.GA6508@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:57:15 +02:00
Nadav Amit
b32a991800 KVM: x86: Remove redundant definitions
Some constants are redfined in emulate.c. Avoid it.

s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK
s/SELECTOR_TI_MASK/SEGMENT_TI_MASK

No functional change.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:46:42 +02:00
Nadav Amit
0efb04406d KVM: x86: removing redundant eflags bits definitions
The eflags are redefined (using other defines) in emulate.c.
Use the definition from processor-flags.h as some mess already started.
No functional change.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427635984-8113-2-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:46:37 +02:00
Nadav Amit
900efe200e KVM: x86: BSF and BSR emulation change register unnecassarily
If the source of BSF and BSR is zero, the destination register should not
change. That is how real hardware behaves.  If we set the destination even with
the same value that we had before, we may clear bits [63:32] unnecassarily.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-4-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:46:11 +02:00
Nadav Amit
6fd8e12757 KVM: x86: POPA emulation may not clear bits [63:32]
POPA should assign the values to the registers as usual registers are assigned.
In other words, 32-bits register assignments should clear bits [63:32] of the
register.

Split the code of register assignments that will be used by future changes as
well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:46:03 +02:00
Nadav Amit
b91aa14d95 KVM: x86: CMOV emulation on legacy mode is wrong
On legacy mode CMOV emulation should still clear bits [63:32] even if the
assignment is not done. The previous fix 140bad89fd ("KVM: x86: emulation of
dword cmov on long-mode should clear [63:32]") was incomplete.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-2-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:45:50 +02:00
Jan Kiszka
b34a80517b KVM: x86: Fix re-execution of patched vmmcall
For a very long time (since 2b3d2a20), the path handling a vmmcall
instruction of the guest on an Intel host only applied the patch but no
longer handled the hypercall. The reverse case, vmcall on AMD hosts, is
fine. As both em_vmcall and em_vmmcall actually have to do the same, we
can fix the issue by consolidating both into the same handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-13 13:27:54 -03:00
Paolo Bonzini
4ff6f8e61e KVM: emulate: fix CMPXCHG8B on 32-bit hosts
This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.

Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault.  The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.

Fixes: 6550e1f165
Fixes: 16518d5ada
Cc: stable@vger.kernel.org   # 2.6.35+
Reported-by: Erik Rull <erik.rull@rdsoftware.de>
Tested-by: Erik Rull <erik.rull@rdsoftware.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-23 22:28:48 +01:00
Linus Torvalds
b9085bcbf5 Fairly small update, but there are some interesting new features.
Common: Optional support for adding a small amount of polling on each HLT
 instruction executed in the guest (or equivalent for other architectures).
 This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
 or TCP_RR netperf tests).  This also has to be enabled manually for now,
 but the plan is to auto-tune this in the future.
 
 ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
 tracking
 
 s390: several optimizations and bugfixes.  Also a first: a feature
 exposed by KVM (UUID and long guest name in /proc/sysinfo) before
 it is available in IBM's hypervisor! :)
 
 MIPS: Bugfixes.
 
 x86: Support for PML (page modification logging, a new feature in
 Broadwell Xeons that speeds up dirty page tracking), nested virtualization
 improvements (nested APICv---a nice optimization), usual round of emulation
 fixes.  There is also a new option to reduce latency of the TSC deadline
 timer in the guest; this needs to be tuned manually.
 
 Some commits are common between this pull and Catalin's; I see you
 have already included his tree.
 
 ARM has other conflicts where functions are added in the same place
 by 3.19-rc and 3.20 patches.  These are not large though, and entirely
 within KVM.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
 cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
 NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
 JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
 =7gdx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "Fairly small update, but there are some interesting new features.

  Common:
     Optional support for adding a small amount of polling on each HLT
     instruction executed in the guest (or equivalent for other
     architectures).  This can improve latency up to 50% on some
     scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests).  This
     also has to be enabled manually for now, but the plan is to
     auto-tune this in the future.

  ARM/ARM64:
     The highlights are support for GICv3 emulation and dirty page
     tracking

  s390:
     Several optimizations and bugfixes.  Also a first: a feature
     exposed by KVM (UUID and long guest name in /proc/sysinfo) before
     it is available in IBM's hypervisor! :)

  MIPS:
     Bugfixes.

  x86:
     Support for PML (page modification logging, a new feature in
     Broadwell Xeons that speeds up dirty page tracking), nested
     virtualization improvements (nested APICv---a nice optimization),
     usual round of emulation fixes.

     There is also a new option to reduce latency of the TSC deadline
     timer in the guest; this needs to be tuned manually.

     Some commits are common between this pull and Catalin's; I see you
     have already included his tree.

  Powerpc:
     Nothing yet.

     The KVM/PPC changes will come in through the PPC maintainers,
     because I haven't received them yet and I might end up being
     offline for some part of next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: ia64: drop kvm.h from installed user headers
  KVM: x86: fix build with !CONFIG_SMP
  KVM: x86: emulate: correct page fault error code for NoWrite instructions
  KVM: Disable compat ioctl for s390
  KVM: s390: add cpu model support
  KVM: s390: use facilities and cpu_id per KVM
  KVM: s390/CPACF: Choose crypto control block format
  s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
  KVM: s390: reenable LPP facility
  KVM: s390: floating irqs: fix user triggerable endless loop
  kvm: add halt_poll_ns module parameter
  kvm: remove KVM_MMIO_SIZE
  KVM: MIPS: Don't leak FPU/DSP to guest
  KVM: MIPS: Disable HTW while in guest
  KVM: nVMX: Enable nested posted interrupt processing
  KVM: nVMX: Enable nested virtual interrupt delivery
  KVM: nVMX: Enable nested apic register virtualization
  KVM: nVMX: Make nested control MSRs per-cpu
  KVM: nVMX: Enable nested virtualize x2apic mode
  KVM: nVMX: Prepare for using hardware MSR bitmap
  ...
2015-02-13 09:55:09 -08:00
Paolo Bonzini
d44e121223 KVM: x86: emulate: correct page fault error code for NoWrite instructions
NoWrite instructions (e.g. cmp or test) never set the "write access"
bit in the error code, even if one of the operands is treated as a
destination.

Fixes: c205fb7d7d
Cc: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-09 13:36:01 +01:00
Nadav Amit
82268083fa KVM: x86: Emulation of call may use incorrect stack size
On long-mode, when far call that changes cs.l takes place, the stack size is
determined by the new mode.  For instance, if we go from 32-bit mode to 64-bit
mode, the stack-size if 64.  KVM uses the old stack size.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:17:34 +01:00
Nadav Amit
bac155310b KVM: x86: 32-bit wraparound read/write not emulated correctly
If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and
writes should be successful. It just needs to be done in two segments.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:18 +01:00
Nadav Amit
2b42fce695 KVM: x86: Fix defines in emulator.c
Unnecassary define was left after commit 7d882ffa81 ("KVM: x86: Revert
NoBigReal patch in the emulator").

Commit 39f062ff51 ("KVM: x86: Generate #UD when memory operand is required")
was missing undef.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:03 +01:00
Nadav Amit
2276b5116e KVM: x86: ARPL emulation can cause spurious exceptions
ARPL and MOVSXD are encoded the same and their execution depends on the
execution mode.  The operand sizes of each instruction are different.
Currently, ARPL is detected too late, after the decoding was already done, and
therefore may result in spurious exception (instead of failed emulation).

Introduce a group to the emulator to handle instructions according to execution
mode (32/64 bits). Note: in order not to make changes that may affect
performance, the new ModeDual can only be applied to instructions with ModRM.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:49 +01:00
Nadav Amit
801806d956 KVM: x86: IRET emulation does not clear NMI masking
The IRET instruction should clear NMI masking, but the current implementation
does not do so.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:42 +01:00
Nadav Amit
16794aaaab KVM: x86: Wrong operand size for far ret
Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit
mode, the default operation size of this instruction is the stack-address size,
i.e. 64 bits."

However, experiments show this is not the case. Here is for example objdump of
small 64-bit asm:

  4004f1:	ca 14 00             	lret   $0x14
  4004f4:	48 cb                	lretq
  4004f6:	48 ca 14 00          	lretq  $0x14

Therefore, remove the Stack flag from far-ret instructions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:25 +01:00
Nadav Amit
2fcf5c8ae2 KVM: x86: Dirty the dest op page on cmpxchg emulation
Intel SDM says for CMPXCHG: "To simplify the interface to the processor’s bus,
the destination operand receives a write cycle without regard to the result of
the comparison.". This means the destination page should be dirtied.

Fix it to by writing back the original value if cmpxchg failed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:18 +01:00
Nadav Amit
f3747379ac KVM: x86: SYSENTER emulation is broken
SYSENTER emulation is broken in several ways:
1. It misses the case of 16-bit code segments completely (CVE-2015-0239).
2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can
   still be set without causing #GP).
3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in
   legacy-mode.
4. There is some unneeded code.

Fix it.

Cc: stable@vger.linux.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23 13:57:15 +01:00
Nadav Amit
63ea0a49ae KVM: x86: Fix of previously incomplete fix for CVE-2014-8480
STR and SLDT with rip-relative operand can cause a host kernel oops.
Mark them as DstMem as well.

Cc: stable@vger.linux.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23 13:56:56 +01:00
Nadav Amit
c205fb7d7d KVM: x86: #PF error-code on R/W operations is wrong
When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page-fault is detected in this phase, the error-code which would be
delivered to the VM does not indicate that the access that caused the exception
is a write one. This does not conform with real hardware, and may cause the VM
to enter the page-fault handler twice for no reason (once for read, once for
write).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-09 10:24:11 +01:00
Nadav Amit
edccda7ca7 KVM: x86: Access to LDT/GDT that wraparound is incorrect
When access to descriptor in LDT/GDT wraparound outside long-mode, the address
of the descriptor should be truncated to 32-bit.  Citing Intel SDM 2.1.1.1
"Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR registers
are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and
compatibility mode)."

So in other cases, we need to truncate. Creating new function to return a
pointer to descriptor table to avoid too much code duplication.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Wrap 64-bit check with #ifdef CONFIG_X86_64, to avoid a "right shift count
 >= width of type" warning and consequent undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:08 +01:00
Nadav Amit
e2cefa746e KVM: x86: Do not set access bit on accessed segments
When segment is loaded, the segment access bit is set unconditionally.  In
fact, it should be set conditionally, based on whether the segment had the
accessed bit set before. In addition, it can improve performance.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:07 +01:00
Nadav Amit
ab708099a0 KVM: x86: POP [ESP] is not emulated correctly
According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

The current emulation does not behave so. The fix required to waste another
of the precious instruction flags and to check the flag in decode_modrm.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:07 +01:00
Nadav Amit
80976dbb5c KVM: x86: em_call_far should return failure result
Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:06 +01:00
Nadav Amit
3dc4bc4f6b KVM: x86: JMP/CALL using call- or task-gate causes exception
The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate.  This patch does not try to implement these scenario as they are
presumably rare; yet it returns X86EMUL_UNHANDLEABLE error in such cases
instead of generating an exception.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:05 +01:00
Nadav Amit
16bebefe29 KVM: x86: fnstcw and fnstsw may cause spurious exception
Since the operand size of fnstcw and fnstsw is updated during the execution,
the emulation may cause spurious exceptions as it reads the memory beforehand.

Marking these instructions as Mov (since the previous value is ignored) and
DstMem16 to simplify the setting of operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:05 +01:00
Nadav Amit
3313bc4ee8 KVM: x86: pop sreg accesses only 2 bytes
Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:04 +01:00
Nicholas Krause
5ff22e7ebf KVM: x86: Remove FIXMEs in emulate.c for the function,task_switch_32
Remove FIXME comments about needing fault addresses to be returned.  These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:17 +01:00
Linus Torvalds
66dcff86ba 3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-assisted
 virtualization on the PPC970
 - ARM, PPC, s390 all had only small fixes
 
 For x86:
 - small performance improvements (though only on weird guests)
 - usual round of hardware-compliancy fixes from Nadav
 - APICv fixes
 - XSAVES support for hosts and guests.  XSAVES hosts were broken because
 the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
 ABI whenever XSAVES was enabled; hence, this part is going to stable.
 Guest support is just a matter of exposing the feature and CPUID leaves
 support.
 
 Right now KVM is broken for PPC BookE in your tree (doesn't compile).
 I'll reply to the pull request with a patch, please apply it either
 before the pull request or in the merge commit, in order to preserve
 bisectability somewhat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
 lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
 yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
 DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
 MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
 =Zwq1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "3.19 changes for KVM:

   - spring cleaning: removed support for IA64, and for hardware-
     assisted virtualization on the PPC970

   - ARM, PPC, s390 all had only small fixes

  For x86:
   - small performance improvements (though only on weird guests)
   - usual round of hardware-compliancy fixes from Nadav
   - APICv fixes
   - XSAVES support for hosts and guests.  XSAVES hosts were broken
     because the (non-KVM) XSAVES patches inadvertently changed the KVM
     userspace ABI whenever XSAVES was enabled; hence, this part is
     going to stable.  Guest support is just a matter of exposing the
     feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
  KVM: move APIC types to arch/x86/
  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
  KVM: PPC: Book3S HV: Improve H_CONFER implementation
  KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
  KVM: PPC: Book3S HV: Remove code for PPC970 processors
  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
  arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
  arch: powerpc: kvm: book3s_pr.c: Remove unused function
  arch: powerpc: kvm: book3s.c: Remove some unused functions
  arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
  KVM: PPC: Book3S HV: ptes are big endian
  KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
  KVM: PPC: Book3S HV: Fix KSM memory corruption
  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
  KVM: PPC: Book3S HV: Fix computation of tlbie operand
  KVM: PPC: Book3S HV: Add missing HPTE unlock
  KVM: PPC: BookE: Improve irq inject tracepoint
  arm/arm64: KVM: Require in-kernel vgic for the arch timers
  ...
2014-12-18 16:05:28 -08:00
Nadav Amit
ab646f54f4 KVM: x86: em_ret_far overrides cpl
commit d50eaa1803 ("KVM: x86: Perform limit checks when assigning EIP")
mistakenly used zero as cpl on em_ret_far. Use the actual one.

Fixes: d50eaa1803
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-11 12:27:32 +01:00
Nadav Amit
64a38292ed KVM: x86: Emulate should check #UD before #GP
Intel SDM table 6-2 ("Priority Among Simultaneous Exceptions and Interrupts")
shows that faults from decoding the next instruction got higher priority than
general protection.  Moving the protected-mode check before the CPL check to
avoid wrong exception on vm86 mode.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:53:37 +01:00
Nadav Amit
bc397a6c91 KVM: x86: Do not push eflags.vm on pushf
The pushf instruction does not push eflags.VM, so emulation should not do so as
well.  Although eflags.RF should not be pushed as well, it is already cleared
by the time pushf is executed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:51:28 +01:00
Nadav Amit
53bb4f789a KVM: x86: Remove prefix flag when GP macro is used
The macro GP already sets the flag Prefix. Remove the redundant flag for
0f_38_f0 and 0f_38_f1 opcodes.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:51:13 +01:00
Nadav Amit
39f062ff51 KVM: x86: Generate #UD when memory operand is required
Certain x86 instructions that use modrm operands only allow memory operand
(i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
Currently, the instructions that are such and are emulated by KVM are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blunt example, since it may be
emulated by the host regardless of MMIO.

The fix introduces a new group for handling such instructions, marking mod3 as
illegal instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:04 +01:00
Nicholas Krause
86619e7ba3 KVM: x86: Remove FIXMEs in emulate.c
Remove FIXME comments about needing fault addresses to be returned.  These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:54:43 +01:00
Paolo Bonzini
997b04128d KVM: emulator: remove duplicated limit check
The check on the higher limit of the segment, and the check on the
maximum accessible size, is the same for both expand-up and
expand-down segments.  Only the computation of "lim" varies.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:40:24 +01:00
Paolo Bonzini
01485a2230 KVM: emulator: remove code duplication in register_address{,_increment}
register_address has been a duplicate of address_mask ever since the
ancestor of __linearize was born in 90de84f50b (KVM: x86 emulator:
preserve an operand's segment identity, 2010-11-17).

However, we can put it to a better use by including the call to reg_read
in register_address.  Similarly, the call to reg_rmw can be moved to
register_address_increment.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:27:27 +01:00
Nadav Amit
31ff64881b KVM: x86: Move __linearize masking of la into switch
In __linearize there is check of the condition whether to check if masking of
the linear address is needed.  It occurs immediately after switch that
evaluates the same condition.  Merge them.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:20:15 +01:00
Nadav Amit
abc7d8a4c9 KVM: x86: Non-canonical access using SS should cause #SS
When SS is used using a non-canonical address, an #SS exception is generated on
real hardware.  KVM emulator causes a #GP instead. Fix it to behave as real x86
CPU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:19:57 +01:00
Nadav Amit
d50eaa1803 KVM: x86: Perform limit checks when assigning EIP
If branch (e.g., jmp, ret) causes limit violations, since the target IP >
limit, the #GP exception occurs before the branch.  In other words, the RIP
pushed on the stack should be that of the branch and not that of the target.

To do so, we can call __linearize, with new EIP, which also saves us the code
which performs the canonical address checks. On the case of assigning an EIP >=
2^32 (when switching cs.l), we also safe, as __linearize will check the new EIP
does not exceed the limit and would trigger #GP(0) otherwise.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:19:22 +01:00
Nadav Amit
a7315d2f3c KVM: x86: Emulator performs privilege checks on __linearize
When segment is accessed, real hardware does not perform any privilege level
checks.  In contrast, KVM emulator does. This causes some discrepencies from
real hardware. For instance, reading from readable code segment may fail due to
incorrect segment checks. In addition, it introduces unnecassary overhead.

To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked
when the segment selector of a segment descriptor is loaded into a segment
register." The SDM never mentions privilege level checks during memory access,
except for loading far pointers in section 5.10 ("Pointer Validation"). Those
are actually segment selector loads and are emulated in the similarily (i.e.,
regardless to __linearize checks).

This behavior was also checked using sysexit. A data-segment whose DPL=0 was
loaded, and after sysexit (CPL=3) it is still accessible.

Therefore, all the privilege level checks in __linearize are removed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:17:58 +01:00
Nadav Amit
1c1c35ae4b KVM: x86: Stack size is overridden by __linearize
When performing segmented-read/write in the emulator for stack operations, it
ignores the stack size, and uses the ad_bytes as indication for the pointer
size. As a result, a wrong address may be accessed.

To fix this behavior, we can remove the masking of address in __linearize and
perform it beforehand.  It is already done for the operands (so currently it is
inefficiently done twice). It is missing in two cases:
1. When using rip_relative
2. On fetch_bit_operand that changes the address.

This patch masks the address on these two occassions, and removes the masking
from __linearize.

Note that it does not mask EIP during fetch. In protected/legacy mode code
fetch when RIP >= 2^32 should result in #GP and not wrap-around. Since we make
limit checks within __linearize, this is the expected behavior.

Partial revert of commit 518547b32a (KVM: x86: Emulator does not
calculate address correctly, 2014-09-30).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:17:10 +01:00
Nadav Amit
7d882ffa81 KVM: x86: Revert NoBigReal patch in the emulator
Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support
16-bit addresses in real mode") introduced NoBigReal for instructions such as
MONITOR. Apparetnly, the Intel SDM description that led to this patch is
misleading.  Since no instruction is using NoBigReal, it is safe to remove it,
we fully understand what the SDM means.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:13:27 +01:00
Nadav Amit
ed9aad215f KVM: x86: MOVNTI emulation min opsize is not respected
Commit 3b32004a66 ("KVM: x86: movnti minimum op size of 32-bit is not kept")
did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI
may be mistakenly performed using 16-bit opsize.

This patch add No16 flag to mark an instruction does not support 16-bits
operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08 08:20:54 +01:00
Nadav Amit
b2c9d43e6c KVM: x86: Return UNHANDLABLE on unsupported SYSENTER
Now that KVM injects #UD on "unhandlable" error, it makes better sense to
return such error on sysenter instead of directly injecting #UD to the guest.
This allows to track more easily the unhandlable cases the emulator does not
support.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08 08:20:52 +01:00
Nadav Amit
d14cb5df59 KVM: x86: Emulator mis-decodes VEX instructions on real-mode
Commit 7fe864dc94 (KVM: x86: Mark VEX-prefix instructions emulation as
unimplemented, 2014-06-02) marked VEX instructions as such in protected
mode.  VEX-prefix instructions are not supported relevant on real-mode
and VM86, but should cause #UD instead of being decoded as LES/LDS.

Fix this behaviour to be consistent with real hardware.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Check for mod == 3, rather than 2 or 3. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08 08:20:10 +01:00
Nadav Amit
2c2ca2d12f KVM: x86: Remove redundant and incorrect cpl check on task-switch
Task-switch emulation checks the privilege level prior to performing the
task-switch.  This check is incorrect in the case of task-gates, in which the
tss.dpl is ignored, and can cause superfluous exceptions.  Moreover this check
is unnecassary, since the CPU checks the privilege levels prior to exiting.
Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly
outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior
to exiting.  AMD 15.14.1 says "The intercept is checked before the task switch
takes place but after the incoming TSS and task gate (if one was involved) have
been checked for correctness."

This patch removes the CPL checks for CALL and JMP.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:10 +01:00
Nadav Amit
9a9abf6b61 KVM: x86: Inject #GP when loading system segments with non-canonical base
When emulating LTR/LDTR/LGDT/LIDT, #GP should be injected if the base is
non-canonical. Otherwise, VM-entry will fail.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:09 +01:00
Nadav Amit
5b7f6a1e6f KVM: x86: Combine the lgdt and lidt emulation logic
LGDT and LIDT emulation logic is almost identical. Merge the logic into a
single point to avoid redundancy. This will be used by the next patch that
will ensure the bases of the loaded GDTR and IDTR are canonical.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:08 +01:00
Nadav Amit
9d88fca71a KVM: x86: MOV to CR3 can set bit 63
Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
instruction’s source operand is 0 ..."

In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
as reserved. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:07 +01:00
Nadav Amit
0fcc207c66 KVM: x86: Emulate push sreg as done in Core
According to Intel SDM push of segment selectors is done in the following
manner: "if the operand size is 32-bits, either a zero-extended value is pushed
on the stack or the segment selector is written on the stack using a 16-bit
move. For the last case, all recent Core and Atom processors perform a 16-bit
move, leaving the upper portion of the stack location unmodified."

This patch modifies the behavior to match the core behavior.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:06 +01:00
Nadav Amit
5aca372236 KVM: x86: Wrong flags on CMPS and SCAS emulation
CMPS and SCAS instructions are evaluated in the wrong order.  For reference (of
CMPS), see http://www.fermimn.gov.it/linux/quarta/x86/cmps.htm : "Note that the
direction of subtraction for CMPS is [SI] - [DI] or [ESI] - [EDI]. The left
operand (SI or ESI) is the source and the right operand (DI or EDI) is the
destination. This is the reverse of the usual Intel convention in which the
left operand is the destination and the right operand is the source."

Introducing em_cmp_r for this matter that performs comparison in reverse order
using fastop infrastructure to avoid a wrapper function.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:06 +01:00
Nadav Amit
807c142595 KVM: x86: SYSCALL cannot clear eflags[1]
SYSCALL emulation currently clears in 64-bit mode eflags according to
MSR_SYSCALL_MASK.  However, on bare-metal eflags[1] which is fixed to one
cannot be cleared, even if MSR_SYSCALL_MASK masks the bit.  This wrong behavior
may result in failed VM-entry, as VT disallows entry with eflags[1] cleared.

This patch sets the bit after masking eflags on syscall.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:05 +01:00
Nadav Amit
b5bbf10ee6 KVM: x86: Emulation of MOV-sreg to memory uses incorrect size
In x86, you can only MOV-sreg to memory with either 16-bits or 64-bits size.
In contrast, KVM may write to 32-bits memory on MOV-sreg. This patch fixes KVM
behavior, and sets the destination operand size to two, if the destination is
memory.

When destination is registers, and the operand size is 32-bits, the high
16-bits in modern CPUs is filled with zero.  This is handled correctly.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:04 +01:00
Nadav Amit
6d2a0526b0 KVM: x86: Emulator should set DR6 upon GD like real CPU
It should clear B0-B3 and set BD.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:02 +01:00
Nadav Amit
5b38ab877e KVM: x86: decode_modrm does not regard modrm correctly
In one occassion, decode_modrm uses the rm field after it is extended with
REX.B to determine the addressing mode. Doing so causes it not to read the
offset for rip-relative addressing with REX.B=1.

This patch moves the fetch where we already mask REX.B away instead.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:01 +01:00
Nadav Amit
d29b9d7ed7 KVM: x86: Fix uninitialized op->type for some immediate values
The emulator could reuse an op->type from a previous instruction for some
immediate values.  If it mistakenly considers the operands as memory
operands, it will performs a memory read and overwrite op->val.

Consider for instance the ROR instruction - src2 (the number of times)
would be read from memory instead of being used as immediate.

Mark every immediate operand as such to avoid this problem.

Cc: stable@vger.kernel.org
Fixes: c44b4c6ab8
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-05 12:36:58 +01:00
Nadav Amit
c49c759f7a KVM: x86: Emulator performs code segment checks on read access
When read access is performed using a readable code segment, the "conforming"
and "non-conforming" checks should not be done.  As a result, read using
non-conforming readable code segment fails.

This is according to Intel SDM 5.6.1 ("Accessing Data in Code Segments").

The fix is not to perform the "non-conforming" checks if the access is not a
fetch; the relevant checks are already done when loading the segment.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:25 +01:00
Nadav Amit
518547b32a KVM: x86: Emulator does not calculate address correctly
In long-mode, when the address size is 4 bytes, the linear address is not
truncated as the emulator mistakenly does.  Instead, the offset within the
segment (the ea field) should be truncated according to the address size.

As Intel SDM says: "In 64-bit mode, the effective address components are added
and the effective address is truncated ... before adding the full 64-bit
segment base."

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:24 +01:00
Nadav Amit
bf0b682c9b KVM: x86: Sysexit emulation does not mask RIP/RSP
If the operand size is not 64-bit, then the sysexit instruction should assign
ECX to RSP and EDX to RIP.  The current code assigns the full 64-bits.

Fix it by masking.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:21 +01:00
Nadav Amit
58b7075d05 KVM: x86: Distinguish between stack operation and near branches
In 64-bit, stack operations default to 64-bits, but can be overriden (to
16-bit) using opsize override prefix. In contrast, near-branches are always
64-bit.  This patch distinguish between the different behaviors.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:20 +01:00
Nadav Amit
f7784046ab KVM: x86: Getting rid of grp45 in emulator
Breaking grp45 to the relevant functions to speed up the emulation and simplify
the code. In addition, it is necassary the next patch will distinguish between
far and near branches according to the flags.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:20 +01:00
Nadav Amit
4be4de7ef9 KVM: x86: Use new is_noncanonical_address in _linearize
Replace the current canonical address check with the new function which is
identical.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:19 +01:00
Paolo Bonzini
d09155d2f3 KVM: emulator: always inline __linearize
The two callers have a lot of constant arguments that can be
optimized out.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:18 +01:00
Nadav Amit
7e46dddd6f KVM: x86: Fix far-jump to non-canonical check
Commit d1442d85cc ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete.  Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP.  As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.

Fixes: d1442d85cc
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-02 07:54:55 +01:00
Nadav Amit
cd9b8e2c48 KVM: x86: Fix far-jump to non-canonical check
Commit d1442d85cc ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete.  Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP.  As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-29 13:13:51 +01:00
Paolo Bonzini
fd56e1546a KVM: emulator: fix execution close to the segment limit
Emulation of code that is 14 bytes to the segment limit or closer
(e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
many as 15 bytes from the beginning of the instruction, and __linearize
fails when the passed (address, size) pair reaches out of the segment.

To fix this, let __linearize return the maximum accessible size (clamped
to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
by passing zero for the desired size.

For expand-down segments, __linearize is performing a redundant check.
(u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
to 4GB; in this case, addr.ea + size - 1 will also fail the check against
the upper bound of the segment (which is provided by the D/B bit).
After eliminating the redundant check, it is simple to compute
the *max_size for expand-down segments too.

Now that the limit check is done in __do_insn_fetch_bytes, we want
to inject a general protection fault there if size < op_size (like
__linearize would have done), instead of just aborting.

This fixes booting Tiano Core from emulated flash with EPT disabled.

Cc: stable@vger.kernel.org
Fixes: 719d5a9b24
Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-29 13:13:48 +01:00
Paolo Bonzini
3606189fa3 KVM: emulator: fix error code for __linearize
The error code for #GP and #SS is zero when the segment is used to
access an operand or an instruction.  It is only non-zero when
a segment register is being loaded; for limit checks this means
cases such as:

* for #GP, when RIP is beyond the limit on a far call (before the first
instruction is executed).  We do not implement this check, but it
would be in em_jmp_far/em_call_far.

* for #SS, if the new stack overflows during an inter-privilege-level
call to a non-conforming code segment.  We do not implement stack
switching at all.

So use an error code of zero.

Reviewed-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-29 12:40:28 +01:00
Nadav Amit
3f6f1480d8 KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
The decode phase of the x86 emulator assumes that every instruction with the
ModRM flag, and which can be used with RIP-relative addressing, has either
SrcMem or DstMem.  This is not the case for several instructions - prefetch,
hint-nop and clflush.

Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush.

This fixes CVE-2014-8480.

Fixes: 41061cdb98
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:36 +02:00
Nadav Amit
13e457e0ee KVM: x86: Emulator does not decode clflush well
Currently, all group15 instructions are decoded as clflush (e.g., mfence,
xsave).  In addition, the clflush instruction requires no prefix (66/f2/f3)
would exist. If prefix exists it may encode a different instruction (e.g.,
clflushopt).

Creating a group for clflush, and different group for each prefix.

This has been the case forever, but the next patch needs the cflush group
in order to fix a bug introduced in 3.17.

Fixes: 41061cdb98
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:36 +02:00
Paolo Bonzini
a430c91663 KVM: emulate: avoid accessing NULL ctxt->memopp
A failure to decode the instruction can cause a NULL pointer access.
This is fixed simply by moving the "done" label as close as possible
to the return.

This fixes CVE-2014-8481.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Fixes: 41061cdb98
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:35 +02:00
Nadav Amit
08da44aedb KVM: x86: Decoding guest instructions which cross page boundary may fail
Once an instruction crosses a page boundary, the size read from the second page
disregards the common case that part of the operand resides on the first page.
As a result, fetch of long insturctions may fail, and thereby cause the
decoding to fail as well.

Cc: stable@vger.kernel.org
Fixes: 5cfc7e0f5e
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:18 +02:00
Nadav Amit
d1442d85cc KVM: x86: Handle errors when RIP is set during far jumps
Far jmp/call/ret may fault while loading a new RIP.  Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done.  The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state.  Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:16 +02:00
Nadav Amit
234f3ce485 KVM: x86: Emulator fixes for eip canonical checks on near branches
Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do.  During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur.  The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.

This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:16 +02:00
Nadav Amit
05c83ec9b7 KVM: x86: Fix wrong masking on relative jump/call
Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:15 +02:00
Nadav Amit
040c8dc8a5 KVM: x86: emulating descriptor load misses long-mode case
In 64-bit mode a #GP should be delivered to the guest "if the code segment
descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit
set and the D-bit clear." - Intel SDM "Interrupt 13—General Protection
Exception (#GP)".

This patch fixes the behavior of CS loading emulation code. Although the
comment says that segment loading is not supported in long mode, this function
is executed in long mode, so the fix is necassary.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:52 +02:00
Paolo Bonzini
d5b7706970 KVM: x86: remove Aligned bit from movntps/movntpd
These are not explicitly aligned, and do not require alignment on AVX.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 14:57:59 +02:00
Alex Williamson
0a37027e83 KVM: x86 emulator: emulate MOVNTDQ
Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an
emulation failure.  The KVM spew suggests the fault is with lack of
movntdq emulation (courtesy of Paolo):

Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10

$ as -o a.out
        .section .text
        .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0
        .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0
$ objdump -d a.out
    0:  66 0f e7 41 f0          movntdq %xmm0,-0x10(%rcx)
    5:  66 0f e7 49 e0          movntdq %xmm1,-0x20(%rcx)

Add the necessary emulation.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 14:57:59 +02:00
Nadav Amit
0f54a32130 KVM: vmx: VMXOFF emulation in vm86 should cause #UD
Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD
exception in real-mode or vm86.  However, the emulator considers all these
instructions the same for the matter of mode checks, and emulation upon exit
due to #UD exception.

As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH
or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these
instruction and inject #GP to the guest instead of #UD.

This patch creates a new group for these instructions and mark only VMCALL as
an instruction which can be emulated.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 14:02:49 +02:00
Paolo Bonzini
e0ad0b477c KVM: emulate: warn on invalid or uninitialized exception numbers
These were reported when running Jailhouse on AMD processors.

Initialize ctxt->exception.vector with an invalid exception number,
and warn if it remained invalid even though the emulator got
an X86EMUL_PROPAGATE_FAULT return code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 13:01:26 +02:00
Paolo Bonzini
592f085847 KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitly
Always get it through emulate_exception or emulate_ts.  This
ensures that the ctxt->exception fields have been populated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 13:01:25 +02:00
Paolo Bonzini
15fc075269 KVM: x86: raise invalid TSS exceptions during a task switch
Conditions that would usually trigger a general protection fault should
instead raise #TS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Paolo Bonzini
9a4cfb27f7 KVM: x86: do not check CS.DPL against RPL during task switch
This reverts the check added by commit 5045b46803 (KVM: x86: check CS.DPL
against RPL during task switch, 2014-05-15).  Although the CS.DPL=CS.RPL
check is mentioned in table 7-1 of the SDM as causing a #TSS exception,
it is not mentioned in table 6-6 that lists "invalid TSS conditions"
which cause #TSS exceptions. In fact it causes some tests to fail, which
pass on bare-metal.

Keep the rest of the commit, since we will find new uses for it in 3.18.

Reported-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Nadav Amit
3a6095a017 KVM: x86: Avoid emulating instructions on #UD mistakenly
Commit d40a6898e5 mistakenly caused instructions which are not marked as
EmulateOnUD to be emulated upon #UD exception. The commit caused the check of
whether the instruction flags include EmulateOnUD to never be evaluated. As a
result instructions whose emulation is broken may be emulated.  This fix moves
the evaluation of EmulateOnUD so it would be evaluated.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Tweak operand order in &&, remove EmulateOnUD where it's now superfluous.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Mark Rustad
b55a8144d1 x86/kvm: Resolve shadow warning from min macro
Resolve a shadow warning generated in W=2 builds by the nested
use of the min macro by instead using the min3 macro for the
minimum of 3 values.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-25 16:05:54 +02:00
Nadav Amit
b9a1ecb909 KVM: x86: Setting rflags.rf during rep-string emulation
This patch updates RF for rep-string emulation.  The flag is set upon the first
iteration, and cleared after the last (if emulated). It is intended to make
sure that if a trap (in future data/io #DB emulation) or interrupt is delivered
to the guest during the rep-string instruction, RF will be set correctly. RF
affects whether instruction breakpoint in the guest is masked.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24 14:03:54 +02:00
Nadav Amit
6c6cb69b8e KVM: x86: Cleanup of rflags.rf cleaning
RFLAGS.RF was cleaned in several functions (e.g., syscall) in the x86 emulator.
Now that we clear it before the execution of an instruction in the emulator, we
can remove the specific cleanup of RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:42:39 +02:00
Nadav Amit
4467c3f1ad KVM: x86: Clear rflags.rf on emulated instructions
When an instruction is emulated RFLAGS.RF should be cleared. KVM previously did
not do so. This patch clears RFLAGS.RF after interception is done.  If a fault
occurs during the instruction, RFLAGS.RF will be set by a previous patch.  This
patch does not handle the case of traps/interrupts during rep-strings. Traps
are only expected to occur on debug watchpoints, and those are anyhow not
handled by the emulator.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:42:21 +02:00
Nadav Amit
163b135e7b KVM: x86: popf emulation should not change RF
RFLAGS.RF is always zero after popf. Therefore, popf should not updated RF, as
anyhow emulating popf, just as any other instruction should clear RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:41:58 +02:00
Nadav Amit
68efa764f3 KVM: x86: Emulator support for #UD on CPL>0
Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in user mode. This is in contrast to the regular privileged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilege level assertions take place before
interception, a flag has been added.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:05 +02:00
Nadav Amit
10e38fc7ca KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode
Certain instructions, such as monitor and xsave do not support big real mode
and cause a #GP exception if any of the accessed bytes effective address are
not within [0, 0xffff].  This patch introduces a flag to mark these
instructions, including the necassary checks.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:04 +02:00