2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

253 Commits

Author SHA1 Message Date
Nadav Amit
d5262739cb KVM: x86: Remove debug assertion of non-PAE reserved bits
Commit 346874c950 ("KVM: x86: Fix CR3 reserved bits") removed non-PAE
reserved bits which were not according to Intel SDM.  However, residue was left
in a debug assertion (CR3_NONPAE_RESERVED_BITS).  Remove it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:55 +02:00
Paolo Bonzini
54987b7afa KVM: x86: propagate exception from permission checks on the nested page fault
Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:13 +02:00
Paolo Bonzini
5e35251951 KVM: nSVM: propagate the NPF EXITINFO to the guest
This is similar to what the EPT code does with the exit qualification.
This allows the guest to see a valid value for bits 33:32.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03 10:18:54 +02:00
Xiao Guangrong
a086f6a1eb Revert "KVM: Simplify kvm->tlbs_dirty handling"
This reverts commit 5befdc385d.

Since we will allow flush tlb out of mmu-lock in the later
patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:48 -03:00
Feng Wu
97ec8c067d KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:34 -03:00
Takuya Yoshikawa
5befdc385d KVM: Simplify kvm->tlbs_dirty handling
When this was introduced, kvm_flush_remote_tlbs() could be called
without holding mmu_lock.  It is now acknowledged that the function
must be called before releasing mmu_lock, and all callers have already
been changed to do so.

There is no need to use smp_mb() and cmpxchg() any more.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-18 10:07:26 +01:00
Marcelo Tosatti
37f6a4e237 KVM: x86: handle invalid root_hpa everywhere
Rom Freiman <rom@stratoscale.com> notes other code paths vulnerable to
bug fixed by 989c6b34f6.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-15 12:16:16 +01:00
Paolo Bonzini
ba6a354154 KVM: mmu: allow page tables to be in read-only slots
Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set.  Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:52:31 +03:00
Yang Zhang
25d92081ae nEPT: Add nEPT violation/misconfigration support
Inject nEPT fault to L1 guest. This patch is original from Xinhao.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:40 +02:00
Yang Zhang
7a1638ce42 nEPT: Redefine EPT-specific link_shadow_page()
Since nEPT doesn't support A/D bit, so we should not set those bit
when build shadow page table.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:39 +02:00
Nadav Har'El
37406aaaee nEPT: Add EPT tables support to paging_tmpl.h
This is the first patch in a series which adds nested EPT support to KVM's
nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
to set its own cr3 and take its own page faults without either of L0 or L1
getting involved. This often significanlty improves L2's performance over the
previous two alternatives (shadow page tables over EPT, and shadow page
tables over shadow page tables).

This patch adds EPT support to paging_tmpl.h.

paging_tmpl.h contains the code for reading and writing page tables. The code
for 32-bit and 64-bit tables is very similar, but not identical, so
paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once
with PTTYPE=64, and this generates the two sets of similar functions.

There are subtle but important differences between the format of EPT tables
and that of ordinary x86 64-bit page tables, so for nested EPT we need a
third set of functions to read the guest EPT table and to write the shadow
EPT table.

So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed
with "EPT") which correctly read and write EPT tables.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:38 +02:00
Gleb Natapov
61719a8fff nEPT: Support shadow paging for guest paging without A/D bits
Some guest paging modes do not support A/D bits. Add support for such
modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK,
PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT
should be set to zero.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:37 +02:00
Gleb Natapov
d8089baca4 nEPT: make guest's A/D bits depends on guest's paging mode
This patch makes guest A/D bits definition to be dependable on paging
mode, so when EPT support will be added it will be able to define them
differently.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:37 +02:00
Nadav Har'El
0ad805a0c3 nEPT: Move common code to paging_tmpl.h
For preparation, we just move gpte_access(), prefetch_invalid_gpte(),
s_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c
to paging_tmpl.h.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:36 +02:00
Xiao Guangrong
027664216d KVM: MMU: fix check the reserved bits on the gpte of L2
Current code always uses arch.mmu to check the reserved bits on guest gpte
which is valid only for L1 guest, we should use arch.nested_mmu instead when
we translate gva to gpa for the L2 guest

Fix it by using @mmu instead since it is adapted to the current mmu mode
automatically

The bug can be triggered when nested npt is used and L1 guest and L2 guest
use different mmu mode

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07 15:57:33 +02:00
Xiao Guangrong
f8f559422b KVM: MMU: fast invalidate all mmio sptes
This patch tries to introduce a very simple and scale way to invalidate
all mmio sptes - it need not walk any shadow pages and hold mmu-lock

KVM maintains a global mmio valid generation-number which is stored in
kvm->memslots.generation and every mmio spte stores the current global
generation-number into his available bits when it is created

When KVM need zap all mmio sptes, it just simply increase the global
generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
then it walks the shadow page table and get the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it will go to the normal #PF handler to update the mmio spte

Since 19 bits are used to store generation-number on mmio spte, we zap all
mmio sptes when the number is round

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:36 +03:00
Xiao Guangrong
f2fd125d32 KVM: MMU: store generation-number into mmio spte
Store the generation-number into bit3 ~ bit11 and bit52 ~ bit61, totally
19 bits can be used, it should be enough for nearly all most common cases

In this patch, the generation-number is always 0, it will be changed in
the later patch

[Gleb: masking generation bits from spte in get_mmio_spte_gfn() and
       get_mmio_spte_access()]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:18:15 +03:00
Takuya Yoshikawa
450e0b411f Revert "KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page()"
With the following commit, shadow pages can be zapped at random during
a shadow page talbe walk:
  KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page()
  7ddca7e43c

This patch reverts it and fixes __direct_map() and FNAME(fetch)().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-07 13:13:36 +03:00
Takuya Yoshikawa
7ddca7e43c KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page()
What this function is doing is to ensure that the number of shadow pages
does not exceed the maximum limit stored in n_max_mmu_pages: so this is
placed at every code path that can reach kvm_mmu_alloc_page().

Although it might have some sense to spread this function in each such
code path when it could be called before taking mmu_lock, the rule was
changed not to do so.

Taking this background into account, this patch moves it into
kvm_mmu_alloc_page() and simplifies the code.

Note: the unlikely hint in kvm_mmu_free_some_pages() guarantees that the
overhead of this function is almost zero except when we actually need to
allocate some shadow pages, so we do not need to care about calling it
multiple times in one path by doing kvm_mmu_get_page() a few times.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-21 19:44:56 -03:00
Xiao Guangrong
f761620377 KVM: MMU: remove pt_access in mmu_set_spte
It is only used in debug code, so drop it

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:42:08 -02:00
Gleb Natapov
eb3fce87cc KVM: MMU: drop superfluous is_present_gpte() check.
Gust page walker puts only present ptes into ptes[] array. No need to
check it again.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Xiao Guangrong
93c05d3ef2 KVM: x86: improve reexecute_instruction
The current reexecute_instruction can not well detect the failed instruction
emulation. It allows guest to retry all the instructions except it accesses
on error pfn

For example, some cases are nested-write-protect - if the page we want to
write is used as PDE but it chains to itself. Under this case, we should
stop the emulation and report the case to userspace

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:33 -02:00
Xiao Guangrong
7751babd3c KVM: MMU: fix infinite fault access retry
We have two issues in current code:
- if target gfn is used as its page table, guest will refault then kvm will use
  small page size to map it. We need two #PF to fix its shadow page table

- sometimes, say a exception is triggered during vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before go into page fault path, it will cause infinite
  loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust page size early if the target gfn is used as page
table

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:30 -02:00
Xiao Guangrong
c22885050e KVM: MMU: fix Dirty bit missed if CR0.WP = 0
If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte
and clears U bit. This is the chance that kvm can change pte access from
readonly to writable

Unfortunately, the pte access is the access of 'direct' shadow page table,
means direct sp.role.access = pte_access, then we will create a writable
spte entry on the readonly shadow page table. It will cause Dirty bit is
not tracked when two guest ptes point to the same large page. Note, it
does not have other impact except Dirty bit since cr0.wp is encoded into
sp.role

It can be fixed by adjusting pte access before establishing shadow page
table. Also, after that, no mmu specified code exists in the common function
and drop two parameters in set_spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:08 -02:00
Gleb Natapov
b0cfeb5ded KVM: x86: remove unused variable from walk_addr_generic()
Fix compilation warning.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-08 17:23:39 -02:00
Gleb Natapov
908e7d7999 KVM: MMU: simplify folding of dirty bit into accessed_dirty
MMU code tries to avoid if()s HW is not able to predict reliably by using
bitwise operation to streamline code execution, but in case of a dirty bit
folding this gives us nothing since write_fault is checked right before
the folding code. Lets just piggyback onto the if() to make code more clear.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-07 20:31:35 -02:00
Xiao Guangrong
81c52c56e2 KVM: do not treat noslot pfn as a error pfn
This patch filters noslot pfn out from error pfns based on Marcelo comment:
noslot pfn is not a error pfn

After this patch,
- is_noslot_pfn indicates that the gfn is not in slot
- is_error_pfn indicates that the gfn is in slot but the error is occurred
  when translate the gfn to pfn
- is_error_noslot_pfn indicates that the pfn either it is error pfns or it
  is noslot pfn
And is_invalid_pfn can be removed, it makes the code more clean

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29 20:31:04 -02:00
Christoffer Dall
8ca40a70a7 KVM: Take kvm instead of vcpu to mmu_notifier_retry
The mmu_notifier_retry is not specific to any vcpu (and never will be)
so only take struct kvm as a parameter.

The motivation is the ARM mmu code that needs to call this from
somewhere where we long let go of the vcpu pointer.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-23 13:35:43 +02:00
Xiao Guangrong
bd6360cc0a KVM: MMU: introduce FNAME(prefetch_gpte)
The only difference between FNAME(update_pte) and FNAME(pte_prefetch)
is that the former is allowed to prefetch gfn from dirty logged slot,
so introduce a common function to prefetch spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17 16:39:19 +02:00
Xiao Guangrong
a052b42b0e KVM: MMU: move prefetch_invalid_gpte out of pagaing_tmp.h
The function does not depend on guest mmu mode, move it out from
paging_tmpl.h

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17 16:39:18 +02:00
Xiao Guangrong
d4878f24e3 KVM: MMU: cleanup FNAME(page_fault)
Let it return emulate state instead of spte like __direct_map

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17 16:39:16 +02:00
Xiao Guangrong
bd660776da KVM: MMU: remove mmu_is_invalid
Remove mmu_is_invalid and use is_invalid_pfn instead

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17 16:39:15 +02:00
Avi Kivity
c5421519f3 KVM: MMU: Eliminate pointless temporary 'ac'
'ac' essentially reconstructs the 'access' variable we already
have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK.  As
these are not used by callees, just use 'access' directly.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:10 +03:00
Avi Kivity
b514c30f77 KVM: MMU: Avoid access/dirty update loop if all is well
Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:09 +03:00
Avi Kivity
71331a1da1 KVM: MMU: Eliminate eperm temporary
'eperm' is no longer used in the walker loop, so we can eliminate it.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:09 +03:00
Avi Kivity
6fd01b711b KVM: MMU: Optimize is_last_gpte()
Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:09 +03:00
Avi Kivity
13d22b6aeb KVM: MMU: Simplify walk_addr_generic() loop
The page table walk is coded as an infinite loop, with a special
case on the last pte.

Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:08 +03:00
Avi Kivity
97d64b7881 KVM: MMU: Optimize pte permission checks
walk_addr_generic() permission checks are a maze of branchy code, which is
performed four times per lookup.  It depends on the type of access, efer.nxe,
cr0.wp, cr4.smep, and in the near future, cr4.smap.

Optimize this away by precalculating all variants and storing them in a
bitmap.  The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).

The permission check is moved to the end of the loop, otherwise an SMEP
fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.

The result is short, branch-free code.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:08 +03:00
Avi Kivity
8cbc70696f KVM: MMU: Update accessed and dirty bits after guest pagetable walk
While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.

While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.

(the tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we can't fault on a one bit during the walk
itself).

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:08 +03:00
Avi Kivity
3d34adec70 KVM: MMU: Move gpte_access() out of paging_tmpl.h
We no longer rely on paging_tmpl.h defines; so we can move the function
to mmu.c.

Rely on zero extension to 64 bits to get the correct nx behaviour.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:08 +03:00
Avi Kivity
edc2ae84eb KVM: MMU: Optimize gpte_access() slightly
If nx is disabled, then is gpte[63] is set we will hit a reserved
bit set fault before checking permissions; so we can ignore the
setting of efer.nxe.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:07 +03:00
Avi Kivity
8ea667f259 KVM: MMU: Push clean gpte write protection out of gpte_access()
gpte_access() computes the access permissions of a guest pte and also
write-protects clean gptes.  This is wrong when we are servicing a
write fault (since we'll be setting the dirty bit momentarily) but
correct when instantiating a speculative spte, or when servicing a
read fault (since we'll want to trap a following write in order to
set the dirty bit).

It doesn't seem to hurt in practice, but in order to make the code
readable, push the write protection out of gpte_access() and into
a new protect_clean_gpte() which is called explicitly when needed.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20 13:00:07 +03:00
Xiao Guangrong
cb9aaa30b1 KVM: do not release the error pfn
After commit a2766325cf, the error pfn is replaced by the
error code, it need not be released anymore

[ The patch has been compiling tested for powerpc ]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06 16:04:57 +03:00
Xiao Guangrong
6fbc277053 KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
The P bit of page fault error code is missed in this tracepoint, fix it by
passing the full error code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:22 +03:00
Davidlohr Bueso
f71fa31f9f KVM: MMU: use page table level macro
Its much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-18 23:35:01 -03:00
Cong Wang
8fd75e1216 x86: remove the second argument of k[un]map_atomic()
Acked-by: Avi Kivity <avi@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:15 +08:00
Xiao Guangrong
0375f7fad9 KVM: MMU: audit: replace mmu audit tracepoint with jump-label
The tracepoint is only used to audit mmu code, it should not be exposed to
user, let us replace it with jump-label.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:22:05 +02:00
Xiao Guangrong
a30f47cb15 KVM: MMU: improve write flooding detected
Detecting write-flooding does not work well, when we handle page written, if
the last speculative spte is not accessed, we treat the page is
write-flooding, however, we can speculative spte on many path, such as pte
prefetch, page synced, that means the last speculative spte may be not point
to the written page and the written page can be accessed via other sptes, so
depends on the Accessed bit of the last speculative spte is not enough

Instead of detected page accessed, we can detect whether the spte is accessed
after it is written, if the spte is not accessed but it is written frequently,
we treat is not a page table or it not used for a long time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:17:02 +02:00
Xiao Guangrong
f57f2ef58f KVM: MMU: fast prefetch spte on invlpg path
Fast prefetch spte for the unsync shadow page on invlpg path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:16:56 +02:00
Xiao Guangrong
505aef8f30 KVM: MMU: cleanup FNAME(invlpg)
Directly Use mmu_page_zap_pte to zap spte in FNAME(invlpg), also remove the
same code between FNAME(invlpg) and FNAME(sync_page)

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:16:54 +02:00
Yang, Wei Y
cd46868c7f KVM: MMU: Fix SMEP failure during fetch
This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK
bit set in the case of SMEP fault.  The code updated 'eperm' after
the variable was checked.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25 19:18:02 +03:00
Avi Kivity
e4e517b4be KVM: MMU: Do not unconditionally read PDPTE from guest memory
Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
On SVM, it is not possible to implement this, but on VMX this is possible
and was indeed implemented until nested SVM changed this to unconditionally
read PDPTEs dynamically.  This has noticable impact when running PAE guests.

Fix by changing the MMU to read PDPTRs from the cache, falling back to
reading from memory for the nested MMU.

Signed-off-by: Avi Kivity <avi@redhat.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25 19:18:01 +03:00
Xiao Guangrong
ce88decffd KVM: MMU: mmio page fault support
The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When the page fault is caused by mmio, we cache the info in the shadow page
table, and also set the reserved bits in the shadow page table, so if the mmio
is caused again, we can quickly identify it and emulate it directly

Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it
can be reduced by this feature, and also avoid walking guest page table for
soft mmu.

[jan: fix operator precedence issue]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:40 +03:00
Xiao Guangrong
d7c55201e6 KVM: MMU: abstract some functions to handle fault pfn
Introduce handle_abnormal_pfn to handle fault pfn on page fault path,
introduce mmu_invalid_pfn to handle fault pfn on prefetch path

It is the preparing work for mmio page fault support

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:35 +03:00
Xiao Guangrong
c37079586f KVM: MMU: remove bypass_guest_pf
The idea is from Avi:
| Maybe it's time to kill off bypass_guest_pf=1.  It's not as effective as
| it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
| and since we convert between the two nonpresent_ptes during sync and unsync.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:33 +03:00
Xiao Guangrong
b90a0e6c81 KVM: MMU: rename 'pt_write' to 'emulate'
If 'pt_write' is true, we need to emulate the fault. And in later patch, we
need to emulate the fault even though it is not a pt_write event, so rename
it to better fit the meaning

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:30 +03:00
Xiao Guangrong
b36c7a7c10 KVM: MMU: cleanup for FNAME(fetch)
gw->pte_access is the final access permission, since it is unified with
gw->pt_access when we walked guest page table:

FNAME(walk_addr_generic):
	pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:29 +03:00
Xiao Guangrong
640d9b0dbe KVM: MMU: optimize to handle dirty bit
If dirty bit is not set, we can make the pte access read-only to avoid handing
dirty bit everywhere

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:27 +03:00
Xiao Guangrong
bebb106a5a KVM: MMU: cache mmio info on page fault path
If the page fault is caused by mmio, we can cache the mmio info, later, we do
not need to walk guest page table and quickly know it is a mmio fault while we
emulate the mmio instruction

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:26 +03:00
Takuya Yoshikawa
3c8c652ae4 KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()
Suggested by Ingo and Avi.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:44 +03:00
Takuya Yoshikawa
92c1c1e85b KVM: MMU: Rename the walk label in walk_addr_generic()
The current name does not explain the meaning well.  So give it a better
name "retry_walk" to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:43 +03:00
Takuya Yoshikawa
134291bf3c KVM: MMU: Clean up the error handling of walk_addr_generic()
Avoid two step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:42 +03:00
Yang, Wei Y
e57d4a356a KVM: Add instruction fetch checking when walking guest page table
This patch adds instruction fetch checking when walking guest page table,
to implement SMEP when emulating instead of executing natively.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:15 +03:00
Borislav Petkov
b72336355b KVM: MMU: Fix build warnings in walk_addr_generic()
On 3.0-rc1 I get

In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function

caused by 6e2ca7d180. According to Takuya
Yoshikawa, ptep_user won't be used uninitialized so shut up gcc.

Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-06-19 19:23:13 +03:00
Takuya Yoshikawa
c8cfbb555e KVM: MMU: Use ptep_user for cmpxchg_gpte()
The address of the gpte was already calculated and stored in ptep_user
before entering cmpxchg_gpte().

This patch makes cmpxchg_gpte() to use that to make it clear that we
are using the same address during walk_addr_generic().

Note that the unlikely annotations are used to show that the conditions
are something unusual rather than for performance.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-22 08:48:14 -04:00
Takuya Yoshikawa
fa3d315a4c KVM: Validate userspace_addr of memslot when registered
This way, we can avoid checking the user space address many times when
we read the guest memory.

Although we can do the same for write if we check which slots are
writable, we do not care write now: reading the guest memory happens
more often than writing.

[avi: change VERIFY_READ to VERIFY_WRITE]

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:56 -04:00
Takuya Yoshikawa
12cb814f3b KVM: MMU: Clean up gpte reading with copy_from_user()
When we optimized walk_addr_generic() by not using the generic guest
memory reader, we replaced copy_from_user() with get_user():

  commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
  KVM: MMU: Optimize guest page table walk

  commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e
  KVM: MMU: Fix 64-bit paging breakage on x86_32

But as Andi pointed out later, copy_from_user() does the same as
get_user() as long as we give a constant size to it.

So we use copy_from_user() to clean up the code.

The only, noticeable, regression introduced by this is 64-bit gpte
reading on x86_32 hosts needed for PAE guests.

But this can be mitigated by implementing 8-byte get_user() for x86_32,
if needed.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:54 -04:00
Takuya Yoshikawa
8f74d8e168 KVM: MMU: Fix 64-bit paging breakage on x86_32
Fix regression introduced by
  commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
  KVM: MMU: Optimize guest page table walk

On x86_32, get_user() does not support 64-bit values and we fail to
build KVM at the point of 64-bit paging.

This patch fixes this by using get_user() twice for that condition.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:39:58 -04:00
Avi Kivity
781e0743af KVM: MMU: Add unlikely() annotations to walk_addr_generic()
walk_addr_generic() is a hot path and is also hard for the cpu to predict -
some of the parameters (fetch_fault in particular) vary wildly from
invocation to invocation.

Add unlikely() annotations where appropriate; all walk failures are
considered unlikely, as are cases where we have to mark the accessed or
dirty bit, as they are slow paths both in kvm and on real processors.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:39:46 -04:00
Takuya Yoshikawa
6e2ca7d180 KVM: MMU: Optimize guest page table walk
This patch optimizes the guest page table walk by using get_user()
instead of copy_from_user().

With this patch applied, paging64_walk_addr_generic() has become
about 0.5us to 1.0us faster on my Phenom II machine with NPT on.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:39:38 -04:00
Roedel, Joerg
a78484c60e KVM: MMU: Make cmpxchg_gpte aware of nesting too
This patch makes the cmpxchg_gpte() function aware of the
difference between l1-gfns and l2-gfns when nested
virtualization is in use.  This fixes a potential
data-corruption problem in the l1-guest and makes the code
work correct (at least as correct as the hardware which is
emulated in this code) again.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:39:26 -04:00
Xiao Guangrong
7c5625227f KVM: MMU: remove mmu_seq verification on pte update path
The mmu_seq verification can be removed since we get the pfn in the
protection of mmu_lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:57:03 -04:00
Linus Torvalds
f2e1fbb5f2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Flush TLB if PGD entry is changed in i386 PAE mode
  x86, dumpstack: Correct stack dump info when frame pointer is available
  x86: Clean up csum-copy_64.S a bit
  x86: Fix common misspellings
  x86: Fix misspelling and align params
  x86: Use PentiumPro-optimized partial_csum() on VIA C7
2011-03-18 10:45:21 -07:00
Lucas De Marchi
0d2eb44f63 x86: Fix common misspellings
They were generated by 'codespell' and then manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18 10:39:30 +01:00
Xiao Guangrong
0f53b5b1c0 KVM: MMU: cleanup pte write path
This patch does:
- call vcpu->arch.mmu.update_pte directly
- use gfn_to_pfn_atomic in update_pte path

The suggestion is from Avi.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:35 -03:00
Xiao Guangrong
676646ee4b KVM: MMU: remove unused macros
These macros are not used, so removed

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
49b26e26e4 KVM: MMU: do not record gfn in kvm_mmu_pte_write
No need to record the gfn to verifier the pte has the same mode as
current vcpu, it's because we only speculatively update the pte only
if the pte and vcpu have the same mode

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Andrea Arcangeli
936a5fe6e6 thp: kvm mmu transparent hugepage support
This should work for both hugetlbfs and transparent hugepages.

[akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:41 -08:00
Xiao Guangrong
f8e453b00c KVM: MMU: handle 'map_writable' in set_spte() function
Move the operation of 'writable' to set_spte() to clean up code

[avi: remove unneeded booleanification]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:19 +02:00
Avi Kivity
d3c422bd33 KVM: MMU: Fix incorrect direct page write protection due to ro host page
If KVM sees a read-only host page, it will map it as read-only to prevent
breaking a COW.  However, if the page was part of a large guest page, KVM
incorrectly extends the write protection to the entire large page frame
instead of limiting it to the normal host page.

This results in the instantiation of a new shadow page with read-only access.

If this happens for a MOVS instruction that moves memory between two normal
pages, within a single large page frame, and mapped within the guest as a
large page, and if, in addition, the source operand is not writeable in the
host (perhaps due to KSM), then KVM will instantiate a read-only direct
shadow page, instantiate an spte for the source operand, then instantiate
a new read/write direct shadow page and instantiate an spte for the
destination operand.  Since these two sptes are in different shadow pages,
MOVS will never see them at the same time and the guest will not make
progress.

Fix by mapping the direct shadow page read/write, and only marking the
host page read-only.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:51 +02:00
Xiao Guangrong
fb67e14fc9 KVM: MMU: retry #PF for softmmu
Retry #PF for softmmu only when the current vcpu has the same cr3 as the time
when #PF occurs

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:41 +02:00
Xiao Guangrong
78b2c54aa4 KVM: MMU: rename 'no_apf' to 'prefault'
It's the speculative path if 'no_apf = 1' and we will specially handle this
speculative path in the later patch, so 'prefault' is better to fit the sense.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:38 +02:00
Avi Kivity
6389ee9463 KVM: Pull extra page fault information into struct x86_exception
Currently page fault cr2 and nesting infomation are carried outside
the fault data structure.  Instead they are placed in the vcpu struct,
which results in confusion as global variables are manipulated instead
of passing parameters.

Fix this issue by adding address and nested fields to struct x86_exception,
so this struct can carry all information associated with a fault.

Signed-off-by: Avi Kivity <avi@redhat.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Tested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:30:02 +02:00
Avi Kivity
8c28d03165 KVM: Push struct x86_exception into walk_addr()
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:30:01 +02:00
Avi Kivity
ab9ae31387 KVM: Push struct x86_exception info the various gva_to_gpa variants
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:59 +02:00
Xiao Guangrong
a4ee1ca4a3 KVM: MMU: delay flush all tlbs on sync_page path
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleareded when we flush the tlb.  kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:51 +02:00
Xiao Guangrong
407c61c6bd KVM: MMU: abstract invalid guest pte mapping
Introduce a common function to map invalid gpte

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:49 +02:00
Xiao Guangrong
a4a8e6f76e KVM: MMU: remove 'clear_unsync' parameter
Remove it since we can judge it by using sp->unsync

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:48 +02:00
Lai Jiangshan
9bdbba13b8 KVM: MMU: rename 'reset_host_protection' to 'host_writable'
Rename it to fit its sense better

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:46 +02:00
Xiao Guangrong
30bfb3c425 KVM: MMU: fix forgot flush tlbs on sync_page path
We should flush all tlbs after drop spte on sync_page path since

Quote from Avi:
| sync_page
| drop_spte
| kvm_mmu_notifier_invalidate_page
| kvm_unmap_rmapp
| spte doesn't exist -> no flush
| page is freed
| guest can write into freed page?

KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:43 +02:00
Xiao Guangrong
e730b63cc0 KVM: MMU: don't mark spte notrap if reserved bit set
If reserved bit is set, we need inject the #PF with PFEC.RSVD=1,
but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:27 +02:00
Marcelo Tosatti
612819c3c6 KVM: propagate fault r/w information to gup(), allow read-only memory
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:28:40 +02:00
Gleb Natapov
56028d0861 KVM: Retry fault before vmentry
When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:06 +02:00
Gleb Natapov
af585b921e KVM: Halt vcpu if page it tries to access is swapped out
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
      makes everything synchrnous again]

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:21:39 +02:00
Nicolas Kaiser
9611c18777 KVM: fix typo in copyright notice
Fix typo in copyright notice.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:14 +02:00
Xiao Guangrong
3377078027 KVM: MMU: move access code parsing to FNAME(walk_addr) function
Move access code parsing from caller site to FNAME(walk_addr) function

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:14 +02:00
Xiao Guangrong
20bd40dc64 KVM: MMU: cleanup for error mask set while walk guest page table
Small cleanup for set page fault error code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:10 +02:00
Joerg Roedel
2d48a985c7 KVM: MMU: Track NX state in struct kvm_mmu
With Nested Paging emulation the NX state between the two
MMU contexts may differ. To make sure that always the right
fault error code is recorded this patch moves the NX state
into struct kvm_mmu so that the code can distinguish between
L1 and L2 NX state.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:44 +02:00
Joerg Roedel
d41d1895eb KVM: MMU: Introduce kvm_pdptr_read_mmu
This function is implemented to load the pdptr pointers of
the currently running guest (l1 or l2 guest). Therefore it
takes care about the current paging mode and can read pdptrs
out of l2 guest physical memory.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:42 +02:00
Joerg Roedel
2329d46d21 KVM: MMU: Make walk_addr_generic capable for two-level walking
This patch uses kvm_read_guest_page_tdp to make the
walk_addr_generic functions suitable for two-level page
table walking.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:38 +02:00