2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

138 Commits

Author SHA1 Message Date
Alexander Graf
c01e3f66cd KVM: PPC: Book3S: Add hack for split real mode
Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.

This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.

However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.

It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.

The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter however would not receive a great amount of testing and be a lot of
bloat for a reasonably isolated corner case.

So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:

Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).

That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:23:10 +02:00
Alexander Graf
6f22bd3265 KVM: PPC: Book3S HV: Make HTAB code LE host aware
When running on an LE host all data structures are kept in little endian
byte order. However, the HTAB still needs to be maintained in big endian.

So every time we access any HTAB we need to make sure we do so in the right
byte order. Fix up all accesses to manually byte swap.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:22:22 +02:00
Paul Mackerras
ae2113a4f1 KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
capability is used to enable or disable in-kernel handling of an
hcall, that the hcall is actually implemented by the kernel.
If not an EINVAL error is returned.

This also checks the default-enabled list of hcalls and prints a
warning if any hcall there is not actually implemented.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:22:18 +02:00
Paul Mackerras
699a0ea082 KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
This provides a way for userspace controls which sPAPR hcalls get
handled in the kernel.  Each hcall can be individually enabled or
disabled for in-kernel handling, except for H_RTAS.  The exception
for H_RTAS is because userspace can already control whether
individual RTAS functions are handled in-kernel or not via the
KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
H_RTAS is out of the normal sequence of hcall numbers.

Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
The args field of the struct kvm_enable_cap specifies the hcall number
in args[0] and the enable/disable flag in args[1]; 0 means disable
in-kernel handling (so that the hcall will always cause an exit to
userspace) and 1 means enable.  Enabling or disabling in-kernel
handling of an hcall is effective across the whole VM.

The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
capability advertises that this ability exists.

When a VM is created, an initial set of hcalls are enabled for
in-kernel handling.  The set that is enabled is the set that have
an in-kernel implementation at this point.  Any new hcall
implementations from this point onwards should not be added to the
default set without a good reason.

No distinction is made between real-mode and virtual-mode hcall
implementations; the one setting controls them both.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:22:17 +02:00
Aneesh Kumar K.V
3cd60e3118 KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
We use time base for PURR and SPURR emulation with PR KVM since we
are emulating a single threaded core. When using time base
we need to make sure that we don't accumulate time spent in the host
in PURR and SPURR value.

Also we don't need to emulate mtspr because both the registers are
hypervisor resource.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-06 13:56:49 +02:00
Alexander Graf
5deb8e7ad8 KVM: PPC: Make shared struct aka magic page guest endian
The shared (magic) page is a data structure that contains often used
supervisor privileged SPRs accessible via memory to the user to reduce
the number of exits we have to take to read/write them.

When we actually share this structure with the guest we have to maintain
it in guest endianness, because some of the patch tricks only work with
native endian load/store operations.

Since we only share the structure with either host or guest in little
endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.

For booke, the shared struct stays big endian. For book3s_64 hv we maintain
the struct in host native endian, since it never gets shared with the guest.

For book3s_64 pr we introduce a variable that tells us which endianness the
shared struct is in and route every access to it through helper inline
functions that evaluate this variable.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30 14:26:21 +02:00
Greg Kurz
e59d24e612 KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
When the guest does an MMIO write which is handled successfully by an
ioeventfd, ioeventfd_write() returns 0 (success) and
kvmppc_handle_store() returns EMULATE_DONE.  Then
kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and
this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an
exit back to userspace with a bogus exit reason code, typically
causing userspace (e.g. qemu) to crash with a message about an unknown
exit code.

This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order
to fix that.  For generality, we define a helper to check for either
of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV,
to make it easy to check for either and provide one place to update if
any other return-to-guest code gets defined in future.

Since it only affects Book3S HV for now, the helper is added to
the kvm_book3s.h header file.

We use the helper in two places in kvmppc_run_core() as well for
future-proofing, though we don't see RESUME_GUEST_NV in either place
at present.

[paulus@samba.org - combined 4 patches into one, rewrote description]

Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2014-03-26 23:33:44 +11:00
Paolo Bonzini
b73117c493 Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue
Conflicts:
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
	arch/powerpc/kvm/booke.c
2014-01-29 18:29:01 +01:00
Cédric Le Goater
736017752d KVM: PPC: Book3S: MMIO emulation support for little endian guests
MMIO emulation reads the last instruction executed by the guest
and then emulates. If the guest is running in Little Endian order,
or more generally in a different endian order of the host, the
instruction needs to be byte-swapped before being emulated.

This patch adds a helper routine which tests the endian order of
the host and the guest in order to decide whether a byteswap is
needed or not. It is then used to byteswap the last instruction
of the guest in the endian order of the host before MMIO emulation
is performed.

Finally, kvmppc_handle_load() of kvmppc_handle_store() are modified
to reverse the endianness of the MMIO if required.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[agraf: add booke handling]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:39 +01:00
Alexander Graf
7a8ff56be6 KVM: PPC: Unify kvmppc_get_last_inst and sc
We had code duplication between the inline functions to get our last
instruction on normal interrupts and system call interrupts. Unify
both helper functions towards a single implementation.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 11:10:46 +01:00
Paul Mackerras
09548fdaf3 KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec
The load_up_fpu and load_up_altivec functions were never intended to
be called from C, and do things like modifying the MSR value in their
callers' stack frames, which are assumed to be interrupt frames.  In
addition, on 32-bit Book S they require the MMU to be off.

This makes KVM use the new load_fp_state() and load_vr_state() functions
instead of load_up_fpu/altivec.  This means we can remove the assembler
glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
where load_up_fpu was called directly from C.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:14:59 +01:00
Alexander Graf
c9dad7f9db KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
The kvmppc_copy_{to,from}_svcpu functions are publically visible,
so we should also export them in a header for others C files to
consume.

So far we didn't need this because we only called it from asm code.
The next patch will introduce a C caller.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-09 09:41:38 +01:00
Aneesh Kumar K.V
699cc87641 kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops
This help us to identify whether we are running with hypervisor mode KVM
enabled. The change is needed so that we can have both HV and PR kvm
enabled in the same kernel.

If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.

Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. We look at is_hv_enabled
to return the right value when checking for capabilities.For capabilities that
are only provided by HV KVM, we return the HV value only if
is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
we return the PR value only if is_hv_enabled is false.

NOTE: in later patch we replace is_hv_enabled with a static inline
function comparing kvm_ppc_ops

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:29:09 +02:00
Aneesh Kumar K.V
3a167beac0 kvm: powerpc: Add kvmppc_ops callback
This patch add a new callback kvmppc_ops. This will help us in enabling
both HV and PR KVM together in the same kernel. The actual change to
enable them together is done in the later patch in the series.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in booke changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:24:26 +02:00
Aneesh Kumar K.V
9975f5e369 kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE
This help ups to select the relevant code in the kernel code
when we later move HV and PR bits as seperate modules. The patch
also makes the config options for PR KVM selectable

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:18:28 +02:00
Aneesh Kumar K.V
7aa79938f7 kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE
With later patches supporting PR kvm as a kernel module, the changes
that has to be built into the main kernel binary to enable PR KVM module
is now selected via KVM_BOOK3S_PR_POSSIBLE

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:17:49 +02:00
Paul Mackerras
d78bca7296 KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()
When the MM code is invalidating a range of pages, it calls the KVM
kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
kvm_unmap_hva_range(), which arranges to flush all the existing host
HPTEs for guest pages.  However, the Linux PTEs for the range being
flushed are still valid at that point.  We are not supposed to establish
any new references to pages in the range until the ...range_end()
notifier gets called.  The PPC-specific KVM code doesn't get any
explicit notification of that; instead, we are supposed to use
mmu_notifier_retry() to test whether we are or have been inside a
range flush notifier pair while we have been getting a page and
instantiating a host HPTE for the page.

This therefore adds a call to mmu_notifier_retry inside
kvmppc_mmu_map_page().  This call is inside a region locked with
kvm->mmu_lock, which is the same lock that is called by the KVM
MMU notifier functions, thus ensuring that no new notification can
proceed while we are in the locked region.  Inside this region we
also create the host HPTE and link the corresponding hpte_cache
structure into the lists used to find it later.  We cannot allocate
the hpte_cache structure inside this locked region because that can
lead to deadlock, so we allocate it outside the region and free it
if we end up not using it.

This also moves the updates of vcpu3s->hpte_cache_count inside the
regions locked with vcpu3s->mmu_lock, and does the increment in
kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
rather than when it is allocated, in order that the hpte_cache_count
is accurate.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:35 +02:00
Paul Mackerras
93b159b466 KVM: PPC: Book3S PR: Better handling of host-side read-only pages
Currently we request write access to all pages that get mapped into the
guest, even if the guest is only loading from the page.  This reduces
the effectiveness of KSM because it means that we unshare every page we
access.  Also, we always set the changed (C) bit in the guest HPTE if
it allows writing, even for a guest load.

This fixes both these problems.  We pass an 'iswrite' flag to the
mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
the access is a load or a store.  The mmu.xlate() functions now only
set C for stores.  kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
instead of gfn_to_pfn() so that it can indicate whether we need write
access to the page, and get back a 'writable' flag to indicate whether
the page is writable or not.  If that 'writable' flag is clear, we then
make the host HPTE read-only even if the guest HPTE allowed writing.

This means that we can get a protection fault when the guest writes to a
page that it has mapped read-write but which is read-only on the host
side (perhaps due to KSM having merged the page).  Thus we now call
kvmppc_handle_pagefault() for protection faults as well as HPTE not found
faults.  In kvmppc_handle_pagefault(), if the access was allowed by the
guest HPTE and we thus need to install a new host HPTE, we then need to
remove the old host HPTE if there is one.  This is done with a new
function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
find and remove the old host HPTE.

Since the memslot-related functions require the KVM SRCU read lock to
be held, this adds srcu_read_lock/unlock pairs around the calls to
kvmppc_handle_pagefault().

Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
guest HPTEs that don't permit access, and to return -EPERM for accesses
that are not permitted by the page protections.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:35 +02:00
Paul Mackerras
3ff955024d KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache
This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
rather than having them embedded in the kvmppc_vcpu_book3s struct,
which is allocated with vzalloc.  The reason is to reduce the
differences between PR and HV KVM in order to make is easier to have
them coexist in one kernel binary.

With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
struct.  The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only
present for 32-bit, since it is only used for 32-bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in compile fix from Aneesh]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:05 +02:00
Paul Mackerras
c9029c341d KVM: PPC: Book3S PR: Use 64k host pages where possible
Currently, PR KVM uses 4k pages for the host-side mappings of guest
memory, regardless of the host page size.  When the host page size is
64kB, we might as well use 64k host page mappings for guest mappings
of 64kB and larger pages and for guest real-mode mappings.  However,
the magic page has to remain a 4k page.

To implement this, we first add another flag bit to the guest VSID
values we use, to indicate that this segment is one where host pages
should be mapped using 64k pages.  For segments with this bit set
we set the bits in the shadow SLB entry to indicate a 64k base page
size.  When faulting in host HPTEs for this segment, we make them
64k HPTEs instead of 4k.  We record the pagesize in struct hpte_cache
for use when invalidating the HPTE.

For now we restrict the segment containing the magic page (if any) to
4k pages.  It should be possible to lift this restriction in future
by ensuring that the magic 4k page is appropriately positioned within
a host 64k page.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:03 +02:00
Paul Mackerras
a4a0f2524a KVM: PPC: Book3S PR: Allow guest to use 64k pages
This adds the code to interpret 64k HPTEs in the guest hashed page
table (HPT), 64k SLB entries, and to tell the guest about 64k pages
in kvm_vm_ioctl_get_smmu_info().  Guest 64k pages are still shadowed
by 4k pages.

This also adds another hash table to the four we have already in
book3s_mmu_hpte.c to allow us to find all the PTEs that we have
instantiated that match a given 64k guest page.

The tlbie instruction changed starting with POWER6 to use a bit in
the RB operand to indicate large page invalidations, and to use other
RB bits to indicate the base and actual page sizes and the segment
size.  64k pages came in slightly earlier, with POWER5++.
We use one bit in vcpu->arch.hflags to indicate that the emulated
cpu supports 64k pages, and another to indicate that it has the new
tlbie definition.

The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because
the MMU capabilities depend on which CPU model we're emulating, but it
is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU
fd.  In addition, commonly-used userspace (QEMU) calls it before
setting the PVR for any VCPU.  Therefore, as a best effort we look at
the first vcpu in the VM and return 64k pages or not depending on its
capabilities.  We also make the PVR default to the host PVR on recent
CPUs that support 1TB segments (and therefore multiple page sizes as
well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB
segment support on those CPUs.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:03 +02:00
Paul Mackerras
a2d56020d1 KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu
Currently PR-style KVM keeps the volatile guest register values
(R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
the main kvm_vcpu struct.  For 64-bit, the shadow_vcpu exists in two
places, a kmalloc'd struct and in the PACA, and it gets copied back
and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
can't rely on being able to access the kmalloc'd struct.

This changes the code to copy the volatile values into the shadow_vcpu
as one of the last things done before entering the guest.  Similarly
the values are copied back out of the shadow_vcpu to the kvm_vcpu
immediately after exiting the guest.  We arrange for interrupts to be
still disabled at this point so that we can't get preempted on 64-bit
and end up copying values from the wrong PACA.

This means that the accessor functions in kvm_book3s.h for these
registers are greatly simplified, and are same between PR and HV KVM.
In places where accesses to shadow_vcpu fields are now replaced by
accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.

With this, the time to read the PVR one million times in a loop went
from 567.7ms to 575.5ms (averages of 6 values), an increase of about
1.4% for this worse-case test for guest entries and exits.  The
standard deviation of the measurements is about 11ms, so the
difference is only marginally significant statistically.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:03 +02:00
Paul Mackerras
a0144e2a6b KVM: PPC: Book3S HV: Store LPCR value for each virtual core
This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:01 +02:00
Paul Mackerras
8b23de2948 KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
It turns out that if we exit the guest due to a hcall instruction (sc 1),
and the loading of the instruction in the guest exit path fails for any
reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
instruction after the hcall instruction rather than the hcall itself.
This in turn means that the instruction doesn't get recognized as an
hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
as a sc instruction.  That usually results in the guest kernel getting
a return code of 38 (ENOSYS) from an hcall, which often triggers a
BUG_ON() or other failure.

This fixes the problem by adding a new variant of kvmppc_get_last_inst()
called kvmppc_get_last_sc(), which fetches the instruction if necessary
from pc - 4 rather than pc.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28 16:47:49 +02:00
Paul Mackerras
0f296829b5 KVM: PPC: Book3S PR: Allow guest to use 1TB segments
With this, the guest can use 1TB segments as well as 256MB segments.
Since we now have the situation where a single emulated guest segment
could correspond to multiple shadow segments (as the shadow segments
are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
to scan for all shadow segments that need to be removed.

This restructures the guest HPT (hashed page table) lookup code to
use the correct hashing and matching functions for HPTEs within a
1TB segment.  We use the standard hpt_hash() function instead of
open-coding the hash calculation, and we use HPTE_V_COMPARE() with
an AVPN value that has the B (segment size) field included.  The
calculation of avpn is done a little earlier since it doesn't change
in the loop starting at the do_second label.

The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
This is because the users of this function are creating 256MB SLB
entries.  We set a new VSID_1T flag so that entries created from 1T
segments don't collide with entries from 256MB segments.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-06-30 03:33:22 +02:00
Benjamin Herrenschmidt
bc5ad3f370 KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs.  Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.

Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:30 +02:00
Paul Mackerras
c35635efdc KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest.  This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration.  In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host.  Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary.  Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain.  This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page.  Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:13 +02:00
Bharat Bhushan
8c32a2ea65 Added ONE_REG interface for debug instruction
This patch adds the one_reg interface to get the special instruction
to be used for setting software breakpoint from userspace.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-17 15:21:14 +02:00
Paul Mackerras
b0a94d4e23 KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers
This adds basic emulation of the PURR and SPURR registers.  We assume
we are emulating a single-threaded core, so these advance at the same
rate as the timebase.  A Linux kernel running on a POWER7 expects to
be able to access these registers and is not prepared to handle a
program interrupt on accessing them.

This also adds a very minimal emulation of the DSCR (data stream
control register).  Writes are ignored and reads return zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:01 +01:00
Paul Mackerras
6b445ad4f8 KVM: PPC: Book3S HV: Make a HPTE removal function available
This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:55 +01:00
Paul Mackerras
7ed661bf85 KVM: PPC: Book3S HV: Restructure HPT entry creation code
This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Most of the work of kvmppc_virtmode_h_enter is now done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory.  Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:52 +01:00
Marcelo Tosatti
19bf7f8ac3 Merge remote-tracking branch 'master' into queue
Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/asm/kvm_para.h

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29 19:15:32 -02:00
Paul Mackerras
dfe49dbd1f KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
This adds an implementation of kvm_arch_flush_shadow_memslot for
Book3S HV, and arranges for kvmppc_core_commit_memory_region to
flush the dirty log when modifying an existing slot.  With this,
we can handle deletion and modification of memory slots.

kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
on Book3S HV now traverses the reverse map chains to remove any HPT
(hashed page table) entries referring to pages in the memslot.  This
gets called by generic code whenever deleting a memslot or changing
the guest physical address for a memslot.

We flush the dirty log in kvmppc_core_commit_memory_region for
consistency with what x86 does.  We only need to flush when an
existing memslot is being modified, because for a new memslot the
rmap array (which stores the dirty bits) is all zero, meaning that
every page is considered clean already, and when deleting a memslot
we obviously don't care about the dirty bits any more.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Aneesh Kumar K.V
5524a27d39 powerpc/mm: Convert virtual address to vpn
This patch convert different functions to take virtual page number
instead of virtual address. Virtual page number is virtual address
shifted right by VPN_SHIFT (12) bits. This enable us to have an
address range of upto 76 bits.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:49 +10:00
Linus Torvalds
07acfc2a93 Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM changes from Avi Kivity:
 "Changes include additional instruction emulation, page-crossing MMIO,
  faster dirty logging, preventing the watchdog from killing a stopped
  guest, module autoload, a new MSI ABI, and some minor optimizations
  and fixes.  Outside x86 we have a small s390 and a very large ppc
  update.

  Regarding the new (for kvm) rebaseless workflow, some of the patches
  that were merged before we switch trees had to be rebased, while
  others are true pulls.  In either case the signoffs should be correct
  now."

Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.

I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
check effectively twice (it was done differently in two different
commits), but better safe than sorry ;)

* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
  KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
  KVM: s390: onereg for timer related registers
  KVM: s390: epoch difference and TOD programmable field
  KVM: s390: KVM_GET/SET_ONEREG for s390
  KVM: s390: add capability indicating COW support
  KVM: Fix mmu_reload() clash with nested vmx event injection
  KVM: MMU: Don't use RCU for lockless shadow walking
  KVM: VMX: Optimize %ds, %es reload
  KVM: VMX: Fix %ds/%es clobber
  KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
  KVM: VMX: unlike vmcs on fail path
  KVM: PPC: Emulator: clean up SPR reads and writes
  KVM: PPC: Emulator: clean up instruction parsing
  kvm/powerpc: Add new ioctl to retreive server MMU infos
  kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
  KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
  KVM: PPC: Book3S: Enable IRQs during exit handling
  KVM: PPC: Fix PR KVM on POWER7 bare metal
  KVM: PPC: Fix stbux emulation
  KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
  ...
2012-05-24 16:17:30 -07:00
Benjamin Herrenschmidt
ffe3649282 powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM
The code forgot to scramble the VSIDs the way we normally do
and was basically using the "proto VSID" directly with the MMU.

This means that in practice, KVM used random VSIDs that could
collide with segments used by other user space programs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: simplify ppc32 case]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-16 15:02:11 +02:00
Scott Wood
043cc4d724 KVM: PPC: factor out lpid allocator from book3s_64_mmu_hv
We'll use it on e500mc as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:51:02 +03:00
Alexander Graf
1022fc3d3b KVM: PPC: Add support for explicit HIOR setting
Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SET_ONE_REG based method, we drop the PVR logic.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:41 +02:00
Paul Mackerras
82ed36164c KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit
This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
Book3s HV guests to use the hardware C (changed) bits in the guest
hashed page table.  Since this makes the implementation quite different
from the Book3s PR case, this moves the existing implementation from
book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
That implementation calls kvmppc_hv_get_dirty_log() to do the actual
work by calling kvm_test_clear_dirty on each page.  It iterates over
the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
set (including the saved C bit in the rmap entry).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Paul Mackerras
5551489373 KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva
This uses the host view of the hardware R (referenced) bit to speed
up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
see if any of them have R set.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Paul Mackerras
342d3db763 KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Paul Mackerras
697d3899dc KVM: PPC: Implement MMIO emulation support for Book3S HV guests
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.

This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras
c77162dee7 KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()
This removes the code from kvmppc_core_prepare_memory_region() that
looked up the VMA for the region being added and called hva_to_page
to get the pfns for the memory.  We have no guarantee that there will
be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
ioctl call; userspace can do that ioctl and then map memory into the
region later.

Instead we defer looking up the pfn for each memory page until it is
needed, which generally means when the guest does an H_ENTER hcall on
the page.  Since we can't call get_user_pages in real mode, if we don't
already have the pfn for the page, kvmppc_h_enter() will return
H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
insertion.

When the first vcpu starts executing, we need to have the RMO or VRMA
region mapped so that the guest's real mode accesses will work.  Thus
we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
starting at guest physical 0 now has RMO memory mapped there; if so it
sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
The function that does that, kvmppc_map_vrma, is now a bit simpler,
as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.

Since we are now potentially updating entries in the slot_phys[]
arrays from multiple vcpu threads, we now have a spinlock protecting
those updates to ensure that we don't lose track of any references
to pages.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:36 +02:00
Paul Mackerras
93e602490c KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests
This adds two new functions, kvmppc_pin_guest_page() and
kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
the guest has registered areas of memory for the hypervisor to update,
(i.e. the per-cpu virtual processor areas, SLB shadow buffers and
dispatch trace logs) and then unpin them when they are no longer
required.

Although it is not strictly necessary to pin the pages at this point,
since all guest pages are already pinned, later commits in this series
will mean that guest pages aren't all pinned.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:36 +02:00
Alexander Graf
468a12c2b5 KVM: PPC: Use get/set for to_svcpu to help preemption
When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
doing a few things wrong, most notably access to PACA fields without making
sure that the pointers stay stable accross the access (preempt_disable()).

This patch moves to_svcpu towards a get/put model which allows us to disable
preemption while accessing the shadow vcpu fields in the PACA. That way we
can run preemptible and everyone's happy!

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:30 +02:00
Andreas Schwab
36cc66d638 KVM: PPC: move compute_tlbie_rb to book3s_64 common header
compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-26 13:28:00 +02:00
Alexander Graf
bb75c627fb Revert "KVM: PPC: Add support for explicit HIOR setting"
This reverts commit a15bd354f0.

It exceeded the padding on the SREGS struct, rendering the ABI
backwards-incompatible.

Conflicts:

	arch/powerpc/kvm/powerpc.c
	include/linux/kvm.h

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:30:25 +02:00
Paul Mackerras
0214394760 KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode
This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:29 +03:00
Alexander Graf
0254f07429 KVM: PPC: Add PAPR hypercall code for PR mode
When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).

So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - whitespace fix
2011-09-25 19:52:24 +03:00
Alexander Graf
a15bd354f0 KVM: PPC: Add support for explicit HIOR setting
Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:23 +03:00
Alexander Graf
db507c300e KVM: PPC: move compute_tlbie_rb to book3s common header
We need the compute_tlbie_rb in _pr and _hv implementations for papr
soon, so let's move it over to a common header file that both
implementations can leverage.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:18 +03:00
Alexander Graf
29d03158f9 KVM: PPC: Remove prog_flags
Commit c8f729d408 (KVM: PPC: Deliver program interrupts right away instead
of queueing them) made away with all users of prog_flags, so we can just
remove it from the headers.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:17:00 +03:00
Paul Mackerras
aa04b4cc5b KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility.  These processors require a physically
contiguous, aligned area of memory for each guest.  When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access.  The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator.  The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs.  The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA.  It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace.  To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory.  Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region.  Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB.  However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest.  This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
Paul Mackerras
de56a948b9 KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode.  Using hypervisor mode means
that the guest can use the processor's supervisor mode.  That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host.  This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses.  That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification.  In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest.  We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount.  Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition.  MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest.  At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management.  This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:54 +03:00
Paul Mackerras
f05ed4d56e KVM: PPC: Split out code from book3s.c into book3s_pr.c
In preparation for adding code to enable KVM to use hypervisor mode
on 64-bit Book 3S processors, this splits book3s.c into two files,
book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
specific to running the guest in problem state (user mode) and book3s.c
contains code which should apply to all Book 3S processors.

In doing this, we abstract some details, namely the interrupt offset,
updating the interrupt pending flag, and detecting if the guest is
in a critical section.  These are all things that will be different
when we use hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:47 +03:00
Paul Mackerras
c4befc58a0 KVM: PPC: Move fields between struct kvm_vcpu_arch and kvmppc_vcpu_book3s
This moves the slb field, which represents the state of the emulated
SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the
hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s.
This is in accord with the principle that the kvm_vcpu_arch struct
represents the state of the emulated CPU, and the kvmppc_vcpu_book3s
struct holds the auxiliary data structures used in the emulation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:46 +03:00
Alexander Graf
a22a2daccf KVM: PPC: Resolve real-mode handlers through function exports
Up until now, Book3S KVM had variables stored in the kernel that a kernel module
or the kvm code in the kernel could read from to figure out where some real mode
helper functions are located.

This is all unnecessary. The high bits of the EA get ignore in real mode, so we
can just use the pointer as is. Also, it's a lot easier on relocations when we
use the normal way of resolving the address to a function, instead of jumping
through hoops.

This patch fixes compilation with CONFIG_RELOCATABLE=y.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:29 +03:00
Alexander Graf
8b6db3bc96 KVM: PPC: Implement correct SID mapping on Book3s_32
Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.

The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.

To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code be actually correct
and shouldn't hurt performance too much.

Signed-off-by: Alexander Graf <agraf@suse.de>
2010-10-24 10:52:15 +02:00
Alexander Graf
df1bfa25d8 KVM: PPC: Put segment registers in shared page
Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.

Signed-off-by: Alexander Graf <agraf@suse.de>
2010-10-24 10:52:11 +02:00
Alexander Graf
8e8651783f KVM: PPC: Interpret SR registers on demand
Right now we're examining the contents of Book3s_32's segment registers when
the register is written and put the interpreted contents into a struct.

There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more ram. But the more important part is that with
segment registers being interpreted from their raw values, we can put them in
the shared page, allowing guests to mess with them directly.

This patch makes the internal representation of SRs be u32s.

Signed-off-by: Alexander Graf <agraf@suse.de>
2010-10-24 10:52:11 +02:00
Alexander Graf
2b05d71fef KVM: PPC: Make long relocations be ulong
On Book3S KVM we directly expose some asm pointers to C code as
variables. These need to be relocated and thus break on relocatable
kernels.

To make sure we can at least build, let's mark them as long instead
of u32 where 64bit relocations don't work.

This fixes the following build error:

WARNING: 2 bad relocations^M
> c000000000008590 R_PPC64_ADDR32    .text+0x4000000000008460^M
> c000000000008594 R_PPC64_ADDR32    .text+0x4000000000008598^M

Please keep in mind that actually using KVM on a relocated kernel
might still break. This only fixes the compile problem.

Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:59 +02:00
Alexander Graf
e8508940a8 KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest
tells us to project it, the magic page overrides any guest mappings.

In order to reflect that, we need to hook into all the MMU layers of KVM to
force map the magic page if necessary.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:48 +02:00
Alexander Graf
d562de48de KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully
read/write from inside the guest context and we don't need to worry about
interacting based on writes of this register.

This patch converts all users of the current field to the shared page.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:44 +02:00
Alexander Graf
fef093bec0 KVM: PPC: Make use of hash based Shadow MMU
We just introduced generic functions to handle shadow pages on PPC.
This patch makes the respective backends make use of them, getting
rid of a lot of duplicate code along the way.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:47:28 +03:00
Alexander Graf
a576f7a294 KVM: PPC: Remove obsolete kvmppc_mmu_find_pte
Initially we had to search for pte entries to invalidate them. Since
the logic has improved since then, we can just get rid of the search
function.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:47:12 +03:00
Alexander Graf
f7bc74e1c3 KVM: PPC: Improve split mode
When in split mode, instruction relocation and data relocation are not equal.

So far we implemented this mode by reserving a special pseudo-VSID for the
two cases and flushing all PTEs when going into split mode, which is slow.

Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not
slow down things too much, I came up with a different idea: Mark the split
mode with a bit in the VSID and then treat it like any other segment.

This means we can just flush the shadow segment cache, but keep the PTEs
intact. I verified that this works with ppc32 Linux and Mac OS X 10.4
guests and does speed them up.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:58 +03:00
Alexander Graf
af7b4d104b KVM: PPC: Convert u64 -> ulong
There are some pieces in the code that I overlooked that still use
u64s instead of longs. This slows down 32 bit hosts unnecessarily, so
let's just move them to ulong.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:55 +03:00
Alexander Graf
c7f38f46f2 KVM: PPC: Improve indirect svcpu accessors
We already have some inline fuctions we use to access vcpu or svcpu structs,
depending on whether we're on booke or book3s. Since we just put a few more
registers into the svcpu, we also need to make sure the respective callbacks
are available and get used.

So this patch moves direct use of the now in the svcpu struct fields to
inline function calls. While at it, it also moves the definition of those
inline function calls to respective header files for booke and book3s,
greatly improving readability.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:26 +03:00
Alexander Graf
2191d657c9 KVM: PPC: Name generic 64-bit code generic
We have quite some code that can be used by Book3S_32 and Book3S_64 alike,
so let's call it "Book3S" instead of "Book3S_64", so we can later on
use it from the 32 bit port too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:14 +03:00
Alexander Graf
3ed9c6d2b5 KVM: PPC: Make bools bitfields
Bool defaults to at least byte width. We usually only want to waste a single
bit on this. So let's move all the bool values to bitfields, potentially
saving memory.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:20 +03:00
Alexander Graf
5a1b419fc9 KVM: PPC: Use ULL for big numbers
Some constants were bigger than ints. Let's mark them as such so we don't
accidently truncate them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:18 +03:00
Alexander Graf
ad0a048b09 KVM: PPC: Add OSI hypercall interface
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:10 +03:00
Alexander Graf
ca7f4203b9 KVM: PPC: Implement alignment interrupt
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

But the spec for 970 and 750 also looks different. While 750 requires the
DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this as an optional feature. So we need
to reconstruct DSISR and DAR manually.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:07 +03:00
Alexander Graf
4b389ca2e7 KVM: PPC: Book3S_32 guest MMU fixes
This patch makes the VSID of mapped pages always reflecting all special cases
we have, like split mode.

It also changes the tlbie mask to 0x0ffff000 according to the spec. The mask
we used before was incorrect.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:54 +03:00
Alexander Graf
c8027f1652 KVM: PPC: Make DSISR 32 bits wide
DSISR is only defined as 32 bits wide. So let's reflect that in the
structs too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:53 +03:00
Alexander Graf
3eeafd7da2 KVM: PPC: Ensure split mode works
On PowerPC we can go into MMU Split Mode. That means that either
data relocation is on but instruction relocation is off or vice
versa.

That mode didn't work properly, as we weren't always flushing
entries when going into a new split mode, potentially mapping
different code or data that we're supposed to.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:49 +03:00
Alexander Graf
831317b605 KVM: PPC: Implement Paired Single emulation
The one big thing about the Gekko is paired singles.

Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.

Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.

Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.

In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.

That way we can execute paired single FPU operations without implementing a
soft fpu.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:27 +03:00
Alexander Graf
aba3bd7ffe KVM: PPC: Make ext giveup non-static
We need to call the ext giveup handlers from code outside of book3s.c.
So let's make it non-static.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:12 +03:00
Alexander Graf
5467a97d0f KVM: PPC: Make software load/store return eaddr
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.

Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:09 +03:00
Alexander Graf
d6d549b207 KVM: PPC: Add Gekko SPRs
The Gekko has some SPR values that differ from other PPC core values and
also some additional ones.

Let's add support for them in our mfspr/mtspr emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:53 +03:00
Alexander Graf
1c0006d8d1 KVM: PPC: Fix initial GPR settings
Commit 7d01b4c3ed2bb33ceaf2d270cb4831a67a76b51b introduced PACA backed vcpu
values. With this patch, when a userspace app was setting GPRs before it was
actually first loaded, the set values get discarded.

This is because vcpu_load loads them from the vcpu backing store that we use
whenever we're not owning the PACA.

That behavior is not really a major problem, because we don't need it for
qemu. Other users (like kvmctl) do have problems with it though, so let's
better do it right.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:55 -03:00
Alexander Graf
d5e528136c KVM: PPC: Add helper functions to call real mode loaders
Linux contains quite some bits of code to load FPU, Altivec and VSX lazily for
a task. It calls those bits in real mode, coming from an interrupt handler.

For KVM we better reuse those, so let's wrap a bit of trampoline magic around
them and then we can call them from normal module code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:52 -03:00
Alexander Graf
4b5c9b7f9b KVM: PPC: Make large pages work
An SLB entry contains two pieces of information related to size:

  1) PTE size
  2) SLB size

The L bit defines the PTE be "large" (usually means 16MB),
SLB_VSID_B_1T defines that the SLB should span 1 GB instead of the
default 256MB.

Apparently I messed things up and just put those two in one box,
shaked it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.

This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:50 -03:00
Alexander Graf
25a8a02d26 KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements above behavior, making WARN_ON behave like WARN_ON.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf
021ec9c69f KVM: PPC: Call SLB patching code in interrupt safe manner
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs info the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf
7e57cba060 KVM: PPC: Use PACA backed shadow vcpu
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:48 -03:00
Alexander Graf
e15a113700 powerpc/kvm: Sync guest visible MMU state
Currently userspace has no chance to find out which virtual address space we're
in and resolve addresses. While that is a big problem for migration, it's also
unpleasent when debugging, as gdb and the monitor don't work on virtual
addresses.

This patch exports enough of the MMU segment state to userspace to make
debugging work and thus also includes the groundwork for migration.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08 16:02:50 +11:00
Alexander Graf
4e342025e6 Add asm/kvm_book3s.h
This adds the book3s specific header file that contains structs that
are only valid on book3s specific code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:52 +11:00