2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

236 Commits

Author SHA1 Message Date
Michal Hocko
940519f0c8 mm, memory_hotplug: provide a more generic restrictions for memory hotplug
arch_add_memory, __add_pages take a want_memblock which controls whether
the newly added memory should get the sysfs memblock user API (e.g.
ZONE_DEVICE users do not want/need this interface).  Some callers even
want to control where do we allocate the memmap from by configuring
altmap.

Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags which contains additional features
to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
altmap for alternative memmap allocator.

This patch shouldn't introduce any functional change.

[akpm@linux-foundation.org: build fix]
Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:49 -07:00
Christoph Hellwig
4afd58e14d initramfs: provide a generic free_initrd_mem implementation
For most architectures free_initrd_mem just expands to the same
free_reserved_area call.  Provide that as a generic implementation marked
__weak.

Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Steven Price <steven.price@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:47 -07:00
Christophe Leroy
b4abe38fd6 powerpc/32: prepare shadow area for KASAN
This patch prepares a shadow area for KASAN.

The shadow area will be at the top of the kernel virtual
memory space above the fixmap area and will occupy one
eighth of the total kernel virtual memory space.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03 01:20:26 +10:00
Christophe Leroy
9d9f2cccde powerpc/mm: change #include "mmu_decl.h" to <mm/mmu_decl.h>
This patch make inclusion of mmu_decl.h independant of the location
of the file including it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02 21:18:58 +10:00
Aneesh Kumar K.V
26ad26718d powerpc/mm: Fix section mismatch warning
This patch fix the below section mismatch warnings.

WARNING: vmlinux.o(.text+0x2d1f44): Section mismatch in reference from the function devm_memremap_pages_release() to the function .meminit.text:arch_remove_memory()
WARNING: vmlinux.o(.text+0x2d265c): Section mismatch in reference from the function devm_memremap_pages() to the function .meminit.text:arch_add_memory()

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-21 23:12:40 +10:00
Laurent Vivier
f172acbfae powerpc/mm: move warning from resize_hpt_for_hotplug()
resize_hpt_for_hotplug() reports a warning when it cannot
resize the hash page table ("Unable to resize hash page
table to target order") but in some cases it's not a problem
and can make user thinks something has not worked properly.

This patch moves the warning to arch_remove_memory() to
only report the problem when it is needed.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-20 22:02:26 +10:00
Michael Ellerman
d0055df0c9 Merge branch 'topic/dma' into next
Merge hch's big DMA rework series. This is in a topic branch in case he
wants to merge it to minimise conflicts.
2019-02-21 23:15:10 +11:00
Christoph Hellwig
31f940afda powerpc/dma: use the dma-direct allocator for coherent platforms
The generic code allows a few nice things such as node local allocations
and dipping into the CMA area.  The lookup of the right zone for a given
dma mask works a little different, but the results should be the same.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18 22:41:04 +11:00
Christophe Leroy
26b523356f powerpc: Drop page_is_ram() and walk_system_ram_range()
Since commit c40dd2f766 ("powerpc: Add System RAM to /proc/iomem")
it is possible to use the generic walk_system_ram_range() and
the generic page_is_ram().

To enable the use of walk_system_ram_range() by the IBM EHEA ethernet
driver, we still need an export of the generic function.

As powerpc was the only user of CONFIG_ARCH_HAS_WALK_MEMORY, the
ifdef around the generic walk_system_ram_range() has become useless
and can be dropped.

Fixes: c40dd2f766 ("powerpc: Add System RAM to /proc/iomem")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Keep the EXPORT_SYMBOL_GPL in powerpc code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-04 21:22:06 +11:00
Oscar Salvador
2c2a5af6fe mm, memory_hotplug: add nid parameter to arch_remove_memory
Patch series "Do not touch pages in hot-remove path", v2.

This patchset aims for two things:

 1) A better definition about offline and hot-remove stage
 2) Solving bugs where we can access non-initialized pages
    during hot-remove operations [2] [3].

This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.

[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html

This patch (of 5):

This is a preparation for the following-up patches.  The idea of passing
the nid is that it will allow us to get rid of the zone parameter
afterwards.

Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:49 -08:00
Christoph Hellwig
25078dc1f7 powerpc: use mm zones more sensibly
Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configfs, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):

 - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
 - ZONE_NORMAL: everything addressable by the kernel
 - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.

Contains various fixes from Benjamin Herrenschmidt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christophe Leroy
5b3e84fc10 powerpc: change CONFIG_PPC_STD_MMU to CONFIG_PPC_BOOK3S
Today we have:

config PPC_BOOK3S
        def_bool y
        depends on PPC_BOOK3S_32 || PPC_BOOK3S_64

config PPC_STD_MMU
        def_bool y
        depends on PPC_BOOK3S

PPC_STD_MMU is therefore redundant with PPC_BOOK3S. Lets remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-26 22:33:37 +11:00
Mike Rapoport
57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport
c6ffc5ca8f memblock: rename free_all_bootmem to memblock_free_all
The conversion is done using

sed -i 's@free_all_bootmem@memblock_free_all@' \
    $(git grep -l free_all_bootmem)

Link: http://lkml.kernel.org/r/1536927045-23536-26-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Christophe Leroy
34eb138ed7 powerpc/mm: don't use _PAGE_EXEC for calling hash_preload()
The 'access' parameter of hash_preload() is either 0 or _PAGE_EXEC.
Among the two versions of hash_preload(), only the PPC64 one is
doing something with this 'access' parameter.

In order to remove the use of _PAGE_EXEC outside platform code,
'access' parameter is replaced by 'is_exec' which will be either
true of false, and the PPC64 version of hash_preload() creates
the access flag based on 'is_exec'.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
c766ee7223 powerpc: handover page flags with a pgprot_t parameter
In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.

Do the same for __ioremap_caller() and __ioremap_at().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Neuling
51c3c62b58 powerpc: Avoid code patching freed init sections
This stops us from doing code patching in init sections after they've
been freed.

In this chain:
  kvm_guest_init() ->
    kvm_use_magic_page() ->
      fault_in_pages_readable() ->
	 __get_user() ->
	   __get_user_nocheck() ->
	     barrier_nospec();

We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.

We seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on powervm (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.

Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-18 22:42:54 +10:00
Stefan Agner
d7dc899abe treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
With PHYS_ADDR_MAX there is now a type safe variant for all bits set.
Make use of it.

Patch created using a semantic patch as follows:

// <smpl>
@@
typedef phys_addr_t;
@@
-(phys_addr_t)ULLONG_MAX
+PHYS_ADDR_MAX
// </smpl>

Link: http://lkml.kernel.org/r/20180419214204.19322-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:25 +09:00
Nicholas Piggin
68662f85f3 powerpc/64s/radix: prefetch user address in update_mmu_cache
Prefetch the faulting address in update_mmu_cache to give the page
table walker perhaps 100 cycles head start as locks are dropped and
the interrupt completed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Balbir Singh
fb5924fddf powerpc/mm: Flush cache on memory hot(un)plug
This patch adds support for flushing potentially dirty cache lines
when memory is hot-plugged/hot-un-plugged. The support is currently
limited to 64 bit systems.

The bug was exposed when mappings for a device were actually
hot-unplugged and plugged in back later. A similar issue was observed
during the development of memtrace, but memtrace does it's own
flushing of region via a custom routine.

These patches do a flush both on hotplug/unplug to clear any stale
data in the cache w.r.t mappings, there is a small race window where a
clean cache line may be created again just prior to tearing down the
mapping.

The patches were tested by disabling the flush routines in memtrace
and doing I/O on the trace file. The system immediately
checkstops (quite reliablly if prior to the hot-unplug of the memtrace
region, we memset the regions we are about to hot unplug). After these
patches no custom flushing is needed in the memtrace code.

Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24 09:46:56 +10:00
Jonathan Neuschäfer
f65e67c7e3 powerpc/mm: Use memblock API for PPC32 page_is_ram
To support accurate checking for different blocks of memory on PPC32,
use the same memblock-based approach that's already used on PPC64 also
on PPC32.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:42 +11:00
Jonathan Neuschäfer
2615c93e5f powerpc/mm: Simplify page_is_ram by using memblock_is_memory
Instead of open-coding the search in page_is_ram, call memblock_is_memory.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:42 +11:00
Michael Ellerman
f437c51748 Merge branch 'topic/paca' into next
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.

This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
2018-03-31 09:09:36 +11:00
Nicholas Piggin
29ab6c4708 powerpc/mm: Pass node id into create_section_mapping
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:10 +11:00
Nicholas Piggin
9bd9be006c powerpc/mm/numa: move numa topology discovery earlier
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:26 +11:00
Mauricio Faria de Oliveira
bde709a708 powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping()
Fix the warning messages for stop_machine_change_mapping(), and a number
of other affected functions in its call chain.

All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
is okay (keeps them / does not discard them).

Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.

    $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
    ...
      MODPOST vmlinux.o
    WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
    The function stop_machine_change_mapping() references
    the function __meminit create_physical_mapping().
    This is often because stop_machine_change_mapping lacks a __meminit
    annotation or the annotation of create_physical_mapping is wrong.

    WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
    The function stop_machine_change_mapping() references
    the function __meminit create_physical_mapping().
    This is often because stop_machine_change_mapping lacks a __meminit
    annotation or the annotation of create_physical_mapping is wrong.
    ...

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:55 +11:00
Linus Torvalds
3ff1b28caa libnvdimm for 4.16
* Require struct page by default for filesystem DAX to remove a number of
   surprising failure cases.  This includes failures with direct I/O, gdb and
   fork(2).
 
 * Add support for the new Platform Capabilities Structure added to the NFIT in
   ACPI 6.2a.  This new table tells us whether the platform supports flushing
   of CPU and memory controller caches on unexpected power loss events.
 
 * Revamp vmem_altmap and dev_pagemap handling to clean up code and better
   support future future PCI P2P uses.
 
 * Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become
   out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and
   instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL
   families, NVDIMM_FAMILY_{HPE,MSFT}.
 
 * Enhance nfit_test so we can test some of the new things added in version 1.6
   of the DSM specification.  This includes testing firmware download and
   simulating the Last Shutdown State (LSS) status.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t
 2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY
 c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7
 aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r
 7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub
 qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz
 EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC
 gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG
 rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o
 DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od
 ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+
 sie+u2rTod8/gQWSfHpJ
 =MIMX
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Ross Zwisler:

 - Require struct page by default for filesystem DAX to remove a number
   of surprising failure cases. This includes failures with direct I/O,
   gdb and fork(2).

 - Add support for the new Platform Capabilities Structure added to the
   NFIT in ACPI 6.2a. This new table tells us whether the platform
   supports flushing of CPU and memory controller caches on unexpected
   power loss events.

 - Revamp vmem_altmap and dev_pagemap handling to clean up code and
   better support future future PCI P2P uses.

 - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
   become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
   spec, and instead rely on the generic ND_CMD_CALL approach used by
   the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.

 - Enhance nfit_test so we can test some of the new things added in
   version 1.6 of the DSM specification. This includes testing firmware
   download and simulating the Last Shutdown State (LSS) status.

* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
  libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
  acpi, nfit: fix register dimm error handling
  libnvdimm, namespace: make min namespace size 4K
  tools/testing/nvdimm: force nfit_test to depend on instrumented modules
  libnvdimm/nfit_test: adding support for unit testing enable LSS status
  libnvdimm/nfit_test: add firmware download emulation
  nfit-test: Add platform cap support from ACPI 6.2a to test
  libnvdimm: expose platform persistence attribute for nd_region
  acpi: nfit: add persistent memory control flag for nd_region
  acpi: nfit: Add support for detect platform CPU cache flush on power loss
  device-dax: Fix trailing semicolon
  libnvdimm, btt: fix uninitialized err_lock
  dax: require 'struct page' by default for filesystem dax
  ext2: auto disable dax instead of failing mount
  ext4: auto disable dax instead of failing mount
  mm, dax: introduce pfn_t_special()
  mm: Fix devm_memremap_pages() collision handling
  mm: Fix memory size alignment in devm_memremap_pages_release()
  memremap: merge find_dev_pagemap into get_dev_pagemap
  memremap: change devm_memremap_pages interface to use struct dev_pagemap
  ...
2018-02-06 10:41:33 -08:00
Christoph Hellwig
da024512a1 mm: pass the vmem_altmap to arch_remove_memory and __remove_pages
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Christoph Hellwig
24e6d5a59a mm: pass the vmem_altmap to arch_add_memory and __add_pages
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Joe Perches
f2c2cbcc35 powerpc: Use pr_warn instead of pr_warning
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.

Update arch/powerpc/

Miscellanea:

o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Christophe Leroy
968159c003 powerpc/8xx: Getting rid of remaining use of CONFIG_8xx
Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx

arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"

arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:12 +10:00
Michael Ellerman
029d9252b1 powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
Currently even with STRICT_KERNEL_RWX we leave the __init text marked
executable after init, which is bad.

Add a hook to mark it NX (no-execute) before we free it, and implement
it for radix and hash.

Note that we use __init_end as the end address, not _einittext,
because overlaps_kernel_text() uses __init_end, because there are
additional executable sections other than .init.text between
__init_begin and __init_end.

Tested on radix and hash with:

  0:mon> p $__init_begin
  *** 400 exception occurred

Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18 19:54:24 +10:00
Linus Torvalds
d691b7e7d1 powerpc updates for 4.13
Highlights include:
 
  - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
 
  - Platform support for FSP2 (476fpe) board
 
  - Enable ZONE_DEVICE on 64-bit server CPUs.
 
  - Generic & powerpc spin loop primitives to optimise busy waiting
 
  - Convert VDSO update function to use new update_vsyscall() interface
 
  - Optimisations to hypercall/syscall/context-switch paths
 
  - Improvements to the CPU idle code on Power8 and Power9.
 
 As well as many other fixes and improvements.
 
 Thanks to:
   Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
   Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
   Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
   Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
   Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
   Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
   Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
   Thiago Jung Bauermann, Yang Li.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
 KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
 DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
 MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
 BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
 ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
 eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
 Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
 KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
 B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
 Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
 bBDvrRnvupusz8qCWrxe
 =w8rj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.

   - Platform support for FSP2 (476fpe) board

   - Enable ZONE_DEVICE on 64-bit server CPUs.

   - Generic & powerpc spin loop primitives to optimise busy waiting

   - Convert VDSO update function to use new update_vsyscall() interface

   - Optimisations to hypercall/syscall/context-switch paths

   - Improvements to the CPU idle code on Power8 and Power9.

  As well as many other fixes and improvements.

  Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
  Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
  Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
  Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
  Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
  Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
  Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
  Bauermann, Yang Li"

* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
  powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/lib/code-patching: Use alternate map for patch_instruction()
  powerpc/xmon: Add patch_instruction() support for xmon
  powerpc/kprobes/optprobes: Use patch_instruction()
  powerpc/kprobes: Move kprobes over to patch_instruction()
  powerpc/mm/radix: Fix execute permissions for interrupt_vectors
  powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
  powerpc/64s: Blacklist rtas entry/exit from kprobes
  powerpc/64s: Blacklist functions invoked on a trap
  powerpc/64s: Un-blacklist system_call() from kprobes
  powerpc/64s: Move system_call() symbol to just after setting MSR_EE
  powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
  powerpc/64s: Convert .L__replay_interrupt_return to a local label
  powerpc64/elfv1: Only dereference function descriptor for non-text symbols
  cxl: Export library to support IBM XSL
  powerpc/dts: Use #include "..." to include local DT
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
  ...
2017-07-07 13:55:45 -07:00
Michal Hocko
3d79a728f9 mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory
arch_add_memory gets for_device argument which then controls whether we
want to create memblocks for created memory sections.  Simplify the
logic by telling whether we want memblocks directly rather than going
through pointless negation.  This also makes the api easier to
understand because it is clear what we want rather than nothing telling
for_device which can mean anything.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
f1dd2cd13c mm, memory_hotplug: do not associate hotadded memory to zones until online
The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa31f ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba8f ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed.  Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable.  This essentially means that the online type changes as
the new memblocks are added.

Let's simulate memory hot online manually
  $ echo 0x100000000 > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory32/valid_zones
  Normal Movable

  $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  $ echo online_movable > /sys/devices/system/memory/memory34/state
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g.  association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages with
any zone at all.  All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request.  There are only two requirements

	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

the latter one is not an inherent requirement and can be changed in the
future.  It preserves the current behavior and made the code slightly
simpler.  This is subject to change in future.

This means that the same physical online steps as above will lead to the
following state: Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node.  __add_pages is updated to not require the
zone and only initializes sections in the range.  This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).

devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way.  It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs.  This means that this
particular code path has to call move_pfn_range_to_zone explicitly.

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs.  Movable)
used to allow to change its movable type.  This will be handled later.

[richard.weiyang@gmail.com: simplify zone_intersects()]
  Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
  Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
1b862aecfb mm, memory_hotplug: get rid of is_zone_device_section
Device memory hotplug hooks into regular memory hotplug only half way.
It needs memory sections to track struct pages but there is no
need/desire to associate those sections with memory blocks and export
them to the userspace via sysfs because they cannot be onlined anyway.

This is currently expressed by for_device argument to arch_add_memory
which then makes sure to associate the given memory range with
ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
to distinguish special memory hotplug from the regular one.  While this
works now, later patches in this series want to move __add_zone outside
of arch_add_memory path so we have to come up with something else.

Add want_memblock down the __add_pages path and use it to control
whether the section->memblock association should be done.
arch_add_memory then just trivially want memblock for everything but
for_device hotplug.

remove_memory_section doesn't need is_zone_device_section either.  We
can simply skip all the memblock specific cleanup if there is no
memblock for the given section.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Oliver O'Halloran
b584c25440 powerpc/vmemmap: Add altmap support
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where
large amount of ZONE_DEVICE memory is being added to the system the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather in main memory.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:27 +10:00
Christophe Leroy
4386c096c2 powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:59:03 +10:00
David Gibson
438cc81a41 powerpc/pseries: Automatically resize HPT for memory hot add/remove
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:02 +11:00
Benjamin Herrenschmidt
fecbfabe1d powerpc: Fix build with CONFIG_MEMORY_HOTPLUG on some configs
For memory hotplug to work, the MMU code needs to provide the functions
create_section_mapping() and remove_section_mapping() to respectively
map and unmap portions of the linear mapping.

At the moment only hash64 provides these, so we provide weak stubs that
just error out. This fixes the build with configurations such as 64-bit
BookE with CONFIG_MEMORY_HOTPLUG enabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 16:33:27 +10:00
Oliver O'Halloran
3079abe555 powerpc/mm: Ensure "special" zones are empty
The mm zone mechanism was traditionally used by arch specific code to
partition memory into allocation zones. However there are several zones
that are managed by the mm subsystem rather than the architecture. Most
architectures set the max PFN of these special zones to zero, however on
powerpc we set them to ~0ul. This, in conjunction with a bug in
free_area_init_nodes() results in all of system memory being placed in
ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
memory allocations so this will cause a kernel panic at boot. Given the
planned addition of more mm managed zones (ZONE_CMA) we should aim to be
consistent with every other architecture and set the max PFN for these
zones to zero.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-16 16:03:21 +10:00
Gavin Shan
171cb719da powerpc/mm: Improve readability of update_mmu_cache()
The function is used to update the MMU with software PTE. It can
be called by data access exception handler (0x300) or instruction
access exception handler (0x400). If the function is called by
0x400 handler, the local variable @access is set to _PAGE_EXEC
to indicate the software PTE should have that flag set. When the
function is called by 0x300 handler, @access is set to zero.

This improves the readability of the function by replacing if
statements with switch. No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:09 +10:00
Oliver O'Halloran
dd0b52c47a powerpc/mm: define TOP_ZONE as a constant
The zone that contains the top of memory will be either ZONE_NORMAL
or ZONE_HIGHMEM depending on the kernel config. There are two functions
that require this information and both of them use an #ifdef to set
a local variable (top_zone). This is a little silly so lets just make it
a constant.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-mm@kvack.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:08 +10:00
Aneesh Kumar K.V
a3dece6d69 powerpc/radix: Update MMU cache
With radix there is no MMU cache. Hence we don't need to do anything in
update_mmu_cache().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:53 +10:00
Linus Torvalds
d5e2d00898 powerpc updates for 4.6
Highlights:
  - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras
  - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V
  - Add POWER9 cputable entry from Michael Neuling
  - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
  - Add support for new ftrace ABI on ppc64le from Torsten Duwe
 
 Various cleanups & minor fixes from:
  - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril
    Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey,
    Sukadev Bhattiprolu, Suraj Jitindar Singh.
 
 General:
  - atomics: Allow architectures to define their own __atomic_op_* helpers from
    Boqun Feng
  - Implement atomic{, 64}_*_return_* variants and acquire/release/relaxed
    variants for (cmp)xchg from Boqun Feng
  - Add powernv_defconfig from Jeremy Kerr
  - Fix BUG_ON() reporting in real mode from Balbir Singh
  - Add xmon command to dump OPAL msglog from Andrew Donnellan
  - Add xmon command to dump process/task similar to ps(1) from Douglas Miller
  - Clean up memory hotplug failure paths from David Gibson
 
 pci/eeh:
  - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei
    Yang.
  - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
  - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
  - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
  - MAINTAINERS: Update EEH details and maintainership from Russell Currey
 
 cxl:
  - Support added to the CXL driver for running on both bare-metal and
    hypervisor systems, from Christophe Lombard and Frederic Barrat.
  - Ignore probes for virtual afu pci devices from Vaibhav Jain
 
 perf:
  - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu
  - hv-24x7: Fix usage with chip events, display change in counter values,
    display domain indices in sysfs, eliminate domain suffix in event names,
    from Sukadev Bhattiprolu
 
 Freescale:
  - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum
    optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and
    other dt bits, and minor fixes/cleanup."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW69OrAAoJEFHr6jzI4aWAe5EQAJw/hE6WBQc6a7Tj70AnXOqR
 qk/m5pZjuTwQxfBteIvHR1pE5eXdlvtAjcD254LVkFkAbIn19W/h2k0VX/nlee7P
 n/VRHRifjtGmukqHrPYJJ7ua9mNlY7pxh3leGSixBFASnSWqMxNNNziNQtSTcuCs
 TjHiw6NkZ/kzeunA4bAfE4yHVUZjmL74oiS9JbLyaVHqoW4fqWLlh26AKo2yYMZI
 qPicBBG4HBi3FGvoexnKxlJNdcV4HO7LzDjJmCSfUKYCJi+Pw19T5qmhso0q0qVz
 vHg/A8HNeG4Hn83pNVmLeQSAIQRZ3DvTtcLgbjPo+TVwm/hzrRRBWipTeOVbkLW8
 2bcOXT4t7LWUq15EAJ1LYgYZGzcLrfRfUeOcuQ1TWd3+PcfY9pE7FmizsxAAfaVe
 E9j9mpz4XnIqBtWkFHneTIHkQ5OWptyKuZJEaYH0nut4VsP0k8NarkseafGqBPu7
 5eG83gbiQbCVixfOgblV9eocJ29JcwpjPAY4CZSGJimShg909FV7WRgZgJkKWrbK
 dBRco8Jcp4VglGfo2qymv7Uj4KwQoypBREOhiKUvrAsVlDxPfx+bcskhjGu9xGDC
 xs/+nme0/lKa/wg5K4C3mQ1GAlkMWHI0ojhJjsyODbetup5UbkEu03wjAaTdO9dT
 Y6ptGm0rYAJluPNlziFj
 =qkAt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "This was delayed a day or two by some build-breakage on old toolchains
  which we've now fixed.

  There's two PCI commits both acked by Bjorn.

  There's one commit to mm/hugepage.c which is (co)authored by Kirill.

  Highlights:
   - Restructure Linux PTE on Book3S/64 to Radix format from Paul
     Mackerras
   - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
     Kumar K.V
   - Add POWER9 cputable entry from Michael Neuling
   - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
   - Add support for new ftrace ABI on ppc64le from Torsten Duwe

  Various cleanups & minor fixes from:
   - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
     Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
     Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.

  General:
   - atomics: Allow architectures to define their own __atomic_op_*
     helpers from Boqun Feng
   - Implement atomic{, 64}_*_return_* variants and acquire/release/
     relaxed variants for (cmp)xchg from Boqun Feng
   - Add powernv_defconfig from Jeremy Kerr
   - Fix BUG_ON() reporting in real mode from Balbir Singh
   - Add xmon command to dump OPAL msglog from Andrew Donnellan
   - Add xmon command to dump process/task similar to ps(1) from Douglas
     Miller
   - Clean up memory hotplug failure paths from David Gibson

  pci/eeh:
   - Redesign SR-IOV on PowerNV to give absolute isolation between VFs
     from Wei Yang.
   - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
   - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
   - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
   - MAINTAINERS: Update EEH details and maintainership from Russell
     Currey

  cxl:
   - Support added to the CXL driver for running on both bare-metal and
     hypervisor systems, from Christophe Lombard and Frederic Barrat.
   - Ignore probes for virtual afu pci devices from Vaibhav Jain

  perf:
   - Export Power8 generic and cache events to sysfs from Sukadev
     Bhattiprolu
   - hv-24x7: Fix usage with chip events, display change in counter
     values, display domain indices in sysfs, eliminate domain suffix in
     event names, from Sukadev Bhattiprolu

  Freescale:
   - Updates from Scott: "Highlights include 8xx optimizations, 32-bit
     checksum optimizations, 86xx consolidation, e5500/e6500 cpu
     hotplug, more fman and other dt bits, and minor fixes/cleanup"

* tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
  powerpc: Fix unrecoverable SLB miss during restore_math()
  powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
  powerpc/rcpm: Fix build break when SMP=n
  powerpc/book3e-64: Use hardcoded mttmr opcode
  powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
  powerpc/T104xRDB: add tdm riser card node to device tree
  powerpc32: PAGE_EXEC required for inittext
  powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
  powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
  powerpc/86xx: Introduce and use common dtsi
  powerpc/86xx: Update device tree
  powerpc/86xx: Move dts files to fsl directory
  powerpc/86xx: Switch to kconfig fragments approach
  powerpc/86xx: Update defconfigs
  powerpc/86xx: Consolidate common platform code
  powerpc32: Remove one insn in mulhdu
  powerpc32: small optimisation in flush_icache_range()
  powerpc: Simplify test in __dma_sync()
  powerpc32: move xxxxx_dcache_range() functions inline
  powerpc32: Remove clear_pages() and define clear_page() inline
  ...
2016-03-19 15:38:41 -07:00
David Gibson
1dace6c665 powerpc/mm: Clean up memory hotplug failure paths
This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:

For errors creating the linear mapping for the hot-added region:
  * This is now reported with EFAULT which is more appropriate than the
    previous EINVAL (the failure is unlikely to be related to the
    function's parameters)
  * An error in this path now prints a warning message, rather than just
    silently failing to add the extra memory.
  * Previously a failure here could result in the region being partially
    mapped.  We now clean up any partial mapping before failing.

For errors creating the vmemmap for the hot-added region:
   * This is now reported with EFAULT instead of causing a BUG() - this
     could happen for external reason (e.g. full hash table) so it's better
     to handle this non-fatally
   * An error message is also printed, so the failure won't be silent
   * As above a failure could cause a partially mapped region, we now
     clean this up. [mpe: move htab_remove_mapping() out of #ifdef
     CONFIG_MEMORY_HOTPLUG to enable this]

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 22:04:18 +11:00
Toshi Kani
35d98e93fe arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
"System RAM", "Kernel code", "Kernel data", and "Kernel bss".

Note that:

 - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
   IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
   as (IORESOURCE_MEM|IORESOURCE_SYSRAM).

 - Some archs do not set 'flags' for children nodes, such as
   "Kernel code".  This patch does not change 'flags' in this
   case.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 09:49:57 +01:00
Vasant Hegde
e256caa7d0 powerpc/mm: Allow user space to map rtas_rmo_buf
With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping
rtas_rmo_buf from user space is failing. Hence we are not able to make
RTAS syscall.

This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive
in  devmem_is_allowed(). This will allow user space to map rtas_rmo_buf
and we are able to make RTAS syscall.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-25 16:31:13 +11:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Dan Williams
033fbae988 mm: ZONE_DEVICE for "device memory"
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.

Having a separate zone prevents allocation and otherwise marks these
pages that are distinct from typical uniform memory.  Device memory has
different lifetime and performance characteristics than RAM.  However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jerome Glisse <j.glisse@gmail.com>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:40:58 -04:00
Scott Wood
2f7d2b74a9 powerpc/mm: Don't call __flush_dcache_icache_phys() with PA>VA
__flush_dcache_icache_phys() requires the ability to access the
memory with the MMU disabled, which means that on a 32-bit system
any memory above 4 GiB is inaccessible.  In particular, mpc86xx is
32-bit and can have more than 4 GiB of RAM.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:20 -05:00
Scott Wood
6c0cc62715 powerpc/mm: Use PFN_PHYS() in devmem_is_allowed()
This function can run on systems where physical addresses don't
fit in unsigned long, so make sure to use the macro that contains the
proper cast.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:23 -05:00
Michael Ellerman
f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Anton Blanchard
21098b9e07 powerpc: Move sparse_init() into initmem_init
We did part of sparse initialisation in setup_arch and part in
initmem_init. Put them together.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-10 09:59:26 +11:00
Anton Blanchard
10239733ee powerpc: Remove bootmem allocator
At the moment we transition from the memblock alloctor to the bootmem
allocator. Gitting rid of the bootmem allocator removes a bunch of
complicated code (most of which I owe the dubious honour of being
responsible for writing).

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-10 09:59:25 +11:00
Anton Blanchard
16d0f5c4af powerpc: Remove ppc_md.remove_memory
We have an extra level of indirection on memory hot remove which is not
matched on memory hot add. Memory hotplug is book3s only, so there is
no need for it.

This also enables means remove_memory() (ie memory hot unplug) works
on powernv.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-05 21:00:46 +11:00
Michael Ellerman
75d43b2d0a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
Freescale updates from Scott (27 commits):

  "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
   FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
   support, and PrPMC PCI enumeration."
2014-10-04 08:59:06 +10:00
Scott Wood
6db35ad237 powerpc/mm: Use common paging_init() for NUMA
Commit 1c98025c6c "powerpc: Dynamic DMA
zone limits" updated how zones are created in paging_init(), but missed
the NUMA version of paging_init().  This was noticed via a linker
error, since dma_pfn_limit_to_zone() was, like the non-NUMA
paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.

It turns out that the NUMA paging_init() was not actually doing
anything different from the standard paging_init(), other than a couple
debug prints, a couple 32-bit-only ifdef sections, and a call to
mark_nonram_nosave().  It's not clear whether mark_nonram_nosave() is
inherently wrong to do for NUMA, or just not useful on targets that
have NUMA, but for now I'm preserving the existing behavior.

Fixes: 1c98025c6c "powerpc: Dynamic DMA zone limits"
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-09-19 15:01:05 -05:00
Scott Wood
1c98025c6c powerpc: Dynamic DMA zone limits
Platform code can call limit_zone_pfn() to set appropriate limits
for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will
select a suitable zone based on a device's mask and the pfn limits that
platform code has configured.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
2014-09-03 17:58:21 -05:00
Wang Nan
f51202de8f memory-hotplug: ppc: suitable memory should go to ZONE_MOVABLE
This patch introduces zone_for_memory() to arch_add_memory() on powerpc
to ensure new, higher memory added into ZONE_MOVABLE if movable zone has
already setup.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Mel Gorman" <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:21 -07:00
Nathan Fontenot
9ac8cde938 powerpc/pseries: Use remove_memory() to remove memory
The memory remove code for powerpc/pseries should call remove_memory()
so that we are holding the hotplug_memory lock during memory remove
operations.

This patch updates the memory node remove handler to call remove_memory()
and adds a ppc_md.remove_memory() entry to handle pseries specific work
that is called from arch_remove_memory().

During memory remove in pseries_remove_memblock() we have to stay with
removing memory one section at a time. This is needed because of how memory
resources are handled. During memory add for pseries (via the probe file in
sysfs) we add memory one section at a time which gives us a memory resource
for each section. Future patches will aim to address this so will not have
to remove memory one section at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:13 +11:00
Linus Torvalds
1b17366d69 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "So here's my next branch for powerpc.  A bit late as I was on vacation
  last week.  It's mostly the same stuff that was in next already, I
  just added two patches today which are the wiring up of lockref for
  powerpc, which for some reason fell through the cracks last time and
  is trivial.

  The highlights are, in addition to a bunch of bug fixes:

   - Reworked Machine Check handling on kernels running without a
     hypervisor (or acting as a hypervisor).  Provides hooks to handle
     some errors in real mode such as TLB errors, handle SLB errors,
     etc...

   - Support for retrieving memory error information from the service
     processor on IBM servers running without a hypervisor and routing
     them to the memory poison infrastructure.

   - _PAGE_NUMA support on server processors

   - 32-bit BookE relocatable kernel support

   - FSL e6500 hardware tablewalk support

   - A bunch of new/revived board support

   - FSL e6500 deeper idle states and altivec powerdown support

  You'll notice a generic mm change here, it has been acked by the
  relevant authorities and is a pre-req for our _PAGE_NUMA support"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits)
  powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
  powerpc: Add support for the optimised lockref implementation
  powerpc/powernv: Call OPAL sync before kexec'ing
  powerpc/eeh: Escalate error on non-existing PE
  powerpc/eeh: Handle multiple EEH errors
  powerpc: Fix transactional FP/VMX/VSX unavailable handlers
  powerpc: Don't corrupt transactional state when using FP/VMX in kernel
  powerpc: Reclaim two unused thread_info flag bits
  powerpc: Fix races with irq_work
  Move precessing of MCE queued event out from syscall exit path.
  pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
  powerpc: Make add_system_ram_resources() __init
  powerpc: add SATA_MV to ppc64_defconfig
  powerpc/powernv: Increase candidate fw image size
  powerpc: Add debug checks to catch invalid cpu-to-node mappings
  powerpc: Fix the setup of CPU-to-Node mappings during CPU online
  powerpc/iommu: Don't detach device without IOMMU group
  powerpc/eeh: Hotplug improvement
  powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
  powerpc/eeh: Add restore_config operation
  ...
2014-01-27 21:11:26 -08:00
Tang Chen
e7e8de5918 memblock: make memblock_set_node() support different memblock_type
[sfr@canb.auug.org.au: fix powerpc build]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liu Jiang <jiang.liu@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:44 -08:00
Benjamin Herrenschmidt
fac515db45 Merge remote-tracking branch 'scott/next' into next
Freescale updates from Scott:

<<
Highlights include 32-bit booke relocatable support, e6500 hardware
tablewalk support, various e500 SPE fixes, some new/revived boards, and
e6500 deeper idle and altivec powerdown modes.
>>
2014-01-15 14:22:35 +11:00
Geert Uytterhoeven
4f7709248d powerpc: Make add_system_ram_resources() __init
add_system_ram_resources() is a subsys_initcall.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:58:52 +11:00
Scott Wood
28efc35fe6 powerpc/e6500: TLB miss handler with hardware tablewalk support
There are a few things that make the existing hw tablewalk handlers
unsuitable for e6500:

 - Indirect entries go in TLB1 (though the resulting direct entries go in
   TLB0).

 - It has threads, but no "tlbsrx." -- so we need a spinlock and
   a normal "tlbsx".  Because we need this lock, hardware tablewalk
   is mandatory on e6500 unless we want to add spinlock+tlbsx to
   the normal bolted TLB miss handler.

 - TLB1 has no HES (nor next-victim hint) so we need software round robin
   (TODO: integrate this round robin data with hugetlb/KVM)

 - The existing tablewalk handlers map half of a page table at a time,
   because IBM hardware has a fixed 1MiB indirect page size.  e6500
   has variable size indirect entries, with a minimum of 2MiB.
   So we can't do the half-page indirect mapping, and even if we
   could it would be less efficient than mapping the full page.

 - Like on e5500, the linear mapping is bolted, so we don't need the
   overhead of supporting nested tlb misses.

Note that hardware tablewalk does not work in rev1 of e6500.
We do not expect to support e6500 rev1 in mainline Linux.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
2014-01-09 17:52:19 -06:00
Nathan Fontenot
f7e3334a6b powerpc: Fix memory hotplug with sparse vmemmap
Previous commit 46723bfa540... introduced a new config option
HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
when sparse vmemmap is not defined.

This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
register_page_bootmem_info_node. Without this we get a BUG_ON for memory
hot remove in put_page_bootmem().

This also adds a stub for register_page_bootmem_memmap to allow ppc to build
with sparse vmemmap defined. Leaving this as a stub is fine since the same
vmemmap addresses are also handled in vmemmap_populate and as such are
properly mapped.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.9+]
2013-10-03 17:21:38 +10:00
Nathan Fontenot
f748edafac powerpc/mm: Mark Memory Resources as busy
Memory I/O resources need to be marked as busy or else we cannot remove
them when doing memory hot remove.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27 14:35:11 +10:00
Linus Torvalds
65b97fb730 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc changes for the 3.11 merge window.  In addition to
  the usual bug fixes and small updates, the main highlights are:

   - Support for transparent huge pages by Aneesh Kumar for 64-bit
     server processors.  This allows the use of 16M pages as transparent
     huge pages on kernels compiled with a 64K base page size.

   - Base VFIO support for KVM on power by Alexey Kardashevskiy

   - Wiring up of our nvram to the pstore infrastructure, including
     putting compressed oopses in there by Aruna Balakrishnaiah

   - Move, rework and improve our "EEH" (basically PCI error handling
     and recovery) infrastructure.  It is no longer specific to pseries
     but is now usable by the new "powernv" platform as well (no
     hypervisor) by Gavin Shan.

   - I fixed some bugs in our math-emu instruction decoding and made it
     usable to emulate some optional FP instructions on processors with
     hard FP that lack them (such as fsqrt on Freescale embedded
     processors).

   - Support for Power8 "Event Based Branch" facility by Michael
     Ellerman.  This facility allows what is basically "userspace
     interrupts" for performance monitor events.

   - A bunch of Transactional Memory vs.  Signals bug fixes and HW
     breakpoint/watchpoint fixes by Michael Neuling.

  And more ...  I appologize in advance if I've failed to highlight
  something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  pstore: Add hsize argument in write_buf call of pstore_ftrace_call
  powerpc/fsl: add MPIC timer wakeup support
  powerpc/mpic: create mpic subsystem object
  powerpc/mpic: add global timer support
  powerpc/mpic: add irq_set_wake support
  powerpc/85xx: enable coreint for all the 64bit boards
  powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
  powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
  powerpc/mpic: Add get_version API both for internal and external use
  powerpc: Handle both new style and old style reserve maps
  powerpc/hw_brk: Fix off by one error when validating DAWR region end
  powerpc/pseries: Support compression of oops text via pstore
  powerpc/pseries: Re-organise the oops compression code
  pstore: Pass header size in the pstore write callback
  powerpc/powernv: Fix iommu initialization again
  powerpc/pseries: Inform the hypervisor we are using EBB regs
  powerpc/perf: Add power8 EBB support
  powerpc/perf: Core EBB support for 64-bit book3s
  powerpc/perf: Drop MMCRA from thread_struct
  powerpc/perf: Don't enable if we have zero events
  ...
2013-07-04 10:29:23 -07:00
Jiang Liu
602ddc70ec mm/PPC: prepare for killing free_all_bootmem_node()
Prepare for killing free_all_bootmem_node() by using free_all_bootmem().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:39 -07:00
Jiang Liu
369a9d8523 mm/ppc: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:37 -07:00
Jiang Liu
0c98853473 mm: concentrate modification of totalram_pages into the mm core
Concentrate code to modify totalram_pages into the mm core, so the arch
memory initialized code doesn't need to take care of it.  With these
changes applied, only following functions from mm core modify global
variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
free_all_bootmem_node(), adjust_managed_page_count().

With this patch applied, it will be much more easier for us to keep
totalram_pages and zone->managed_pages in consistence.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:33 -07:00
Jiang Liu
dbe67df4ba mm: enhance free_reserved_area() to support poisoning memory with zero
Address more review comments from last round of code review.
1) Enhance free_reserved_area() to support poisoning freed memory with
   pattern '0'. This could be used to get rid of poison_init_mem()
   on ARM64.
2) A previous patch has disabled memory poison for initmem on s390
   by mistake, so restore to the original behavior.
3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:32 -07:00
Jiang Liu
11199692d8 mm: change signature of free_reserved_area() to fix building warnings
Change signature of free_reserved_area() according to Russell King's
suggestion to fix following build warnings:

  arch/arm/mm/init.c: In function 'mem_init':
  arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
  In file included from include/linux/mman.h:4:0,
                   from arch/arm/mm/init.c:15:
  include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
   extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

   mm/page_alloc.c: In function 'free_reserved_area':
>> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
   In file included from arch/mips/include/asm/page.h:49:0,
                    from include/linux/mmzone.h:20,
                    from include/linux/gfp.h:4,
                    from include/linux/mm.h:8,
                    from mm/page_alloc.c:18:
   arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
   mm/page_alloc.c: In function 'free_area_init_nodes':
   mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

Also address some minor code review comments.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:32 -07:00
Aneesh Kumar K.V
0ac52dd766 powerpc: Make linux pagetable walk safe with THP enabled
We need to have irqs disabled to handle all the possible parallel update for
linux page table without holding locks.

Events that we are intersted in while walking page tables are
1) Page fault
2) umap
3) THP split
4) THP collapse

A) local_irq_disabled:
------------------------
1) page fault:
A none to valid transition via page fault is not an issue because we
would either see a none or valid. If it is none, we would error out
the page table walk. We may need to use on stack values when checking for
type of page table elements, because if we do

if (!is_hugepd()) {
    if (!pmd_none() {
       if (pmd_bad() {

We could take that bad condition because the pmd got converted to a hugepd
after the !is_hugepd check via a hugetlb fault.

The right way would be to check for pmd_none higher up or use on stack value.

2) A valid to none conversion via unmap:
We can safely walk the upper level table, because we don't remove the the
page table entries until rcu grace period. So even if we followed a
wrong pointer we still have the pointer valid till the grace period.

A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and
 _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent
pte_clear we take _PAGE_BUSY.

3) THP split:
A valid transparent hugepage is converted to nomal page. Before we split we
do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING
So when walking page table we need to check for pmd_trans_splitting and
handle that. The pte returned should also need to be checked for
_PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save
the value of PTE on stack and check for the flag in the local pte value.
If we don't have the value set we can safely operate on the local pte value
and we atomicaly set _PAGE_BUSY.

4) THP collapse:
A normal page gets converted to hugepage. In the collapse path, we
mark the pmd none early (pmdp_clear_flush). With irq disabled, if we
are aleady walking page table we would see the pmd_none and won't continue.
If we see a valid PMD, we should still check for _PAGE_PRESENT before
setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:56 +10:00
Linus Torvalds
5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Jiang Liu
7db6f78cfe mm/PPC: use free_highmem_page() to free highmem pages into buddy system
Use helper function free_highmem_page() to free highmem pages into
the buddy system.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:32 -07:00
Jiang Liu
5d585e5c48 mm/ppc: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:30 -07:00
Valentina Manea
8040bda31a powerpc: place EXPORT_SYMBOL macro right after declaration
This fixes the following checkpatch.pl warnings:

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(kmap_prot);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(kmap_pte);

Signed-off-by: Valentina Manea <valentina.manea.m@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:49 +10:00
Linus Torvalds
5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Wen Congyang
24d335ca36 memory-hotplug: introduce new arch_remove_memory() for removing page table
For removing memory, we need to remove page tables.  But it depends on
architecture.  So the patch introduce arch_remove_memory() for removing
page table.  Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
      (I don't know how to implement it for s390).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Cody P Schafer
4e8309baed powerpc/mm: Eliminate unneeded for_each_memblock
The only persistent change made by this loop is calling
memblock_set_node() once for each memblock, which is not useful (and has
no effect) as memblock_set_node() is not called with any
memblock-specific parameters.

Subsistute a single memblock_set_node().

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-29 11:34:25 +11:00
Jia Hongtao
688ba1dbee powerpc/swiotlb: Enable at early stage and disable if not necessary
Remove the dependency on PCI initialization for SWIOTLB initialization.
So that PCI can be initialized at proper time.

SWIOTLB is partly determined by PCI inbound/outbound map which is assigned
in PCI initialization. But swiotlb_init() should be done at the stage of
mem_init() which is much earlier than PCI initialization. So we reserve the
memory for SWIOTLB first and free it if not necessary.

All boards are converted to fit this change.

Signed-off-by: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-09-12 14:57:09 -05:00
Suzuki Poulose
a84fcd4687 powerpc: Change memory_limit from phys_addr_t to unsigned long long
There are some device-tree nodes, whose values are of type phys_addr_t.
The phys_addr_t is variable sized based on the CONFIG_PHSY_T_64BIT.

Change these to a fixed unsigned long long for consistency.

This patch does the change only for memory_limit.

The following is a list of such variables which need the change:

 1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c

 2) (struct resource *)crashk_res.start - We could export a local static
    variable from machine_kexec.c.

Changing the above values might break the kexec-tools. So, I will
fix kexec-tools first to handle the different sized values and then change
 the above.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-07 11:44:30 +10:00
Alexander Graf
249ba1ee0f KVM: PPC: Add cache flush on page map
When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-08-16 14:14:53 +02:00
Cong Wang
2480b20892 powerpc: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:14 +08:00
Linus Torvalds
e4e88f31bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
  powerpc: fix compile error with 85xx/p1010rdb.c
  powerpc: fix compile error with 85xx/p1023_rds.c
  powerpc/fsl: add MSI support for the Freescale hypervisor
  arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
  powerpc/fsl: Add support for Integrated Flash Controller
  powerpc/fsl: update compatiable on fsl 16550 uart nodes
  powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
  powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
  powerpc/fsl: Update defconfigs to enable some standard FSL HW features
  powerpc: Add TBI PHY node to first MDIO bus
  sbc834x: put full compat string in board match check
  powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
  powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
  offb: Fix setting of the pseudo-palette for >8bpp
  offb: Add palette hack for qemu "standard vga" framebuffer
  offb: Fix bug in calculating requested vram size
  powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
  powerpc/44x: Fix build error on currituck platform
  powerpc/boot: Change the load address for the wrapper to fit the kernel
  powerpc/44x: Enable CRASH_DUMP for 440x
  ...

Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.
2012-01-06 17:58:22 -08:00
Tejun Heo
1d7cfe18ec powerpc: Use HAVE_MEMBLOCK_NODE_MAP
powerpc doesn't access early_node_map[] directly and enabling
HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
enough.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:08 -08:00
sukadev@linux.vnet.ibm.com
8a3e3d31d1 powerpc: Punch a hole in /dev/mem for librtas
With CONFIG_STRICT_DEVMEM=y, user space cannot read any part of /dev/mem.
Since this breaks librtas, punch a hole in /dev/mem to allow access to the
rmo_buffer that librtas needs.

Anton Blanchard reported the problem and helped with the fix.

A quick test for this patch:

       # cat /proc/rtas/rmo_buffer
       000000000f190000 10000

       # python -c "print 0x000000000f190000 / 0x10000"
       3865

       # dd if=/dev/mem of=/tmp/foo count=1 bs=64k skip=3865
       1+0 records in
       1+0 records out
       65536 bytes (66 kB) copied, 0.000205235 s, 319 MB/s

       # dd if=/dev/mem of=/tmp/foo
       dd: reading `/dev/mem': Operation not permitted
       0+0 records in
       0+0 records out
       0 bytes (0 B) copied, 0.00022519 s, 0.0 kB/s

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 14:22:52 +11:00
Becky Bruce
d93e4d7d72 powerpc/book3e: Change hugetlb preload to take vma argument
This avoids an extra find_vma() and is less error-prone.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:24 +11:00
sukadev@linux.vnet.ibm.com
1d54cf2b97 powerpc: Implement CONFIG_STRICT_DEVMEM
As described in the help text in the patch, this token restricts general
access to /dev/mem as a way of increasing the security. Specifically, access
to exclusive IOMEM and kernel RAM is denied unless CONFIG_STRICT_DEVMEM is
set to 'n'.

Implement the 'devmem_is_allowed()' interface for Powerpc. It will be
called from range_is_allowed() when userpsace attempts to access /dev/mem.

This patch is based on an earlier patch from Steve Best and with input from
Paul Mackerras and Scott Wood.

[BenH] Fixed a typo or two and removed the generic change which should
       be submitted as a separate patch

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-28 11:42:08 +11:00
Anton Blanchard
c40dd2f766 powerpc: Add System RAM to /proc/iomem
We've resisted adding System RAM to /proc/iomem because it is
the wrong place for it. Unfortunately we continue to find tools
that rely on this behaviour so give up and add it in.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-08 14:51:46 +11:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Paul Gortmaker
4b16f8e2d6 powerpc: various straight conversions from module.h --> export.h
All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:44 -04:00
Anton Blanchard
a11940978b powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe
If we echo an address the hypervisor doesn't like to
/sys/devices/system/memory/probe we oops the box:

# echo 0x10000000000 > /sys/devices/system/memory/probe

kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!

The backtrace is:

create_section_mapping
arch_add_memory
add_memory
memory_probe_store
sysdev_class_store
sysfs_write_file
vfs_write
SyS_write

In create_section_mapping we BUG if htab_bolt_mapping returned
an error. A better approach is to return an error which will
propagate back to userspace.

Rerunning the test with this patch applied:

# echo 0x10000000000 > /sys/devices/system/memory/probe
-bash: echo: write error: Invalid argument

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:23 +10:00
Becky Bruce
41151e77a4 powerpc: Hugetlb for BookE
Enable hugepages on Freescale BookE processors.  This allows the kernel to
use huge TLB entries to map pages, which can greatly reduce the number of
TLB misses and the amount of TLB thrashing experienced by applications with
large memory footprints.  Care should be taken when using this on FSL
processors, as the number of large TLB entries supported by the core is low
(16-64) on current processors.

The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
Page sizes larger than the max zone size are called "gigantic" pages and
must be allocated on the command line (and cannot be deallocated).

This is currently only fully implemented for Freescale 32-bit BookE
processors, but there is some infrastructure in the code for
64-bit BooKE.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 09:19:40 +10:00
Tony Breeds
f7ba2991e9 powerpc/mm: Fix output of total_ram.
On 32bit platforms that support >= 4GB memory total_ram was truncated.
This creates a confusing printk:
	Top of RAM: 0x100000000, Total RAM: 0x0
Fix that:
	Top of RAM: 0x100000000, Total RAM: 0x100000000

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:04 +10:00
Becky Bruce
3160b09796 powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKE
This is used to round-robin TLBCAM entries.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-07-08 00:21:34 -05:00
Dave Carroll
a9c0f41b3a powerpc: Add printk companion for ppc_md.progress
This patch adds a printk companion to replace the udbg progress function
when initmem is freed.

Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30 15:28:05 +10:00
Dave Carroll
2773fcc8c4 powerpc: Move free_initmem to common code
The free_initmem function is basically duplicated in mm/init_32,
and init_64, and is moved to the common 32/64-bit mm/mem.c.

All other sections except init were removed in v2.6.15 by
6c45ab992e (powerpc: Remove section
free() and linker script bits), and therefore the bulk of the executed
code is identical.

This patch also removes updating ppc_md.progress to NULL in the powermac
late_initcall.

Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30 15:28:05 +10:00