- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to make it
safe against parallel page table manipulations without relying on an IPI for
serialisation.
- A series of fixes & enhancements to make our machine check handling more
robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions on
Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
Sang, Xiongfeng Wang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
G3PXIBOI0jRgRw==
=o0WU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
We want to support kmap_atomic_prot() on all architectures and it makes
sense to define kmap_atomic() to use the default kmap_prot.
So we ensure all arch's have a globally available kmap_prot either as a
define or exported symbol.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-9-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
free_area_init() has effectively became a wrapper for
free_area_init_nodes() and there is no point of keeping it. Still
free_area_init() name is shorter and more general as it does not imply
necessity to initialize multiple nodes.
Rename free_area_init_nodes() to free_area_init(), update the callers and
drop old version of free_area_init().
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com> [arm64]
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
devm_memremap_pages() is currently used by the PCI P2PDMA code to create
struct page mappings for IO memory. At present, these mappings are
created with PAGE_KERNEL which implies setting the PAT bits to be WB.
However, on x86, an mtrr register will typically override this and force
the cache type to be UC-. In the case firmware doesn't set this
register it is effectively WB and will typically result in a machine
check exception when it's accessed.
Other arches are not currently likely to function correctly seeing they
don't have any MTRR registers to fall back on.
To solve this, provide a way to specify the pgprot value explicitly to
arch_add_memory().
Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a
simple change to pass the pgprot_t down to their respective functions
which set up the page tables. For x86_32, set the page tables
explicitly using _set_memory_prot() (seeing they are already mapped).
For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
should be fine, for now, seeing these architectures don't support
ZONE_DEVICE.
A check in __add_pages() is also added to ensure the pgprot parameter
was set for all arches.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In prepartion to support a pgprot_t argument for arch_add_memory().
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mhp_restrictions struct really doesn't specify anything resembling a
restriction anymore so rename it to be mhp_params as it is a list of
extended parameters.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan reported a strange kernel fault which turned out to be due to a
missing KUAP disable in flush_coherent_icache() called from
flush_icache_range().
The fault looks like:
Kernel attempted to access user page (7fffc30d9c00) - exploit attempt? (uid: 1009)
BUG: Unable to handle kernel data access on read at 0x7fffc30d9c00
Faulting instruction address: 0xc00000000007232c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
CPU: 35 PID: 5886 Comm: sigtramp Not tainted 5.6.0-rc2-gcc-8.2.0-00003-gfc37a1632d40 #79
NIP: c00000000007232c LR: c00000000003b7fc CTR: 0000000000000000
REGS: c000001e11093940 TRAP: 0300 Not tainted (5.6.0-rc2-gcc-8.2.0-00003-gfc37a1632d40)
MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000884 XER: 00000000
CFAR: c0000000000722fc DAR: 00007fffc30d9c00 DSISR: 08000000 IRQMASK: 0
GPR00: c00000000003b7fc c000001e11093bd0 c0000000023ac200 00007fffc30d9c00
GPR04: 00007fffc30d9c18 0000000000000000 c000001e11093bd4 0000000000000000
GPR08: 0000000000000000 0000000000000001 0000000000000000 c000001e1104ed80
GPR12: 0000000000000000 c000001fff6ab380 c0000000016be2d0 4000000000000000
GPR16: c000000000000000 bfffffffffffffff 0000000000000000 0000000000000000
GPR20: 00007fffc30d9c00 00007fffc30d8f58 00007fffc30d9c18 00007fffc30d9c20
GPR24: 00007fffc30d9c18 0000000000000000 c000001e11093d90 c000001e1104ed80
GPR28: c000001e11093e90 0000000000000000 c0000000023d9d18 00007fffc30d9c00
NIP flush_icache_range+0x5c/0x80
LR handle_rt_signal64+0x95c/0xc2c
Call Trace:
0xc000001e11093d90 (unreliable)
handle_rt_signal64+0x93c/0xc2c
do_notify_resume+0x310/0x430
ret_from_except_lite+0x70/0x74
Instruction dump:
409e002c 7c0802a6 3c62ff31 3863f6a0 f8010080 48195fed 60000000 48fe4c8d
60000000 e8010080 7c0803a6 7c0004ac <7c00ffac> 7c0004ac 4c00012c 38210070
This path through handle_rt_signal64() to setup_trampoline() and
flush_icache_range() is only triggered by 64-bit processes that have
unmapped their VDSO, which is rare.
flush_icache_range() takes a range of addresses to flush. In
flush_coherent_icache() we implement an optimisation for CPUs where we
know we don't actually have to flush the whole range, we just need to
do a single icbi.
However we still execute the icbi on the user address of the start of
the range we're flushing. On CPUs that also implement KUAP (Power9)
that leads to the spurious fault above.
We should be able to pass any address, including a kernel address, to
the icbi on these CPUs, which would avoid any interaction with KUAP.
But I don't want to make that change in a bug fix, just in case it
surfaces some strange behaviour on some CPU.
So for now just disable KUAP around the icbi. Note the icbi is treated
as a load, so we allow read access, not write as you'd expect.
Fixes: 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303235708.26004-1-mpe@ellerman.id.au
Commit 8d30c14cab ("powerpc/mm: Rework I$/D$ coherency (v3)") and
commit 90ac19a8b2 ("[POWERPC] Abolish iopa(), mm_ptov(),
io_block_mapping() from arch/powerpc") removed the use of get_pteptr()
outside of mm/pgtable_32.c
In mm/pgtable_32.c, the only user of get_pteptr() is change_page_attr()
which operates on kernel context and on lowmem pages only.
Make virt_to_kpte() available outside of mm/mem.c and use it instead
of get_pteptr(), and drop get_pteptr()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/788378c6c3ba5c5298caab7c7f95e6c3c88244b8.1578558199.git.christophe.leroy@c-s.fr
At several places pmd pointer is retrieved through the same action:
pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
or
pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);
Refactor this by implementing two helpers pmd_ptr() and pmd_ptr_k()
This will help when adding the p4d level.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7b065c5be35726af4066cab238ee35cabceda1fa.1578558199.git.christophe.leroy@c-s.fr
Add support of KASAN_VMALLOC on PPC32.
To allow this, the early shadow covering the VMALLOC space
need to be removed once high_memory var is set and before
freeing memblock.
And the VMALLOC area need to be aligned such that boundaries
are covered by a full shadow page.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/031dec5487bde9b2181c8b3c9800e1879cf98c1a.1579024426.git.christophe.leroy@c-s.fr
We currently try to shrink a single zone when removing memory. We use
the zone of the first page of the memory we are removing. If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):
BUG: unable to handle page fault for address: 000000000000353d
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:clear_zone_contiguous+0x5/0x10
Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__remove_pages+0x4b/0x640
arch_remove_memory+0x63/0x8d
try_remove_memory+0xdb/0x130
__remove_memory+0xa/0x11
acpi_memory_device_remove+0x70/0x100
acpi_bus_trim+0x55/0x90
acpi_device_hotplug+0x227/0x3a0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x221/0x550
worker_thread+0x50/0x3b0
kthread+0x105/0x140
ret_from_fork+0x3a/0x50
Modules linked in:
CR2: 000000000000353d
Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that. We now
properly shrink the zones, even if we have DIMMs whereby
- Some memory blocks fall into no zone (never onlined)
- Some memory blocks fall into multiple zones (offlined+re-onlined)
- Multiple memory blocks that fall into different zones
Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().
Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G.
If a system has more physical memory than this limit, the swiotlb
buffer is not addressable because it is allocated from memblock using
top-down mode.
Force memblock to bottom-up mode before calling swiotlb_init() to
ensure that the swiotlb buffer is DMA-able.
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191204123524.22919-1-rppt@kernel.org
Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines. The
firmware support is still in development, so the code here won't actually
activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict it to
read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop
into xmon and modify kernel data, such as the lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work
with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management) driver to
make it behave more like other balloon drivers and enable some cleanups of
generic mm code.
- A series of fixes to our hardware breakpoint support to properly handle
unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to:
Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser,
Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M.
Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand,
Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg
Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason
Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M.
Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi
Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel
Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3hBycTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgApBEACk91MEQDYJ9MF9I6uN+85qb5p4pMsp
rGzqnpt+XFidbDAc3eP63pYfIDSo3jtkQ2YL7shAnDOTvkO0md+Vqkl9Aq/G6FIf
lDBlwbgkXMSxS/O2Lpvfn4NZAoK6dKmiV55LSgfliM62X3e2Saeg6TR55wBTgJ6/
SlYPDwZfcVHOAiFS3UmfB+hkiIZk+AI5Zr5VAZvT2ZmeH36yAWkq4JgJI1uAk6m1
/7iCnlfUjx/nl/BhnA3kjjmAgGCJ5s/WuVgwFMz47XpMBWGBhLWpMh/NqDTFb8ca
kpiVQoVPQe2xyO3pL/kOwBy6sii26ftfHDhLKMy1hJdEhVQzS5LerPIMeh1qsU8Q
hV/Cj+jfsrS/vBDOehj3jwx93t+861PmTOqgLnpYQ6Ltrt+2B/74+fufGMHE1kI3
Ffo7xvNw4sw6bSziDxDFqUx2P1dFN5D5EJsJsYM98ekkVAAkzNqCDRvfD2QI8Pif
VXWPYXqtNJTrVPJA0D7Yfo9FDNwhANd0f1zi7r/U5mVXBFUyKOlGqTQSkXgMrVeK
3I7wHPOVGgdA5UUkfcd3pcuqsY081U9E//o5PUfj8ybO5JCwly8NoatbG+xHmKia
a72uJT8MjCo9mGCHKDrwi9l/kqms6ZSv8RP+yMhGuB52YoiGc6PpVyab5jXIUd1N
yTtBlC0YGW1JYw==
=JHzg
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines.
The firmware support is still in development, so the code here
won't actually activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict
it to read-only mode when the kernel is lockdown'ed, otherwise it's
trivial to drop into xmon and modify kernel data, such as the
lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache()
(VDSO) to work with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management)
driver to make it behave more like other balloon drivers and enable
some cleanups of generic mm code.
- A series of fixes to our hardware breakpoint support to properly
handle unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"
* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
powerpc/fixmap: fix crash with HIGHMEM
x86/efi: remove unused variables
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Don't add -mabi= flags when building with Clang
powerpc: Fix Kconfig indentation
powerpc/fixmap: don't clear fixmap area in paging_init()
selftests/powerpc: spectre_v2 test must be built 64-bit
powerpc/powernv: Disable native PCIe port management
powerpc/kexec: Move kexec files into a dedicated subdir.
powerpc/32: Split kexec low level code out of misc_32.S
powerpc/sysdev: drop simple gpio
powerpc/83xx: map IMMR with a BAT.
powerpc/32s: automatically allocate BAT in setbat()
powerpc/ioremap: warn on early use of ioremap()
powerpc: Add support for GENERIC_EARLY_IOREMAP
powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
powerpc/8xx: use the fixmapped IMMR in cpm_reset()
powerpc/8xx: add __init to cpm1 init functions
...
This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).
This was orignally in commit fb5924fddf
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-7-alastair@au1.ibm.com
When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.
This patch breaks up the call into 1GB chunks, calling
cond_resched() inbetween to allow the scheduler to run.
Fixes: fb5924fddf ("powerpc/mm: Flush cache on memory hot(un)plug")
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-6-alastair@au1.ibm.com
Similar to commit 22e9c88d48
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()
This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.
By converting these functions to C, it becomes easier to maintain.
flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Minor fixups, don't export __flush_dcache_icache()]
Link: https://lore.kernel.org/r/20191104023305.9581-5-alastair@au1.ibm.com
Some architectures, notably ARM, are interested in tweaking this
depending on their runtime DMA addressing limitations.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.
Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features drivers start relying upon. It
also makes sure we have on tested code base for all architectures
that require uncached pte bits for coherent DMA allocations.
Another advantage is that consistent memory allocations now share
the general vmalloc pool instead of needing an explicit careout
from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # tested on 8xx
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814132230.31874-2-hch@lst.de
update_mmu_cache() is only for BOOK3S, and can be simplified for
BOOK3S32.
Move it out of mem.c into respective BOOK3S32 and BOOK3S64 files
containing hash utils.
BOOK3S64 version of hash_preload() is only used locally, declare it
static.
Remove the radix_enabled() stuff in BOOK3S32 version.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/107aaf43583a5f5d09e0d4e84c4c4390ecfcd512.1565933217.git.christophe.leroy@c-s.fr
Only BOOK3S and FSL_BOOK3E have a usefull update_mmu_cache().
For the others, just define it static inline.
In the meantime, simplify the FSL_BOOK3E related ifdef as
book3e_hugetlb_preload() only exists when CONFIG_PPC_FSL_BOOK3E
is selected.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/668aba4db6b9af6d8a151174e11a4289f1a6bbcd.1565933217.git.christophe.leroy@c-s.fr
An assortment of non-regression fixes that have accumulated since the start of
the merge window.
A fix for a user triggerable oops on machines where transactional memory is
disabled, eg. Power9 bare metal, Power8 with TM disabled on the command line, or
all Power7 or earlier machines.
Three fixes for handling of PMU and power saving registers when running nested
KVM on Power9.
Two fixes for bugs found while stress testing the XIVE interrupt controller
code, also on Power9.
A fix to allow guests to boot under Qemu/KVM on Power9 using the the Hash MMU
with >= 1TB of memory.
Two fixes for bugs in the recent DMA cleanup, one of which could lead to
checkstops.
And finally three fixes for the PAPR SCM nvdimm driver.
Thanks to:
Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater, Christoph Hellwig,
David Gibson, Gautham R. Shenoy, Michael Neuling, Oliver O'Halloran,, Satheesh
Rajendran, Shawn Anastasio, Suraj Jitindar Singh, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdOF2iAAoJEFHr6jzI4aWAVmsQAJ//UY1a+lz39y/5jmkybJbH
HVnja6ZhsKd3+ZAnljGmqr1zuwDmy8+X3pT+1832zBBm4Z1cNKj1c0wuK5fuhAfq
o0XkO0N9GFcQu8HUPb5wSBOoyXwK0qUhExfCVobl7YsDAyAI2//nSQTwxNX3W4Hv
P7hz48pBbiqRzQAOSHV8ZlcOBETbSVAXeNalSXrXqSJmXQbVWCQcd6vucMSwZ7S5
ZiiL/gCBoO0kd0ZQRsGXCbwcjcR4NlTDN0M40og8Y9KTDkId8HdmJyXW3tMcZo/g
W3LeMR94bUh/KrK88lMBrRXKUlxL+loZKWZaeNlA5+ShCYk/ZafkKri/QUX/glOq
ahm8uqokdZ5VS1tgSYoJIKdA5qMGvv8V+CpHRJnZqaEhUCduQa5XmWPnDnEKkDt0
94VBsk0D2vHYKyygv5JMgYHQVlU7XrQF8fw2pKShpqLMY7ZMpeDDmKN9AuzxhawF
9b7HigbwNt5LvNJ0xn097KW+svCK7i3ZgiQe83W36wjSl2ystgjJ3T7yrH6Q1rKH
o4loEGA4gASTDjTmWQM20lHT1xQHY4fQBC/wi/67as3m0TDeGXYI0fOZC5qtEBFr
Ln/0e78VMhut/RWicDlRveszef1MCi1warR9R4I/bQ8M6O1BzHYsQ9zr6H111uxL
vQ92Yp8G2PoqN7wlFlSG
=Yc9T
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"An assortment of non-regression fixes that have accumulated since the
start of the merge window.
- A fix for a user triggerable oops on machines where transactional
memory is disabled, eg. Power9 bare metal, Power8 with TM disabled
on the command line, or all Power7 or earlier machines.
- Three fixes for handling of PMU and power saving registers when
running nested KVM on Power9.
- Two fixes for bugs found while stress testing the XIVE interrupt
controller code, also on Power9.
- A fix to allow guests to boot under Qemu/KVM on Power9 using the
the Hash MMU with >= 1TB of memory.
- Two fixes for bugs in the recent DMA cleanup, one of which could
lead to checkstops.
- And finally three fixes for the PAPR SCM nvdimm driver.
Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater,
Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling,
Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar
Singh, Vaibhav Jain"
* tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
powerpc/pseries: Update SCM hcall op-codes in hvcall.h
powerpc/tm: Fix oops on sigreturn on systems without TM
powerpc/dma: Fix invalid DMA mmap behavior
KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails
powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA
KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR
KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
powerpc/mm: Limit rma_size to 1TB when running without HV mode
We want to improve error handling while adding memory by allowing to use
arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
arch_add_memory()
rc = do_something();
if (rc) {
arch_remove_memory();
}
We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.
Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
25078dc1f7 first introduced an off by
one error in the ZONE_DMA initialization of PPC_BOOK3E_64=y and since
9739ab7eda the off by one applies to
PPC32=y too. This simply corrects the off by one and should resolve
crashes like below:
[ 65.179101] page 0x7fff outside node 0 zone DMA [ 0x0 - 0x7fff ]
Unfortunately in various MM places "max" means a non inclusive end of
range. free_area_init_nodes max_zone_pfn parameter is one case and
MAX_ORDER is another one (unrelated) that comes by memory.
Reported-by: Zorro Lang <zlang@redhat.com>
Fixes: 25078dc1f7 ("powerpc: use mm zones more sensibly")
Fixes: 9739ab7eda ("powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190625141727.2883-1-aarcange@redhat.com
Improvements and bug fixes for the hmm interface in the kernel:
- Improve clarity, locking and APIs related to the 'hmm mirror' feature
merged last cycle. In linux-next we now see AMDGPU and nouveau to be
using this API.
- Remove old or transitional hmm APIs. These are hold overs from the past
with no users, or APIs that existed only to manage cross tree conflicts.
There are still a few more of these cleanups that didn't make the merge
window cut off.
- Improve some core mm APIs:
* export alloc_pages_vma() for driver use
* refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
* refactor duplicative driver code into the core dev_pagemap
struct
- Remove hmm wrappers of improved core mm APIs, instead have drivers use
the simplified API directly
- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl0k1zkACgkQOG33FX4g
mxrO+w//QF/yI/9Hh30RWEBq8W107cODkDlaT0Z/7cVEXfGetZzIUpqzxnJofRfQ
xTw1XmYkc9WpJe/mTTuFZFewNQwWuMM6X0Xi25fV438/Y64EclevlcJTeD49TIH1
CIMsz8bX7CnCEq5sz+UypLg9LPnaD9L/JLyuSbyjqjms/o+yzqa7ji7p/DSINuhZ
Qva9OZL1ZSEDJfNGi8uGpYBqryHoBAonIL12R9sCF5pbJEnHfWrH7C06q7AWOAjQ
4vjN/p3F4L9l/v2IQ26Kn/S0AhmN7n3GT//0K66e2gJPfXa8fxRKGuFn/Kd79EGL
YPASn5iu3cM23up1XkbMNtzacL8yiIeTOcMdqw26OaOClojy/9OJduv5AChe6qL/
VUQIAn1zvPsJTyC5U7mhmkrGuTpP6ivHpxtcaUp+Ovvi1cyK40nLCmSNvLnbN5ES
bxbb0SjE4uupDG5qU6Yct/hFp6uVMSxMqXZOb9Xy8ZBkbMsJyVOLj71G1/rVIfPU
hO1AChX5CRG1eJoMo6oBIpiwmSvcOaPp3dqIOQZvwMOqrO869LR8qv7RXyh/g9gi
FAEKnwLl4GK3YtEO4Kt/1YI5DXYjSFUbfgAs0SPsRKS6hK2+RgRk2M/B/5dAX0/d
lgOf9WPODPwiSXBYLtJB8qHVDX0DIY8faOyTx6BYIKClUtgbBI8=
=wKvp
-----END PGP SIGNATURE-----
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull HMM updates from Jason Gunthorpe:
"Improvements and bug fixes for the hmm interface in the kernel:
- Improve clarity, locking and APIs related to the 'hmm mirror'
feature merged last cycle. In linux-next we now see AMDGPU and
nouveau to be using this API.
- Remove old or transitional hmm APIs. These are hold overs from the
past with no users, or APIs that existed only to manage cross tree
conflicts. There are still a few more of these cleanups that didn't
make the merge window cut off.
- Improve some core mm APIs:
- export alloc_pages_vma() for driver use
- refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
- refactor duplicative driver code into the core dev_pagemap
struct
- Remove hmm wrappers of improved core mm APIs, instead have drivers
use the simplified API directly
- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
mm: remove the HMM config option
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: simplify ZONE_DEVICE page private data
mm: remove hmm_devmem_add
mm: remove hmm_vma_alloc_locked_page
nouveau: use devm_memremap_pages directly
nouveau: use alloc_page_vma directly
PCI/P2PDMA: use the dev_pagemap internal refcount
device-dax: use the dev_pagemap internal refcount
memremap: provide an optional internal refcount in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: remove the data field in struct dev_pagemap
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: move dev_pagemap callbacks into a separate structure
memremap: validate the pagemap type passed to devm_memremap_pages
mm: factor out a devm_request_free_mem_region helper
mm: export alloc_pages_vma
...
Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well
as some other functions only used by drivers that haven't (yet?) made it
upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf record -e
mem: ...) which could lead to register corruption and kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc
when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to use gas
macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to:
Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater,
Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig,
Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R.
Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg
Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria,
Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun
Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung
Bauermann, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdKVoLAAoJEFHr6jzI4aWA0kIP/A6shIbbE7H5W2hFrqt/PPPK
3+VrvPKbOFF+W6hcE/RgSZmEnUo0svdNjHUd/eMfFS1vb/uRt2QDdrsHUNNwURQL
M2mcLXFwYpnjSjb/XMgDbHpAQxjeGfTdYLonUIejN7Rk8KQUeLyKQ3SBn6kfMc46
DnUUcPcjuRGaETUmVuZZ4e40ZWbJp8PKDrSJOuUrTPXMaK5ciNbZk5mCWXGbYl6G
BMQAyv4ld/417rNTjBEP/T2foMJtioAt4W6mtlgdkOTdIEZnFU67nNxDBthNSu2c
95+I+/sML4KOp1R4yhqLSLIDDbc3bg3c99hLGij0d948z3bkSZ8bwnPaUuy70C4v
U8rvl/+N6C6H3DgSsPE/Gnkd8DnudqWY8nULc+8p3fXljGwww6/Qgt+6yCUn8BdW
WgixkSjKgjDmzTw8trIUNEqORrTVle7cM2hIyIK2Q5T4kWzNQxrLZ/x/3wgoYjUa
1KwIzaRo5JKZ9D3pJnJ5U+knE2/90rJIyfcp0W6ygyJsWKi2GNmq1eN3sKOw0IxH
Tg86RENIA/rEMErNOfP45sLteMuTR7of7peCG3yumIOZqsDVYAzerpvtSgip2cvK
aG+9HcYlBFOOOF9Dabi8GXsTBLXLfwiyjjLSpA9eXPwW8KObgiNfTZa7ujjTPvis
4mk9oukFTFUpfhsMmI3T
=3dBZ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
as well as some other functions only used by drivers that haven't
(yet?) made it upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf
record -e mem: ...) which could lead to register corruption and
kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
vmalloc when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to
use gas macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
Jitindar Singh, Thiago Jung Bauermann, YueHaibing"
* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
powerpc/eeh: Handle hugepages in ioremap space
ocxl: Update for AFU descriptor template version 1.1
powerpc/boot: pass CONFIG options in a simpler and more robust way
powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
powerpc/module64: Use symbolic instructions names.
powerpc/module32: Use symbolic instructions names.
powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Add lzo support for uImage
powerpc/boot: Add lzma support for uImage
powerpc/boot: don't force gzipped uImage
powerpc/8xx: Add microcode patch to move SMC parameter RAM.
powerpc/8xx: Use IO accessors in microcode programming.
powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
powerpc/8xx: refactor programming of microcode CPM params.
powerpc/8xx: refactor printing of microcode patch name.
powerpc/8xx: Refactor microcode write
powerpc/8xx: refactor writing of CPM microcode arrays
...
On most arches having function flush_dcache_range(), including PPC32,
this function does a writeback and invalidation of the cache bloc.
On PPC64, flush_dcache_range() only does a writeback while
flush_inval_dcache_range() does the invalidation in addition.
In addition it looks like within arch/powerpc/, there are no PPC64
platforms using flush_dcache_range()
This patch drops the existing 64 bits version of flush_dcache_range()
and renames flush_inval_dcache_range() into flush_dcache_range().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a flags field to struct dev_pagemap to replace the altmap_valid
boolean to be a little more extensible. Also add a pgmap_altmap() helper
to find the optional altmap and clean up the code using the altmap using
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Seven fixes, all for bugs introduced this cycle.
The commit to add KASAN support broke booting on 32-bit SMP machines, due to a
refactoring that moved some setup out of the secondary CPU path.
A fix for another 32-bit SMP bug introduced by the fast syscall entry
implementation for 32-bit BOOKE. And a build fix for the same commit.
Our change to allow the DAWR to be force enabled on Power9 introduced a bug in
KVM, where we clobber r3 leading to a host crash.
The same commit also exposed a previously unreachable bug in the nested KVM
handling of DAWR, which could lead to an oops in a nested host.
One of the DMA reworks broke the b43legacy WiFi driver on some people's
powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
A fix for TLB flushing in KVM introduced a new bug, as it neglected to also
flush the ERAT, this could lead to memory corruption in the guest.
Thanks to:
Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael
Neuling, Suraj Jitindar Singh.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdDg4YAAoJEFHr6jzI4aWACKYP/RG1cqYDjEWz0N9bxjAOanx6
z//hPZqZrObEORx0mek07LNj6JDy4eL7CB9WaEudJjHt7mYugLYq0g7hUMVvBWnB
irFEuzGJ8EgWl1aMbmz+fgf49PBIuroy2o/4pyzzQXoDaw44QyUaCke2VEBskQNG
RW64C2rDVrPgpRHzBB9EZVNv7svmo6ERJsEpRvqP3PZG1ZxgXW+DXbEdSmJCcgAt
8oI+z6frRv+0ez+nge7TULo8DuheShfxc7l0jFrd48i35v2qB/IowPr8cof9fRwM
TqnB+3dZXHPKPz6J9mz80p9ZDe1omLzg6i9EiR2/7a3XGpRBo7kCg3Iri7N5pu0j
LotK9l1+mXWLy5P6lOHH5/tEHv52Wqsvh5IetpNJ2tgXp3MzbOc1/Ut9h7Ag7cRw
WRa7tNXQ5Ud8uPM1Pds8Ymhd+/nZ9RItjGcu6S095/OGpM1FJR9a0QnfUHMyfyuX
kAGrJDWcAkCd/Q9tKHsQotuZAFmRCQe4JFkzTiGzwdjYWYgtTA1c/eIv3+SG7eLV
1dsaIYzIS56b+Qz2Qc/pKHwho+I9o505Y7LFXxlCGXDDjyI72ioTQDwiSBzaZdc9
ORwNchLfpXNpiNXRoRqAnqmhWxavYmA6oJ13RDBiMBxIUWHynVbEzLlX9fPNdBFj
Kw3Zd15znokXBzU+1mDE
=Ju1y
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"This is a frustratingly large batch at rc5. Some of these were sent
earlier but were missed by me due to being distracted by other things,
and some took a while to track down due to needing manual bisection on
old hardware. But still we clearly need to improve our testing of KVM,
and of 32-bit, so that we catch these earlier.
Summary: seven fixes, all for bugs introduced this cycle.
- The commit to add KASAN support broke booting on 32-bit SMP
machines, due to a refactoring that moved some setup out of the
secondary CPU path.
- A fix for another 32-bit SMP bug introduced by the fast syscall
entry implementation for 32-bit BOOKE. And a build fix for the same
commit.
- Our change to allow the DAWR to be force enabled on Power9
introduced a bug in KVM, where we clobber r3 leading to a host
crash.
- The same commit also exposed a previously unreachable bug in the
nested KVM handling of DAWR, which could lead to an oops in a
nested host.
- One of the DMA reworks broke the b43legacy WiFi driver on some
people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
- A fix for TLB flushing in KVM introduced a new bug, as it neglected
to also flush the ERAT, this could lead to memory corruption in the
guest.
Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
Finger, Michael Neuling, Suraj Jitindar Singh"
* tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
powerpc/32: fix build failure on book3e with KVM
powerpc/booke: fix fast syscall entry on SMP
powerpc/32s: fix initial setup of segment registers on secondary CPU
With the strict dma mask checking introduced with the switch to
the generic DMA direct code common wifi chips on 32-bit powerbooks
stopped working. Add a 30-bit ZONE_DMA to the 32-bit pmac builds
to allow them to reliably allocate dma coherent memory.
Fixes: 65a21b71f9 ("powerpc/dma: remove dma_nommu_dma_supported")
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All callers of arch_remove_memory() ignore errors. And we should really
try to remove any errors from the memory removal path. No more errors are
reported from __remove_pages(). BUG() in s390x code in case
arch_remove_memory() is triggered. We may implement that properly later.
WARN in case powerpc code failed to remove the section mapping, which is
better than ignoring the error completely right now.
Link: http://lkml.kernel.org/r/20190409100148.24703-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch_add_memory, __add_pages take a want_memblock which controls whether
the newly added memory should get the sysfs memblock user API (e.g.
ZONE_DEVICE users do not want/need this interface). Some callers even
want to control where do we allocate the memmap from by configuring
altmap.
Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags which contains additional features
to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
altmap for alternative memmap allocator.
This patch shouldn't introduce any functional change.
[akpm@linux-foundation.org: build fix]
Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For most architectures free_initrd_mem just expands to the same
free_reserved_area call. Provide that as a generic implementation marked
__weak.
Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Steven Price <steven.price@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch prepares a shadow area for KASAN.
The shadow area will be at the top of the kernel virtual
memory space above the fixmap area and will occupy one
eighth of the total kernel virtual memory space.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch make inclusion of mmu_decl.h independant of the location
of the file including it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch fix the below section mismatch warnings.
WARNING: vmlinux.o(.text+0x2d1f44): Section mismatch in reference from the function devm_memremap_pages_release() to the function .meminit.text:arch_remove_memory()
WARNING: vmlinux.o(.text+0x2d265c): Section mismatch in reference from the function devm_memremap_pages() to the function .meminit.text:arch_add_memory()
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
resize_hpt_for_hotplug() reports a warning when it cannot
resize the hash page table ("Unable to resize hash page
table to target order") but in some cases it's not a problem
and can make user thinks something has not worked properly.
This patch moves the warning to arch_remove_memory() to
only report the problem when it is needed.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic code allows a few nice things such as node local allocations
and dipping into the CMA area. The lookup of the right zone for a given
dma mask works a little different, but the results should be the same.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since commit c40dd2f766 ("powerpc: Add System RAM to /proc/iomem")
it is possible to use the generic walk_system_ram_range() and
the generic page_is_ram().
To enable the use of walk_system_ram_range() by the IBM EHEA ethernet
driver, we still need an export of the generic function.
As powerpc was the only user of CONFIG_ARCH_HAS_WALK_MEMORY, the
ifdef around the generic walk_system_ram_range() has become useless
and can be dropped.
Fixes: c40dd2f766 ("powerpc: Add System RAM to /proc/iomem")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Keep the EXPORT_SYMBOL_GPL in powerpc code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Patch series "Do not touch pages in hot-remove path", v2.
This patchset aims for two things:
1) A better definition about offline and hot-remove stage
2) Solving bugs where we can access non-initialized pages
during hot-remove operations [2] [3].
This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.
[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html
This patch (of 5):
This is a preparation for the following-up patches. The idea of passing
the nid is that it will allow us to get rid of the zone parameter
afterwards.
Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configfs, and ZONE_DMA32 is used for 31-bit schemes.
Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):
- ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
- ZONE_NORMAL: everything addressable by the kernel
- ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.
Contains various fixes from Benjamin Herrenschmidt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Today we have:
config PPC_BOOK3S
def_bool y
depends on PPC_BOOK3S_32 || PPC_BOOK3S_64
config PPC_STD_MMU
def_bool y
depends on PPC_BOOK3S
PPC_STD_MMU is therefore redundant with PPC_BOOK3S. Lets remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'access' parameter of hash_preload() is either 0 or _PAGE_EXEC.
Among the two versions of hash_preload(), only the PPC64 one is
doing something with this 'access' parameter.
In order to remove the use of _PAGE_EXEC outside platform code,
'access' parameter is replaced by 'is_exec' which will be either
true of false, and the PPC64 version of hash_preload() creates
the access flag based on 'is_exec'.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.
Do the same for __ioremap_caller() and __ioremap_at().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This stops us from doing code patching in init sections after they've
been freed.
In this chain:
kvm_guest_init() ->
kvm_use_magic_page() ->
fault_in_pages_readable() ->
__get_user() ->
__get_user_nocheck() ->
barrier_nospec();
We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.
We seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on powervm (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With PHYS_ADDR_MAX there is now a type safe variant for all bits set.
Make use of it.
Patch created using a semantic patch as follows:
// <smpl>
@@
typedef phys_addr_t;
@@
-(phys_addr_t)ULLONG_MAX
+PHYS_ADDR_MAX
// </smpl>
Link: http://lkml.kernel.org/r/20180419214204.19322-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prefetch the faulting address in update_mmu_cache to give the page
table walker perhaps 100 cycles head start as locks are dropped and
the interrupt completed.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds support for flushing potentially dirty cache lines
when memory is hot-plugged/hot-un-plugged. The support is currently
limited to 64 bit systems.
The bug was exposed when mappings for a device were actually
hot-unplugged and plugged in back later. A similar issue was observed
during the development of memtrace, but memtrace does it's own
flushing of region via a custom routine.
These patches do a flush both on hotplug/unplug to clear any stale
data in the cache w.r.t mappings, there is a small race window where a
clean cache line may be created again just prior to tearing down the
mapping.
The patches were tested by disabling the flush routines in memtrace
and doing I/O on the trace file. The system immediately
checkstops (quite reliablly if prior to the hot-unplug of the memtrace
region, we memset the regions we are about to hot unplug). After these
patches no custom flushing is needed in the memtrace code.
Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To support accurate checking for different blocks of memory on PPC32,
use the same memblock-based approach that's already used on PPC64 also
on PPC32.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of open-coding the search in page_is_ram, call memblock_is_memory.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.
This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix the warning messages for stop_machine_change_mapping(), and a number
of other affected functions in its call chain.
All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
is okay (keeps them / does not discard them).
Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.
$ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
...
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit
annotation or the annotation of create_physical_mapping is wrong.
WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit
annotation or the annotation of create_physical_mapping is wrong.
...
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Require struct page by default for filesystem DAX to remove a number of
surprising failure cases. This includes failures with direct I/O, gdb and
fork(2).
* Add support for the new Platform Capabilities Structure added to the NFIT in
ACPI 6.2a. This new table tells us whether the platform supports flushing
of CPU and memory controller caches on unexpected power loss events.
* Revamp vmem_altmap and dev_pagemap handling to clean up code and better
support future future PCI P2P uses.
* Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become
out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and
instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL
families, NVDIMM_FAMILY_{HPE,MSFT}.
* Enhance nfit_test so we can test some of the new things added in version 1.6
of the DSM specification. This includes testing firmware download and
simulating the Last Shutdown State (LSS) status.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t
2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY
c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7
aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r
7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub
qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz
EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC
gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG
rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o
DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od
ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+
sie+u2rTod8/gQWSfHpJ
=MIMX
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ross Zwisler:
- Require struct page by default for filesystem DAX to remove a number
of surprising failure cases. This includes failures with direct I/O,
gdb and fork(2).
- Add support for the new Platform Capabilities Structure added to the
NFIT in ACPI 6.2a. This new table tells us whether the platform
supports flushing of CPU and memory controller caches on unexpected
power loss events.
- Revamp vmem_altmap and dev_pagemap handling to clean up code and
better support future future PCI P2P uses.
- Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
spec, and instead rely on the generic ND_CMD_CALL approach used by
the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
- Enhance nfit_test so we can test some of the new things added in
version 1.6 of the DSM specification. This includes testing firmware
download and simulating the Last Shutdown State (LSS) status.
* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
acpi, nfit: fix register dimm error handling
libnvdimm, namespace: make min namespace size 4K
tools/testing/nvdimm: force nfit_test to depend on instrumented modules
libnvdimm/nfit_test: adding support for unit testing enable LSS status
libnvdimm/nfit_test: add firmware download emulation
nfit-test: Add platform cap support from ACPI 6.2a to test
libnvdimm: expose platform persistence attribute for nd_region
acpi: nfit: add persistent memory control flag for nd_region
acpi: nfit: Add support for detect platform CPU cache flush on power loss
device-dax: Fix trailing semicolon
libnvdimm, btt: fix uninitialized err_lock
dax: require 'struct page' by default for filesystem dax
ext2: auto disable dax instead of failing mount
ext4: auto disable dax instead of failing mount
mm, dax: introduce pfn_t_special()
mm: Fix devm_memremap_pages() collision handling
mm: Fix memory size alignment in devm_memremap_pages_release()
memremap: merge find_dev_pagemap into get_dev_pagemap
memremap: change devm_memremap_pages interface to use struct dev_pagemap
...
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.
Update arch/powerpc/
Miscellanea:
o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx
arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"
arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently even with STRICT_KERNEL_RWX we leave the __init text marked
executable after init, which is bad.
Add a hook to mark it NX (no-execute) before we free it, and implement
it for radix and hash.
Note that we use __init_end as the end address, not _einittext,
because overlaps_kernel_text() uses __init_end, because there are
additional executable sections other than .init.text between
__init_begin and __init_end.
Tested on radix and hash with:
0:mon> p $__init_begin
*** 400 exception occurred
Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to:
Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
Thiago Jung Bauermann, Yang Li.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
bBDvrRnvupusz8qCWrxe
=w8rj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
Bauermann, Yang Li"
* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
powerpc/mm/hash: Implement mark_rodata_ro() for hash
powerpc/vmlinux.lds: Align __init_begin to 16M
powerpc/lib/code-patching: Use alternate map for patch_instruction()
powerpc/xmon: Add patch_instruction() support for xmon
powerpc/kprobes/optprobes: Use patch_instruction()
powerpc/kprobes: Move kprobes over to patch_instruction()
powerpc/mm/radix: Fix execute permissions for interrupt_vectors
powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
powerpc/64s: Blacklist rtas entry/exit from kprobes
powerpc/64s: Blacklist functions invoked on a trap
powerpc/64s: Un-blacklist system_call() from kprobes
powerpc/64s: Move system_call() symbol to just after setting MSR_EE
powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
powerpc/64s: Convert .L__replay_interrupt_return to a local label
powerpc64/elfv1: Only dereference function descriptor for non-text symbols
cxl: Export library to support IBM XSL
powerpc/dts: Use #include "..." to include local DT
powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
...
arch_add_memory gets for_device argument which then controls whether we
want to create memblocks for created memory sections. Simplify the
logic by telling whether we want memblocks directly rather than going
through pointless negation. This also makes the api easier to
understand because it is clear what we want rather than nothing telling
for_device which can mean anything.
This shouldn't introduce any functional change.
Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone). In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa31f ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.
Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba8f ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed. Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable. This essentially means that the online type changes as
the new memblocks are added.
Let's simulate memory hot online manually
$ echo 0x100000000 > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory32/valid_zones
Normal Movable
$ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
$ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal
/sys/devices/system/memory/memory34/valid_zones:Normal Movable
$ echo online_movable > /sys/devices/system/memory/memory34/state
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable Normal
This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g. association with a node) but it will inherently race
with new blocks showing up.
This patch changes the physical online phase to not associate pages with
any zone at all. All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request. There are only two requirements
- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
the latter one is not an inherent requirement and can be changed in the
future. It preserves the current behavior and made the code slightly
simpler. This is subject to change in future.
This means that the same physical online steps as above will lead to the
following state: Normal Movable
/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Normal Movable
/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable
Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node. __add_pages is updated to not require the
zone and only initializes sections in the range. This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).
devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way. It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs. This means that this
particular code path has to call move_pfn_range_to_zone explicitly.
The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.
Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs. Movable)
used to allow to change its movable type. This will be handled later.
[richard.weiyang@gmail.com: simplify zone_intersects()]
Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Device memory hotplug hooks into regular memory hotplug only half way.
It needs memory sections to track struct pages but there is no
need/desire to associate those sections with memory blocks and export
them to the userspace via sysfs because they cannot be onlined anyway.
This is currently expressed by for_device argument to arch_add_memory
which then makes sure to associate the given memory range with
ZONE_DEVICE. register_new_memory then relies on is_zone_device_section
to distinguish special memory hotplug from the regular one. While this
works now, later patches in this series want to move __add_zone outside
of arch_add_memory path so we have to come up with something else.
Add want_memblock down the __add_pages path and use it to control
whether the section->memblock association should be done.
arch_add_memory then just trivially want memblock for everything but
for_device hotplug.
remove_memory_section doesn't need is_zone_device_section either. We
can simply skip all the memblock specific cleanup if there is no
memblock for the given section.
This shouldn't introduce any functional change.
Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where
large amount of ZONE_DEVICE memory is being added to the system the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather in main memory.
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.
This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed. This tries to always keep the HPT at
a reasonable size for our current memory size.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For memory hotplug to work, the MMU code needs to provide the functions
create_section_mapping() and remove_section_mapping() to respectively
map and unmap portions of the linear mapping.
At the moment only hash64 provides these, so we provide weak stubs that
just error out. This fixes the build with configurations such as 64-bit
BookE with CONFIG_MEMORY_HOTPLUG enabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The mm zone mechanism was traditionally used by arch specific code to
partition memory into allocation zones. However there are several zones
that are managed by the mm subsystem rather than the architecture. Most
architectures set the max PFN of these special zones to zero, however on
powerpc we set them to ~0ul. This, in conjunction with a bug in
free_area_init_nodes() results in all of system memory being placed in
ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
memory allocations so this will cause a kernel panic at boot. Given the
planned addition of more mm managed zones (ZONE_CMA) we should aim to be
consistent with every other architecture and set the max PFN for these
zones to zero.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function is used to update the MMU with software PTE. It can
be called by data access exception handler (0x300) or instruction
access exception handler (0x400). If the function is called by
0x400 handler, the local variable @access is set to _PAGE_EXEC
to indicate the software PTE should have that flag set. When the
function is called by 0x300 handler, @access is set to zero.
This improves the readability of the function by replacing if
statements with switch. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The zone that contains the top of memory will be either ZONE_NORMAL
or ZONE_HIGHMEM depending on the kernel config. There are two functions
that require this information and both of them use an #ifdef to set
a local variable (top_zone). This is a little silly so lets just make it
a constant.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-mm@kvack.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With radix there is no MMU cache. Hence we don't need to do anything in
update_mmu_cache().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Highlights:
- Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras
- Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V
- Add POWER9 cputable entry from Michael Neuling
- FPU/Altivec/VSX save/restore optimisations from Cyril Bur
- Add support for new ftrace ABI on ppc64le from Torsten Duwe
Various cleanups & minor fixes from:
- Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril
Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey,
Sukadev Bhattiprolu, Suraj Jitindar Singh.
General:
- atomics: Allow architectures to define their own __atomic_op_* helpers from
Boqun Feng
- Implement atomic{, 64}_*_return_* variants and acquire/release/relaxed
variants for (cmp)xchg from Boqun Feng
- Add powernv_defconfig from Jeremy Kerr
- Fix BUG_ON() reporting in real mode from Balbir Singh
- Add xmon command to dump OPAL msglog from Andrew Donnellan
- Add xmon command to dump process/task similar to ps(1) from Douglas Miller
- Clean up memory hotplug failure paths from David Gibson
pci/eeh:
- Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei
Yang.
- EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
- PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
- PCI: Add pcibios_bus_add_device() weak function from Wei Yang
- MAINTAINERS: Update EEH details and maintainership from Russell Currey
cxl:
- Support added to the CXL driver for running on both bare-metal and
hypervisor systems, from Christophe Lombard and Frederic Barrat.
- Ignore probes for virtual afu pci devices from Vaibhav Jain
perf:
- Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu
- hv-24x7: Fix usage with chip events, display change in counter values,
display domain indices in sysfs, eliminate domain suffix in event names,
from Sukadev Bhattiprolu
Freescale:
- Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum
optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and
other dt bits, and minor fixes/cleanup."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW69OrAAoJEFHr6jzI4aWAe5EQAJw/hE6WBQc6a7Tj70AnXOqR
qk/m5pZjuTwQxfBteIvHR1pE5eXdlvtAjcD254LVkFkAbIn19W/h2k0VX/nlee7P
n/VRHRifjtGmukqHrPYJJ7ua9mNlY7pxh3leGSixBFASnSWqMxNNNziNQtSTcuCs
TjHiw6NkZ/kzeunA4bAfE4yHVUZjmL74oiS9JbLyaVHqoW4fqWLlh26AKo2yYMZI
qPicBBG4HBi3FGvoexnKxlJNdcV4HO7LzDjJmCSfUKYCJi+Pw19T5qmhso0q0qVz
vHg/A8HNeG4Hn83pNVmLeQSAIQRZ3DvTtcLgbjPo+TVwm/hzrRRBWipTeOVbkLW8
2bcOXT4t7LWUq15EAJ1LYgYZGzcLrfRfUeOcuQ1TWd3+PcfY9pE7FmizsxAAfaVe
E9j9mpz4XnIqBtWkFHneTIHkQ5OWptyKuZJEaYH0nut4VsP0k8NarkseafGqBPu7
5eG83gbiQbCVixfOgblV9eocJ29JcwpjPAY4CZSGJimShg909FV7WRgZgJkKWrbK
dBRco8Jcp4VglGfo2qymv7Uj4KwQoypBREOhiKUvrAsVlDxPfx+bcskhjGu9xGDC
xs/+nme0/lKa/wg5K4C3mQ1GAlkMWHI0ojhJjsyODbetup5UbkEu03wjAaTdO9dT
Y6ptGm0rYAJluPNlziFj
=qkAt
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"This was delayed a day or two by some build-breakage on old toolchains
which we've now fixed.
There's two PCI commits both acked by Bjorn.
There's one commit to mm/hugepage.c which is (co)authored by Kirill.
Highlights:
- Restructure Linux PTE on Book3S/64 to Radix format from Paul
Mackerras
- Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
Kumar K.V
- Add POWER9 cputable entry from Michael Neuling
- FPU/Altivec/VSX save/restore optimisations from Cyril Bur
- Add support for new ftrace ABI on ppc64le from Torsten Duwe
Various cleanups & minor fixes from:
- Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.
General:
- atomics: Allow architectures to define their own __atomic_op_*
helpers from Boqun Feng
- Implement atomic{, 64}_*_return_* variants and acquire/release/
relaxed variants for (cmp)xchg from Boqun Feng
- Add powernv_defconfig from Jeremy Kerr
- Fix BUG_ON() reporting in real mode from Balbir Singh
- Add xmon command to dump OPAL msglog from Andrew Donnellan
- Add xmon command to dump process/task similar to ps(1) from Douglas
Miller
- Clean up memory hotplug failure paths from David Gibson
pci/eeh:
- Redesign SR-IOV on PowerNV to give absolute isolation between VFs
from Wei Yang.
- EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
- PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
- PCI: Add pcibios_bus_add_device() weak function from Wei Yang
- MAINTAINERS: Update EEH details and maintainership from Russell
Currey
cxl:
- Support added to the CXL driver for running on both bare-metal and
hypervisor systems, from Christophe Lombard and Frederic Barrat.
- Ignore probes for virtual afu pci devices from Vaibhav Jain
perf:
- Export Power8 generic and cache events to sysfs from Sukadev
Bhattiprolu
- hv-24x7: Fix usage with chip events, display change in counter
values, display domain indices in sysfs, eliminate domain suffix in
event names, from Sukadev Bhattiprolu
Freescale:
- Updates from Scott: "Highlights include 8xx optimizations, 32-bit
checksum optimizations, 86xx consolidation, e5500/e6500 cpu
hotplug, more fman and other dt bits, and minor fixes/cleanup"
* tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
powerpc: Fix unrecoverable SLB miss during restore_math()
powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
powerpc/rcpm: Fix build break when SMP=n
powerpc/book3e-64: Use hardcoded mttmr opcode
powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
powerpc/T104xRDB: add tdm riser card node to device tree
powerpc32: PAGE_EXEC required for inittext
powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
powerpc/86xx: Introduce and use common dtsi
powerpc/86xx: Update device tree
powerpc/86xx: Move dts files to fsl directory
powerpc/86xx: Switch to kconfig fragments approach
powerpc/86xx: Update defconfigs
powerpc/86xx: Consolidate common platform code
powerpc32: Remove one insn in mulhdu
powerpc32: small optimisation in flush_icache_range()
powerpc: Simplify test in __dma_sync()
powerpc32: move xxxxx_dcache_range() functions inline
powerpc32: Remove clear_pages() and define clear_page() inline
...
This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:
For errors creating the linear mapping for the hot-added region:
* This is now reported with EFAULT which is more appropriate than the
previous EINVAL (the failure is unlikely to be related to the
function's parameters)
* An error in this path now prints a warning message, rather than just
silently failing to add the extra memory.
* Previously a failure here could result in the region being partially
mapped. We now clean up any partial mapping before failing.
For errors creating the vmemmap for the hot-added region:
* This is now reported with EFAULT instead of causing a BUG() - this
could happen for external reason (e.g. full hash table) so it's better
to handle this non-fatally
* An error message is also printed, so the failure won't be silent
* As above a failure could cause a partially mapped region, we now
clean this up. [mpe: move htab_remove_mapping() out of #ifdef
CONFIG_MEMORY_HOTPLUG to enable this]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
"System RAM", "Kernel code", "Kernel data", and "Kernel bss".
Note that:
- IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
as (IORESOURCE_MEM|IORESOURCE_SYSRAM).
- Some archs do not set 'flags' for children nodes, such as
"Kernel code". This patch does not change 'flags' in this
case.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping
rtas_rmo_buf from user space is failing. Hence we are not able to make
RTAS syscall.
This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive
in devmem_is_allowed(). This will allow user space to map rtas_rmo_buf
and we are able to make RTAS syscall.
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.
Having a separate zone prevents allocation and otherwise marks these
pages that are distinct from typical uniform memory. Device memory has
different lifetime and performance characteristics than RAM. However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jerome Glisse <j.glisse@gmail.com>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
__flush_dcache_icache_phys() requires the ability to access the
memory with the MMU disabled, which means that on a 32-bit system
any memory above 4 GiB is inaccessible. In particular, mpc86xx is
32-bit and can have more than 4 GiB of RAM.
Signed-off-by: Scott Wood <scottwood@freescale.com>
This function can run on systems where physical addresses don't
fit in unsigned long, so make sure to use the macro that contains the
proper cast.
Signed-off-by: Scott Wood <scottwood@freescale.com>
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".
But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.
So replace it with the generic and 100% correct slab_is_available().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We did part of sparse initialisation in setup_arch and part in
initmem_init. Put them together.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At the moment we transition from the memblock alloctor to the bootmem
allocator. Gitting rid of the bootmem allocator removes a bunch of
complicated code (most of which I owe the dubious honour of being
responsible for writing).
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have an extra level of indirection on memory hot remove which is not
matched on memory hot add. Memory hotplug is book3s only, so there is
no need for it.
This also enables means remove_memory() (ie memory hot unplug) works
on powernv.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Freescale updates from Scott (27 commits):
"Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
support, and PrPMC PCI enumeration."
Commit 1c98025c6c "powerpc: Dynamic DMA
zone limits" updated how zones are created in paging_init(), but missed
the NUMA version of paging_init(). This was noticed via a linker
error, since dma_pfn_limit_to_zone() was, like the non-NUMA
paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.
It turns out that the NUMA paging_init() was not actually doing
anything different from the standard paging_init(), other than a couple
debug prints, a couple 32-bit-only ifdef sections, and a call to
mark_nonram_nosave(). It's not clear whether mark_nonram_nosave() is
inherently wrong to do for NUMA, or just not useful on targets that
have NUMA, but for now I'm preserving the existing behavior.
Fixes: 1c98025c6c "powerpc: Dynamic DMA zone limits"
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Platform code can call limit_zone_pfn() to set appropriate limits
for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will
select a suitable zone based on a device's mask and the pfn limits that
platform code has configured.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
This patch introduces zone_for_memory() to arch_add_memory() on powerpc
to ensure new, higher memory added into ZONE_MOVABLE if movable zone has
already setup.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Mel Gorman" <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memory remove code for powerpc/pseries should call remove_memory()
so that we are holding the hotplug_memory lock during memory remove
operations.
This patch updates the memory node remove handler to call remove_memory()
and adds a ppc_md.remove_memory() entry to handle pseries specific work
that is called from arch_remove_memory().
During memory remove in pseries_remove_memblock() we have to stay with
removing memory one section at a time. This is needed because of how memory
resources are handled. During memory add for pseries (via the probe file in
sysfs) we add memory one section at a time which gives us a memory resource
for each section. Future patches will aim to address this so will not have
to remove memory one section at a time.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc updates from Ben Herrenschmidt:
"So here's my next branch for powerpc. A bit late as I was on vacation
last week. It's mostly the same stuff that was in next already, I
just added two patches today which are the wiring up of lockref for
powerpc, which for some reason fell through the cracks last time and
is trivial.
The highlights are, in addition to a bunch of bug fixes:
- Reworked Machine Check handling on kernels running without a
hypervisor (or acting as a hypervisor). Provides hooks to handle
some errors in real mode such as TLB errors, handle SLB errors,
etc...
- Support for retrieving memory error information from the service
processor on IBM servers running without a hypervisor and routing
them to the memory poison infrastructure.
- _PAGE_NUMA support on server processors
- 32-bit BookE relocatable kernel support
- FSL e6500 hardware tablewalk support
- A bunch of new/revived board support
- FSL e6500 deeper idle states and altivec powerdown support
You'll notice a generic mm change here, it has been acked by the
relevant authorities and is a pre-req for our _PAGE_NUMA support"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits)
powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
powerpc: Add support for the optimised lockref implementation
powerpc/powernv: Call OPAL sync before kexec'ing
powerpc/eeh: Escalate error on non-existing PE
powerpc/eeh: Handle multiple EEH errors
powerpc: Fix transactional FP/VMX/VSX unavailable handlers
powerpc: Don't corrupt transactional state when using FP/VMX in kernel
powerpc: Reclaim two unused thread_info flag bits
powerpc: Fix races with irq_work
Move precessing of MCE queued event out from syscall exit path.
pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
powerpc: Make add_system_ram_resources() __init
powerpc: add SATA_MV to ppc64_defconfig
powerpc/powernv: Increase candidate fw image size
powerpc: Add debug checks to catch invalid cpu-to-node mappings
powerpc: Fix the setup of CPU-to-Node mappings during CPU online
powerpc/iommu: Don't detach device without IOMMU group
powerpc/eeh: Hotplug improvement
powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
powerpc/eeh: Add restore_config operation
...