-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rswAKCRCRxhvAZXjc
ogkpAQD7/kR7QiOQTdDztDtAavZELOE3p4pUZfAcg8XFfH8QKQD/cgDpxgBbyvSQ
VBInBrv1vPeSvlPWvFkyqT/n8eiR9AQ=
=KQVR
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs sysv removal from Christian Brauner:
"This removes the sysv filesystem. We've discussed this various times.
It's time to try"
* tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
sysv: Remove the filesystem
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc
on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr
qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8=
=GxeN
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Mount notifications
The day has come where we finally provide a new api to listen for
mount topology changes outside of /proc/<pid>/mountinfo. A mount
namespace file descriptor can be supplied and registered with
fanotify to listen for mount topology changes.
Currently notifications for mount, umount and moving mounts are
generated. The generated notification record contains the unique
mount id of the mount.
The listmount() and statmount() api can be used to query detailed
information about the mount using the received unique mount id.
This allows userspace to figure out exactly how the mount topology
changed without having to generating diffs of /proc/<pid>/mountinfo
in userspace.
- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
api
- Support detached mounts in overlayfs
Since last cycle we support specifying overlayfs layers via file
descriptors. However, we don't allow detached mounts which means
userspace cannot user file descriptors received via
open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
attach them to a mount namespace via move_mount() first.
This is cumbersome and means they have to undo mounts via umount().
Allow them to directly use detached mounts.
- Allow to retrieve idmappings with statmount
Currently it isn't possible to figure out what idmapping has been
attached to an idmapped mount. Add an extension to statmount() which
allows to read the idmapping from the mount.
- Allow creating idmapped mounts from mounts that are already idmapped
So far it isn't possible to allow the creation of idmapped mounts
from already idmapped mounts as this has significant lifetime
implications. Make the creation of idmapped mounts atomic by allow to
pass struct mount_attr together with the open_tree_attr() system call
allowing to solve these issues without complicating VFS lookup in any
way.
The system call has in general the benefit that creating a detached
mount and applying mount attributes to it becomes an atomic operation
for userspace.
- Add a way to query statmount() for supported options
Allow userspace to query which mount information can be retrieved
through statmount().
- Allow superblock owners to force unmount
* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
umount: Allow superblock owners to force umount
selftests: add tests for mount notification
selinux: add FILE__WATCH_MOUNTNS
samples/vfs: fix printf format string for size_t
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
statmount: add a new supported_mask field
samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
selftests: add tests for using detached mount with overlayfs
samples/vfs: check whether flag was raised
statmount: allow to retrieve idmappings
uidgid: add map_id_range_up()
fs: allow detached mounts in clone_private_mount()
selftests/overlayfs: test specifying layers as O_PATH file descriptors
fs: support O_PATH fds with FSCONFIG_SET_FD
vfs: add notifications for mount attach and detach
fanotify: notify on mount attach and detach
...
All implementations of chacha_init_arch() just call
chacha_init_generic(), so it is pointless. Just delete it, and replace
chacha_init() with what was previously chacha_init_generic().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The immediate issue being fixed here is a nVMX bug where KVM fails to
detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI).
However, checking for a pending interrupt accesses the legacy PIC, and
x86's kvm_arch_destroy_vm() currently frees the PIC before destroying
vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results
in a NULL pointer deref; that's a prerequisite for the nVMX fix.
The remaining patches attempt to bring a bit of sanity to x86's VM
teardown code, which has accumulated a lot of cruft over the years. E.g.
KVM currently unloads each vCPU's MMUs in a separate operation from
destroying vCPUs, all because when guest SMP support was added, KVM had a
kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU.
And that oddity lived on, for 18 years...
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I added this function in an earlier patch, but the missing export caused
a build failure
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit e27fbe16af ("MIPS: cm: Detect CM quirks from device tree")
introduced
arch/mips/include/asm/mips-cm.h:119:13: error: ‘mips_cm_update_property’
defined but not used [-Werror=unused-function]
Fix this by making empty function implementation inline
Fixes: e27fbe16af ("MIPS: cm: Detect CM quirks from device tree")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
After my previous patch, the ioport_map() function changed from
the lib/iomap.c version to the asm-generic/io.h version, which
requires a correct PCI_IOBASE definition.
Unfortunately the types are also different, so add the correct
definition for ioport_map() in asm/io.h and change the machine
specific ones to have the correct type.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The point where the memory is released from memblock to the buddy
allocator is hidden inside arch-specific mem_init()s and the call to
memblock_free_all() is needlessly duplicated in every artiste cure and
after introduction of arch_mm_preinit() hook, mem_init() implementation on
many architecture only contains the call to memblock_free_all().
Pull memblock_free_all() call into mm_core_init() and drop mem_init() on
relevant architectures to make it more explicit where the free memory is
released from memblock to the buddy allocator and to reduce code
duplication in architecture specific code.
Link: https://lkml.kernel.org/r/20250313135003.836600-14-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, implementation of mem_init() in every architecture consists of
one or more of the following:
* initializations that must run before page allocator is active, for
instance swiotlb_init()
* a call to memblock_free_all() to release all the memory to the buddy
allocator
* initializations that must run after page allocator is ready and there is
no arch-specific hook other than mem_init() for that, like for example
register_page_bootmem_info() in x86 and sparc64 or simple setting of
mem_init_done = 1 in several architectures
* a bunch of semi-related stuff that apparently had no better place to
live, for example a ton of BUILD_BUG_ON()s in parisc.
Introduce arch_mm_preinit() that will be the first thing called from
mm_core_init(). On architectures that have initializations that must happen
before the page allocator is ready, move those into arch_mm_preinit() along
with the code that does not depend on ordering with page allocator setup.
On several architectures this results in reduction of mem_init() to a
single call to memblock_free_all() that allows its consolidation next.
Link: https://lkml.kernel.org/r/20250313135003.836600-13-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All architectures that support HIGHMEM have their code that frees high
memory pages to the buddy allocator while __free_memory_core() is limited
to freeing only low memory.
There is no actual reason for that. The memory map is completely ready by
the time memblock_free_all() is called and high pages can be released to
the buddy allocator along with low memory.
Remove low memory limit from __free_memory_core() and drop per-architecture
code that frees high memory pages.
Link: https://lkml.kernel.org/r/20250313135003.836600-12-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
high_memory defines upper bound on the directly mapped memory. This bound
is defined by the beginning of ZONE_HIGHMEM when a system has high memory
and by the end of memory otherwise.
All this is known to generic memory management initialization code that
can set high_memory while initializing core mm structures.
Add a generic calculation of high_memory to free_area_init() and remove
per-architecture calculation except for the architectures that set and use
high_memory earlier than that.
Link: https://lkml.kernel.org/r/20250313135003.836600-11-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
max_mapnr is essentially the size of the memory map for systems that use
FLATMEM. There is no reason to calculate it in each and every architecture
when it's anyway calculated in alloc_node_mem_map().
Drop setting of max_mapnr from architecture code and set it once in
alloc_node_mem_map().
While on it, move definition of mem_map and max_mapnr to mm/mm_init.c so
there won't be two copies for MMU and !MMU variants.
Link: https://lkml.kernel.org/r/20250313135003.836600-10-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allocating the zero pages from memblock is simpler because the memory is
already reserved.
This will also help with pulling out memblock_free_all() to the generic
code and reducing code duplication in arch::mem_init().
Link: https://lkml.kernel.org/r/20250313135003.836600-6-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Both MIPS systems that support numa (loongsoon3 and sgi-ip27) have
identical mem_init() for NUMA case.
Move that into arch/mips/mm/init.c and drop duplicate per-machine
definitions.
Link: https://lkml.kernel.org/r/20250313135003.836600-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren (csky) <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioremap_prot() currently accepts pgprot_val parameter as an unsigned long,
thus implicitly assuming that pgprot_val and pgprot_t could never be
bigger than unsigned long. But this assumption soon will not be true on
arm64 when using D128 pgtables. In 128 bit page table configuration,
unsigned long is 64 bit, but pgprot_t is 128 bit.
Passing platform abstracted pgprot_t argument is better as compared to
size based data types. Let's change the parameter to directly pass
pgprot_t like another similar helper generic_ioremap_prot().
Without this change in place, D128 configuration does not work on arm64 as
the top 64 bits gets silently stripped when passing the protection value
to this function.
Link: https://lkml.kernel.org/r/20250218101954.415331-1-anshuman.khandual@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The imperative paradigm used to build vmlinux, extract some info from it
or perform some checks on it, and subsequently modify it again goes
against the declarative paradigm that is usually employed for defining
make rules.
In particular, the Makefile.postlink files that consume their input via
an output rule result in some dodgy logic in the decompressor makefiles
for RISC-V and x86, given that the vmlinux.relocs input file needed to
generate the arch-specific relocation tables may not exist or be out of
date, but cannot be constructed using the ordinary Make dependency based
rules, because the info needs to be extracted while vmlinux is in its
ephemeral, non-stripped form.
So instead, for architectures that require the static relocations that
are emitted into vmlinux when passing --emit-relocs to the linker, and
are subsequently stripped out again, introduce an intermediate vmlinux
target called vmlinux.unstripped, and organize the reset of the build
logic accordingly:
- vmlinux.unstripped is created only once, and not updated again
- build rules under arch/*/boot can depend on vmlinux.unstripped without
running the risk of the data disappearing or being out of date
- the final vmlinux generated by the build is not bloated with static
relocations that are never needed again after the build completes.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Some architectures build vmlinux with static relocations preserved, but
strip them again from the final vmlinux image. Arch specific tools
consume these static relocations in order to construct relocation tables
for KASLR.
The fact that vmlinux is created, consumed and subsequently updated goes
against the typical, declarative paradigm used by Make, which is based
on rules and dependencies. So as a first step towards cleaning this up,
introduce a Kconfig symbol to declare that the arch wants to consume the
static relocations emitted into vmlinux. This will be wired up further
in subsequent patches.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
All PIO on MIPS platforms is memory mapped, so there is no benefit in
the lib/iomap.c wrappers that switch between inb/outb and readb/writeb
style accessses.
In fact, the '#define PIO_RESERVED 0' setting completely disables
the GENERIC_IOMAP functionality, and the '#define PIO_OFFSET
mips_io_port_base' setting is based on a misunderstanding of what the
offset is meant to do.
MIPS started using GENERIC_IOMAP in 2018 with commit b962aeb022 ("MIPS:
Use GENERIC_IOMAP") replacing a simple custom implementation of the same
interfaces, but at the time the asm-generic/io.h version was not usable
yet. Since the header is now always included, it's now possible to go
back to the even simpler version.
Use the normal GENERIC_PCI_IOMAP functionality for all mips platforms
without the hacky GENERIC_IOMAP, and provide a custom pci_iounmap()
for the CONFIG_PCI_DRIVERS_LEGACY case to ensure the I/O port base never
gets unmapped.
The readsl() prototype needs an extra 'const' keyword to make it
compatible with the generic ioread32_rep() alias.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
All modules that need CONFIG_LIBCRC32C already select it, so there is no
need to bother users about the option.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250304230712.167600-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
The ARCH_MAY_HAVE patch missed arm64, mips and s390. But it may
also lead to arch options being enabled but ineffective because
of modular/built-in conflicts.
As the primary user of all these options wireguard is selecting
the arch options anyway, make the same selections at the lib/crypto
option level and hide the arch options from the user.
Instead of selecting them centrally from lib/crypto, simply set
the default of each arch option as suggested by Eric Biggers.
Change the Crypto API generic algorithms to select the top-level
lib/crypto options instead of the generic one as otherwise there
is no way to enable the arch options (Eric Biggers). Introduce a
set of INTERNAL options to work around dependency cycles on the
CONFIG_CRYPTO symbol.
Fixes: 1047e21aec ("crypto: lib/Kconfig - Fix lib built-in failure when arch is modular")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/oe-kbuild-all/202502232152.JC84YDLp-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the huge_pte is being cleared in huge_ptep_get_and_clear().
Provide for this by adding an `unsigned long sz` parameter to the
function. This follows the same pattern as huge_pte_clear() and
set_huge_pte_at().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, loongarch, mips,
parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
in a separate commit.
Cc: stable@vger.kernel.org
Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Since commit 6f2c2f93a1 ("scripts/sorttable: Remove unneeded
Elf_Rel"), sorttable no longer clears relocs against __ex_table,
claiming "it was never used." But in fact MIPS relocatable kernel had
been implicitly depending on this behavior, so after this commit the
MIPS relocatable kernel has started to spit oops like:
CPU 1 Unable to handle kernel paging request at virtual address 000000fffbbdbff8, epc == ffffffff818f9a6c, ra == ffffffff813ad7d0
... ...
Call Trace:
[<ffffffff818f9a6c>] __raw_copy_from_user+0x48/0x2fc
[<ffffffff813ad7d0>] cp_statx+0x1a0/0x1e0
[<ffffffff813ae528>] do_statx_fd+0xa8/0x118
[<ffffffff813ae670>] sys_statx+0xd8/0xf8
[<ffffffff81156cc8>] syscall_common+0x34/0x58
So ignore those relocs on our own to fix the issue.
Fixes: 6f2c2f93a1 ("scripts/sorttable: Remove unneeded Elf_Rel")
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Synchronize the declaration of ds1287_set_base_clock() between
cevt-ds1287.c and ds1287.h.
Fix follow error with gcc-14 when -Werror:
arch/mips/kernel/cevt-ds1287.c:21:5: error: conflicting types for ‘ds1287_set_base_clock’; have ‘int(unsigned int)’
21 | int ds1287_set_base_clock(unsigned int hz)
| ^~~~~~~~~~~~~~~~~~~~~
In file included from arch/mips/kernel/cevt-ds1287.c:13:
./arch/mips/include/asm/ds1287.h:11:13: note: previous declaration of ‘ds1287_set_base_clock’ with type ‘void(unsigned int)’
11 | extern void ds1287_set_base_clock(unsigned int clock);
| ^~~~~~~~~~~~~~~~~~~~~
make[7]: *** [scripts/Makefile.build:207: arch/mips/kernel/cevt-ds1287.o] Error 1
make[6]: *** [scripts/Makefile.build:465: arch/mips/kernel] Error 2
make[6]: *** Waiting for unfinished jobs....
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Address the issue of cevt-ds1287.c not including the ds1287.h header
file.
Fix follow errors with gcc-14 when -Werror:
arch/mips/kernel/cevt-ds1287.c:15:5: error: no previous prototype for ‘ds1287_timer_state’ [-Werror=missing-prototypes]
15 | int ds1287_timer_state(void)
| ^~~~~~~~~~~~~~~~~~
arch/mips/kernel/cevt-ds1287.c:20:5: error: no previous prototype for ‘ds1287_set_base_clock’ [-Werror=missing-prototypes]
20 | int ds1287_set_base_clock(unsigned int hz)
| ^~~~~~~~~~~~~~~~~~~~~
arch/mips/kernel/cevt-ds1287.c:103:12: error: no previous prototype for ‘ds1287_clockevent_init’ [-Werror=missing-prototypes]
103 | int __init ds1287_clockevent_init(int irq)
| ^~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[7]: *** [scripts/Makefile.build:207: arch/mips/kernel/cevt-ds1287.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [scripts/Makefile.build:465: arch/mips/kernel] Error 2
make[6]: *** Waiting for unfinished jobs....
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Declare which_prom() as static to suppress gcc compiler warning that
'missing-prototypes'. This function is not intended to be called
from other parts.
Fix follow error with gcc-14 when -Werror:
arch/mips/dec/prom/init.c:45:13: error: no previous prototype for ‘which_prom’ [-Werror=missing-prototypes]
45 | void __init which_prom(s32 magic, s32 *prom_vec)
| ^~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:207: arch/mips/dec/prom/init.o] Error 1
make[5]: *** [scripts/Makefile.build:465: arch/mips/dec/prom] Error 2
make[5]: *** Waiting for unfinished jobs....
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
strncpy() is deprecated for NUL-terminated destination buffers. Use
strscpy() instead and remove the manual NUL-termination.
Compile-tested only.
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Current MT7628A device tree file system controller node is wrong since it is
not matching bindings. Hence, update it to match current bindings updating
it also to use new introduced clock constants.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other
arch has ever used it).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20250224235542.2562848-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current MT7620A device tree file system controller node is wrong since it is
not matching bindings. Hence, update it to match current bindings updating
it also to use new introduced clock constants.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Current RT3883 device tree file system controller node is wrong since it is
not matching bindings. Hence, update it to match current bindings updating
it also to use new introduced clock constants.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Current RT3050 device tree file system controller node is wrong since it is
not matching bindings. Hence, update it to match current bindings updating
it also to use new introduced clock constants.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Current RT2880 device tree file system controller node is wrong since it is
not matching bindings. Hence, update it to match current bindings updating
it also to use new introduced clock constants.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
On MIPS system, most of the syscall function name begin with prefix
sys_. Some syscalls are special such as clone/fork, function name of
these begin with __sys_. Since scratch registers need be saved in
stack when these system calls happens.
With ftrace system call method, system call functions are declared with
SYSCALL_DEFINEx, metadata of the system call symbol name begins with
sys_. Here mips specific function arch_syscall_match_sym_name is used to
compare function name between sys_call_table[] and metadata of syscall
symbol.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Define a gpio-restart node to the Cisco SG220-26P so the device can be
rebooted using the SoC's hard reset pin. Set the priority to 192 so the
gpio-restart method takes priority over the watchdog restart.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add some of the SoC's CPU peripherals currently supported:
- GPIO controller with support for 24 GPIO lines, although not all
lines are brought out to pads on the SoC package. These lines can
generate interrupts from external sources.
- Watchdog which can be used to restart the SoC if no external restart
logic is present.
- SPI controller, primarily used to access NOR flash
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add a fixed clock to define the clock frequency of the Lexra bus and use
this for the two uart nodes instead of a separate clock-frequency
property.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The uart interrupts on RTL838x chips do not lead to the CPU's interrupt
controller directly, but passes via the SoC interrupt controller. Update
the interrupt-parent property to fix this.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add the SoC interrupt controller so other components can link to it.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
rtl83xx.dtsi was once (presumably) created as a base for both RTL838x
and RTL839x SoCs. Both SoCs have a different CPU and the peripherals
require different compatibles. Fold rtl83xx.dtsi into rtl838x.dtsi,
currently only supporting RTL838x SoCs, and create the RTL839x base
include later when required.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Although not strictly required by the simple-bus binding, add the bus
offset to the node name to be consistent with other nodes. Also drop the
node label as it is not referenced anywhere.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The referenced CPU clock does not require any additional #clock-cells,
so drop the extraneous '0' in the referenced CPU clock.
The binding for MIPS cpus also does not allow for the clock-names
property, so just drop it.
This resolves some error message from 'dtbs_check':
cpu@0: clocks: [[4], [0]] is too long
'clock-names' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> # For RTL9302C
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The RTL930x SoC series is sufficiently different to warrant its own base
dtsi. This ensures no properties need to be deleted or overwritten, and
prevents accidental inclusions of updates from rtl83xx.dtsi.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> # For RTL9302C
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The CM3.5 device used in EyeQ6H SoCs incorrectly reports the status
for Hardware Cache Initialization (HCI). This commit adds the
compatible string for the CM to acknowledge this issue, which enables
the use of the second CPU cluster.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Some CM3.5 devices incorrectly report that hardware cache
initialization has completed, and also claim to support hardware cache
initialization when they don't actually do so. This commit fixes this
issue by retrieving the correct information from the device tree and
allowing the system to bypass the hardware cache initialization
step. Instead, it relies on manual operation. As a result, multi-user
support is now possible for these CPUs.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Some information that should be retrieved at runtime for the Coherence
Manager can be either absent or wrong. This patch allows checking if
some of this information is available from the device tree and updates
the internal variable accordingly.
For now, only the compatible string associated with the broken HCI is
being retrieved.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Since 2002 (change "Replace BKL for chain locking with sysvfs-private
rwlock") the sysv filesystem was doing IO under a rwlock in its
get_block() function (yes, a non-sleepable lock hold over a function
used to read inode metadata for all reads and writes). Nobody noticed
until syzbot in 2023 [1]. This shows nobody is using the filesystem.
Just drop it.
[1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Replace the deprecated "simple-audio-card,hp-det-gpio" property by
"simple-audio-card,hp-det-gpios" in Simple Audio Card device nodes.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The MIPS architecture's source files do not require <linux/fb.h>.
Remove the include statement.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Probe for & boot CPUs (cores & VPs) in secondary clusters (ie. not the
cluster that began booting Linux) when they are present in systems with
CM 3.5 or higher.
Signed-off-by: Paul Burton <paulburton@kernel.org>
Signed-off-by: Chao-ying Fu <cfu@wavecomp.com>
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In preparation for supporting multi-cluster systems, introduce a struct
cluster_boot_config as an extra layer in the boot configuration
maintained by the MIPS Coherent Processing System (CPS) SMP
implementation. For now only one struct cluster_boot_config will be
allocated & we'll simply defererence its core_config field to find the
struct core_boot_config array which can be used to boot as usual.
Signed-off-by: Paul Burton <paulburton@kernel.org>
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The pm-cps code has up until now used per-CPU variables indexed by core,
rather than CPU number, in order to share data amongst sibling CPUs (ie.
VPs/threads in a core). This works fine for single cluster systems, but
with multi-cluster systems a core number is no longer unique in the
system, leading to sharing between CPUs that are not actually siblings.
Avoid this issue by using per-CPU variables as they are more generally
used - ie. access them using CPU numbers rather than core numbers.
Sharing between siblings is then accomplished by:
- Assigning the same pointer to entries for each sibling CPU for the
nc_asm_enter & ready_count variables, which allow this by virtue of
being per-CPU pointers.
- Indexing by the first CPU set in a CPUs cpu_sibling_map in the case
of pm_barrier, for which we can't use the previous approach because
the per-CPU variable is not a pointer.
Signed-off-by: Paul Burton <paulburton@kernel.org>
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
Co-developed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-13-13a4669dfc8c@linutronix.de
As the Makefile is included into other Makefiles it can not be used to
define objects to be built from the current source directory.
However the generic datastore will introduce such a local source file.
Rename the included Makefile so it is clear how it is to be used and to
make room for a regular Makefile in lib/vdso/.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-4-13a4669dfc8c@linutronix.de
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
Patch was created by using Coccinelle.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/34004267eba24388fb50ef6de1b8dd7addba6475.1738746821.git.namcao@linutronix.de
This makes ptrace/get_syscall_info selftest pass on mips o32 and
mips64 o32 by fixing the following two test assertions:
1. get_syscall_info test assertion on mips o32:
# get_syscall_info.c:218:get_syscall_info:Expected exp_args[5] (3134521044) == info.entry.args[4] (4911432)
# get_syscall_info.c:219:get_syscall_info:wait #1: entry stop mismatch
2. get_syscall_info test assertion on mips64 o32:
# get_syscall_info.c:209:get_syscall_info:Expected exp_args[2] (3134324433) == info.entry.args[1] (18446744072548908753)
# get_syscall_info.c:210:get_syscall_info:wait #1: entry stop mismatch
The first assertion happens due to mips_get_syscall_arg() trying to access
another task's context but failing to do it properly because get_user() it
calls just peeks at the current task's context. It usually does not crash
because the default user stack always gets assigned the same VMA, but it
is pure luck which mips_get_syscall_arg() wouldn't have if e.g. the stack
was switched (via setcontext(3) or however) or a non-default process's
thread peeked at, and in any case irrelevant data is obtained just as
observed with the test case.
mips_get_syscall_arg() ought to be using access_remote_vm() instead to
retrieve the other task's stack contents, but given that the data has been
already obtained and saved in `struct pt_regs' it would be an overkill.
The first assertion is fixed for mips o32 by using struct pt_regs.args
instead of get_user() to obtain syscall arguments. This approach works
due to this piece in arch/mips/kernel/scall32-o32.S:
/*
* Ok, copy the args from the luser stack to the kernel stack.
*/
.set push
.set noreorder
.set nomacro
load_a4: user_lw(t5, 16(t0)) # argument #5 from usp
load_a5: user_lw(t6, 20(t0)) # argument #6 from usp
load_a6: user_lw(t7, 24(t0)) # argument #7 from usp
load_a7: user_lw(t8, 28(t0)) # argument #8 from usp
loads_done:
sw t5, PT_ARG4(sp) # argument #5 to ksp
sw t6, PT_ARG5(sp) # argument #6 to ksp
sw t7, PT_ARG6(sp) # argument #7 to ksp
sw t8, PT_ARG7(sp) # argument #8 to ksp
.set pop
.section __ex_table,"a"
PTR_WD load_a4, bad_stack_a4
PTR_WD load_a5, bad_stack_a5
PTR_WD load_a6, bad_stack_a6
PTR_WD load_a7, bad_stack_a7
.previous
arch/mips/kernel/scall64-o32.S has analogous code for mips64 o32 that
allows fixing the issue by obtaining syscall arguments from struct
pt_regs.regs[4..11] instead of the erroneous use of get_user().
The second assertion is fixed by truncating 64-bit values to 32-bit
syscall arguments.
Fixes: c0ff3c53d4 ("MIPS: Enable HAVE_ARCH_TRACEHOOK.")
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
We have several places across the kernel where we want to access another
task's syscall arguments, such as ptrace(2), seccomp(2), etc., by making
a call to syscall_get_arguments().
This works for register arguments right away by accessing the task's
`regs' member of `struct pt_regs', however for stack arguments seen with
32-bit/o32 kernels things are more complicated. Technically they ought
to be obtained from the user stack with calls to an access_remote_vm(),
but we have an easier way available already.
So as to be able to access syscall stack arguments as regular function
arguments following the MIPS calling convention we copy them over from
the user stack to the kernel stack in arch/mips/kernel/scall32-o32.S, in
handle_sys(), to the current stack frame's outgoing argument space at
the top of the stack, which is where the handler called expects to see
its incoming arguments. This area is also pointed at by the `pt_regs'
pointer obtained by task_pt_regs().
Make the o32 stack argument space a proper member of `struct pt_regs'
then, by renaming the existing member from `pad0' to `args' and using
generated offsets to access the space. No functional change though.
With the change in place the o32 kernel stack frame layout at the entry
to a syscall handler invoked by handle_sys() is therefore as follows:
$sp + 68 -> | ... | <- pt_regs.regs[9]
+---------------------+
$sp + 64 -> | $t0 | <- pt_regs.regs[8]
+---------------------+
$sp + 60 -> | $a3/argument #4 | <- pt_regs.regs[7]
+---------------------+
$sp + 56 -> | $a2/argument #3 | <- pt_regs.regs[6]
+---------------------+
$sp + 52 -> | $a1/argument #2 | <- pt_regs.regs[5]
+---------------------+
$sp + 48 -> | $a0/argument #1 | <- pt_regs.regs[4]
+---------------------+
$sp + 44 -> | $v1 | <- pt_regs.regs[3]
+---------------------+
$sp + 40 -> | $v0 | <- pt_regs.regs[2]
+---------------------+
$sp + 36 -> | $at | <- pt_regs.regs[1]
+---------------------+
$sp + 32 -> | $zero | <- pt_regs.regs[0]
+---------------------+
$sp + 28 -> | stack argument #8 | <- pt_regs.args[7]
+---------------------+
$sp + 24 -> | stack argument #7 | <- pt_regs.args[6]
+---------------------+
$sp + 20 -> | stack argument #6 | <- pt_regs.args[5]
+---------------------+
$sp + 16 -> | stack argument #5 | <- pt_regs.args[4]
+---------------------+
$sp + 12 -> | psABI space for $a3 | <- pt_regs.args[3]
+---------------------+
$sp + 8 -> | psABI space for $a2 | <- pt_regs.args[2]
+---------------------+
$sp + 4 -> | psABI space for $a1 | <- pt_regs.args[1]
+---------------------+
$sp + 0 -> | psABI space for $a0 | <- pt_regs.args[0]
+---------------------+
holding user data received and with the first 4 frame slots reserved by
the psABI for the compiler to spill the incoming arguments from $a0-$a3
registers (which it sometimes does according to its needs) and the next
4 frame slots designated by the psABI for any stack function arguments
that follow. This data is also available for other tasks to peek/poke
at as reqired and where permitted.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add open_tree_attr() which allow to atomically create a detached mount
tree and set mount options on it. If OPEN_TREE_CLONE is used this will
allow the creation of a detached mount with a new set of mount options
without it ever being exposed to userspace without that set of mount
options applied.
Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-3-c25feb0d2eb3@kernel.org
Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
arch/mips/Kconfig selects HAVE_ARCH_SECCOMP_FILTER so syscall_trace_enter()
can just use __secure_computing(NULL) and rely on populate_seccomp_data(sd)
and "sd == NULL" checks in __secure_computing(sd) paths.
With the change above syscall_trace_enter() can just use secure_computing()
and avoid #ifdef + test_thread_flag(TIF_SECCOMP). CONFIG_GENERIC_ENTRY is
not defined, so test_syscall_work(SECCOMP) will check TIF_SECCOMP.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250128150300.GA15318@redhat.com
Signed-off-by: Kees Cook <kees@kernel.org>
Remove the "crct10dif" shash algorithm from the crypto API. It has no
known user now that the lib is no longer built on top of it. It has no
remaining references in kernel code. The only other potential users
would be the usual components that allow specifying arbitrary hash
algorithms by name, namely AF_ALG and dm-integrity. However there are
no indications that "crct10dif" is being used with these components.
Debian Code Search and web searches don't find anything relevant, and
explicitly grepping the source code of the usual suspects (cryptsetup,
libell, iwd) finds no matches either. "crc32" and "crc32c" are used in
a few more places, but that doesn't seem to be the case for "crct10dif".
crc_t10dif_update() is also tested by crc_kunit now, so the test
coverage provided via the crypto self-tests is no longer needed.
Also note that the "crct10dif" shash algorithm was inconsistent with the
rest of the shash API in that it wrote the digest in CPU endianness,
making the resulting byte array differ on little endian vs. big endian
platforms. This means it was effectively just built for use by the lib
functions, and it was not actually correct to treat it as "just another
hash function" that could be dropped in via the shash API.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250206173857.39794-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Following the standardization on crc32c() as the lib entry point for the
Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(),
and __crc32c_le(), make the same change to the underlying base and arch
functions that implement it.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Remove enum crc_op_size and enum crc_type, since they are never actually
used. Tokens with the names of the enum values do appear in the file,
but they are only used for token concatenation with the preprocessor.
This prevents a conflict with the addition of crc32c() to linux/crc32.h.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/r/20250207224233.GA1261167@ax162
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Remove all remaining references to CONFIG_CRC32_BIT,
CONFIG_CRC32_SARWATE, CONFIG_CRC32_SLICEBY4, and CONFIG_CRC32_SLICEBY8.
These options no longer exist, now that we've standardized on a single
generic CRC32 implementation.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250205000424.75149-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
This reverts commit bc7584e009.
The split IPC system calls for o32 have been introduced with modern
version only. Changing this breaks ABI.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updatesa
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5fGOQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynatACeLlbkhUT544Va1eOL2TkjfcGxrZUAoJ3ymGC0
y0N7/+fWL6aS+b4sEilv
=TU0D
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO driver updates from Greg KH:
"Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updatesa
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
ntsync: Fix reference leaks in the remaining create ioctls.
spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
spmi: Set fwnode for spmi devices
ntsync: fix a file reference leak in drivers/misc/ntsync.c
scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
interconnect: sm8750: Add missing const to static qcom_icc_desc
memstick: core: fix kernel-doc notation
intel_th: core: fix kernel-doc warnings
binder: log transaction code on failure
iio: dac: ad3552r-hs: clear reset status flag
iio: dac: ad3552r-common: fix ad3541/2r ranges
iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
misc: fastrpc: Fix copy buffer page size
misc: fastrpc: Fix registered buffer page address
misc: fastrpc: Deregister device nodes properly in error scenarios
nvmem: core: improve range check for nvmem_cell_write()
nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
...
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmeXoDUaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAorw/8CaBkXlyYNQzlSD6WK9Si
yfZAhKp3BdZWUfImlRJFQkAWaORz3LWtMg6T5odQpara3D5tvUTZ6leXDsVjwctA
pZXv1xiAg9SymQQiWLdHawGG6xWCG+Z0eI8vQrXjPp+6fI6c7pq6AfDjW+TQ9dnl
0cZEPUItgj0TBv8dSs2hKyHRotU7Fn2sagWVhMy5KeLnp8qK1rq/aZaFehNXIuCo
bslcsoT1Sex6YwCawST94f+kWAzDOigp/M32+7gQJXDQljbjTCXBtWkpoP0QRl55
bcdiTB1sSvLBlCurFI/0ktvxm5lfyjmMmvvWPYgNW50v3QROr9xWyjkRWhXtSBZk
M8MozS9+TH51QtPOf+uIkWY5KkIAJILw85eVtC5gP8L980qvUkBhREm36m1Vy3NT
xUTRKuaeBm7FNu2lD3wj16F5K4YdtE+Zwr+luRnxTWpdeauMoDkyzW8QIitO8CEw
2gedxeivvs+kAufkodmv+heC6Nk/fkoYGm5AfHeCIdpwFXLV/gtCP9PvhiUiE+Le
tKShGvFOB8hj0UlS7z6XPlInXvwuqZmKhRc214ymKTWmNLYDKutRo++GlNZALzCV
ts47pcw76ktkC8ptYkzxOXBWwpR1NOdGvAgV9Px6N/211Ww6wRn2BPb6oToSebE0
ZERE04cQGnm16i06qwD38Hc=
=IdU/
-----END PGP SIGNATURE-----
Merge tag 'mips_6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
"Cleanups and fixes"
* tag 'mips_6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: pci-legacy: Override pci_address_to_pio
MIPS: Loongson64: env: Use str_on_off() helper in prom_lefi_init_env()
MIPS: migrate to generic rule for built-in DTBs
mips: fix shmctl/semctl/msgctl syscall for o32
mips/math-emu: fix emulation of the prefx instruction
MIPS: Loongson: Add comments for interface_info
MIPS: Loongson64: remove ROM Size unit in boardinfo
MIPS: traps: Use str_enabled_disabled() in parity_protection_init()
MIPS: ftrace: Declare ftrace_get_parent_ra_addr() as static
Revert "MIPS: csrc-r4k: Select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT"
MIPS: Fix the wrong format specifier
MIPS: Add a blank line after __HEAD
MIPS: kernel: Rename read/write_c0_ecc to read/writec0_errctl
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
page allocator so we end up with the ability to allocate and free
zero-refcount pages. So that callers (ie, slab) can avoid a refcount
inc & dec.
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
large folios other than PMD-sized ones.
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
fixes for this small built-in kernel selftest.
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
the mapletree code.
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups.
- "simplify split calculation" from Wei Yang provides a few fixes and a
test for the mapletree code.
- "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
continues the work of moving vma-related code into the (relatively) new
mm/vma.c.
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the page
allocator.
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue. It
should reduce the amount of unnecessary reading.
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated
(https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
Qi's series addresses this windup by synchronously freeing PTE memory
within the context of madvise(MADV_DONTNEED).
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests code
when optional compiler warnings are enabled.
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
- "pkeys kselftests improvements" from Kevin Brodsky implements various
fixes and cleanups in the MM selftests code, mainly pertaining to the
pkeys tests.
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size.
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic.
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based
kernel build was demonstrated.
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of zram_write_page().
A watchdog softlockup warning was eliminated.
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed.
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging logic.
- "Account page tables at all levels" from Kevin Brodsky cleans up and
regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy.
- "mm/damon: replace most damon_callback usages in sysfs with new core
functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
file interface logic.
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is presented in
response to DAMOS actions.
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs
is completed.
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
Xu cleans up and generalizes the hugetlb reservation accounting.
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface.
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting), but
also inclusion (allowing) behavior.
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
"introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to reduce
the size of struct page and to enable dynamic allocation of memory
descriptors."
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel build
time with swap-on-zram.
- "mm: update mips to use do_mmap(), make mmap_region() internal" from
Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal.
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
regressions and otherwise improves MGLRU performance.
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
updates DAMON documentation.
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
- "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
provides various cleanups in the areas of hugetlb folios, THP folios and
migration.
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
reading and writing. To permite userspace to address issues with
massive buildup of useless pagecache when reading/writing fast devices.
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
=Ib2s
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: update mips to use do_mmap(), make mmap_region()
internal".
Currently the only user of mmap_region() outside of the memory management
code is the MIPS VDSO implementation.
This uses mmap_region() to map a 'delay slot emulation page' at the top of
the stack which is read-only and executable.
This mapping requires that an already-acquired mmap write lock is utilised
and that uffd and populate logic is ignored. This rules out vm_mmap(),
however do_mmap() fits the bill.
Adapt this code to use do_mmap() and then once done, make mmap_region()
internal and userland testable, and avoid any other uses of mmap_region(),
which is absolutely and strictly an internal mm function which bypasses a
great number of checks and logic.
This patch (of 2):
mmap_region() is an internal memory management implementation detail that
is not intended to be used outside of the memory management subsystem.
Map the delay slot emulation page using do_mmap() which makes use of the
already-held mmap write lock and bypasses unneeded populate and
userfaultfd logic.
This should have the precise same behaviour as the existing logic.
Link: https://lkml.kernel.org/r/cover.1735819274.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/ef076e381570f709e5c2c142dc030ec5b3309a0e.1735819274.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We already have a generic implementation of alloc/free up to P4D level, as
well as pgd_free(). Let's finish the work and add a generic PGD-level
alloc helper as well.
Unlike at lower levels, almost all architectures need some specific magic
at PGD level (typically initialising PGD entries), so introducing a
generic pgd_alloc() isn't worth it. Instead we introduce two new helpers,
__pgd_alloc() and __pgd_free(), and make use of them in the arch-specific
pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch
as possible, __pgd_alloc() takes a page allocation order.
Because pagetable_alloc() allocates zeroed pages, explicit zeroing in
pgd_alloc() becomes redundant and we can get rid of it. Some trivial
implementations of pgd_free() also become unnecessary once __pgd_alloc()
is used; remove them.
Another small improvement is consistent accounting of PGD pages by using
GFP_PGTABLE_{USER,KERNEL} as appropriate.
Not all PGD allocations can be handled by the generic helpers. In
particular, multiple architectures allocate PGDs from a kmem_cache, and
those PGDs may not be page-sized.
Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The pagetable_p*_dtor() are exactly the same except for the handling of
ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is
NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify
pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor()
to do this.
Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that
ptlock and page table pages can be freed together (regardless of whether
RCU is used). This prevents the use-after-free problem where the ptlock
is freed immediately but the page table pages is freed later via RCU.
Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.
- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.
- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.
- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.
These patches have been in linux-next since -rc1.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ418ZRQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyJYAP9kBlpm8W9/XY6N8SpjKaXE/vKQYHQl
Nobhak06Us8uJwEAkcUTymWP4IwQj5A9jgBAPRw53FQcNVKIc+01C7gRHw0=
=mqSH
-----END PGP SIGNATURE-----
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.
- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.
- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.
- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
MAINTAINERS: add entry for CRC library
powerpc/crc: delete obsolete crc-vpmsum_test.c
lib/crc32test: delete obsolete crc32test.c
lib/crc16_kunit: delete obsolete crc16_kunit.c
lib/crc_kunit.c: add KUnit test suite for CRC library functions
powerpc/crc-t10dif: expose CRC-T10DIF function through lib
arm64/crc-t10dif: expose CRC-T10DIF function through lib
arm/crc-t10dif: expose CRC-T10DIF function through lib
x86/crc-t10dif: expose CRC-T10DIF function through lib
crypto: crct10dif - expose arch-optimized lib function
lib/crc-t10dif: add support for arch overrides
lib/crc-t10dif: stop wrapping the crypto API
scsi: target: iscsi: switch to using the crc32c library
f2fs: switch to using the crc32 library
jbd2: switch to using the crc32c library
ext4: switch to using the crc32c library
lib/crc32: make crc32c() go directly to lib
bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
x86/crc32: expose CRC32 functions through lib
x86/crc32: update prototype for crc32_pclmul_le_16()
...
pci-legacy systems are not using logic_pio to managed PIO
allocations, thus the generic pci_address_to_pio won't work
when PCI_IOBASE is defined.
Override the function to use architecture implementation to
fix the problem.
Cc: stable@vger.kernel.org
Fixes: 4bfb53e7d3 ("mips: add <asm-generic/io.h> including")
Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Closes: https://lore.kernel.org/r/99f75c66-4c2d-45dc-a808-b5ba440c7551@app.fastmail.com/
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Remove hard-coded strings by using the str_on_off() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 654102df2a ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.
Select GENERIC_BUILTIN_DTB and BUILTIN_DTB_ALL when the built-in DTB
support is enabled.
DTBs compiled under arch/mips/boot/dts/ will be wrapped by the generic
rule in scripts/Makefile.vmlinux.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The commit 275f22148e ("ipc: rename old-style shmctl/semctl/msgctl
syscalls") switched various architectures to use sys_old_*ctl() with
ipc_parse_version, including mips n32/n64. However, for mips o32, commit
0d6040d468 ("arch: add split IPC system calls where needed") added
separate IPC syscalls without properly using the old-style handlers.
This causes applications using uClibc-ng to fail with -EINVAL when
calling semctl/shmctl/msgctl with IPC_64 flag, as uClibc-ng uses the
syscall numbers from kernel headers to determine whether to use the IPC
multiplexer or split syscalls. In contrast, glibc is unaffected as it
uses a unified feature test macro __ASSUME_DIRECT_SYSVIPC_SYSCALLS
(disabled for mips-o32) to make this decision.
Fix this by switching the o32 ABI entries for semctl, shmctl and msgctl
to use the old-style handlers, matching the behavior of other
architectures and fixing compatibility with uClibc-ng.
Signed-off-by: Ism Hong <ism.hong@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Currently, installation of Debian 12.8 for mipsel fails on machines
without an FPU [1]. This is caused by the fact that zstd (which is used
for initramfs compression) executes the prefx instruction, which is not
emulated properly by the kernel.
The prefx (Prefetch Indexed) instruction fetches data from memory into
the cache without any side effects. Though functionally unrelated, it
requires an FPU [2].
Bytecode format of this instruction ends on "001111" binary:
(prefx instruction format) & 0x0000003f = 0x0000000f
The code in fpux_emu() runs like so:
#define MIPSInst(x) x
#define MIPSInst_FMA_FFMT(x) (MIPSInst(x) & 0x00000007)
#define MIPSInst_FUNC(x) (MIPSInst(x) & 0x0000003f)
enum cop1x_func { ..., pfetch_op = 0x0f, ... };
...
switch (MIPSInst_FMA_FFMT(ir)) {
...
case 0x3:
if (MIPSInst_FUNC(ir) != pfetch_op)
return SIGILL;
/* ignore prefx operation */
break;
default:
return SIGILL;
}
That snippet above contains a logic error and the
if (MIPSInst_FUNC(ir) != pfetch_op)
comparison always fires.
When MIPSInst_FUNC(ir) is equal to pfetch_op, ir must end on 001111
binary. In this case, MIPSInst_FMA_FFMT(ir) must be equal to 0x7, which
does not match that case label.
This causes emulation failure for the prefx instruction. Fix it.
This has been broken by
commit 919af8b96c ("MIPS: Make definitions of MIPSInst_FMA_{FUNC,FMTM} consistent with MIPS64 manual")
which modified the MIPSInst_FMA_FFMT macro without updating the users.
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: stable@vger.kernel.org # after 3 weeks
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Fixes: 919af8b96c ("MIPS: Make definitions of MIPSInst_FMA_{FUNC,FMTM} consistent with MIPS64 manual")
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091858
[2] MIPS Architecture For Programmers Volume II-A: The MIPS32 Instruction Set
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Clarify meanings of members of interface_info, especially for .size,
which does not define the ROM size.
Suggested-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Kexy Biscuit <kexybiscuit@aosc.io>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Per Appendix A.7 in Q/LS 0013-2014 (龙芯CPU开发系统固件与内核接口规范 V2.2,
lit. Loongson DevSys Firmware Kernel Interface Specification V2.2),
interface_info.size is size of this interface, not size of the LEFI BIOS
ROM.
In any case, the BIOS ROM Size just cannot be several kilobytes (KB) on
Loongson64 LEFI platforms.
Reported-by: Mingcong Bai <jeffbai@aosc.io>
Suggested-by: Icenowy Zheng <uwu@icenowy.me>
Fixes: 6c1bfbd9df ("MIPS: Loongson64: Add /sys/firmware/lefi/boardinfo")
Cc: stable@vger.kernel.org
Signed-off-by: Kexy Biscuit <kexybiscuit@aosc.io>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Remove hard-coded strings by using the str_enabled_disabled() helper
function.
Use pr_info() instead of printk(KERN_INFO) to silence multiple
checkpatch warnings.
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Declare ftrace_get_parent_ra_addr() as static to suppress clang
compiler warning that 'no previous prototype'. This function is
not intended to be called from other parts.
Fix follow error with clang-19:
arch/mips/kernel/ftrace.c:251:15: error: no previous prototype for function 'ftrace_get_parent_ra_addr' [-Werror,-Wmissing-prototypes]
251 | unsigned long ftrace_get_parent_ra_addr(unsigned long self_ra, unsigned long
| ^
arch/mips/kernel/ftrace.c:251:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
251 | unsigned long ftrace_get_parent_ra_addr(unsigned long self_ra, unsigned long
| ^
| static
1 error generated.
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This reverts commit 426fa8e4fe.
The commit has caused two issues on Loongson 3A4000:
1. The timestamp in dmesg become erratic, like:
[3.736957] amdgpu 0000:04:00.0: ... ...
[3.748895] [drm] Initialized amdgpu ... ...
[18446744073.381141] amdgpu 0000:04:00:0: ... ...
[1.613326] igb 0000:03:00.0 enp3s0: ... ...
2. More seriously, some workloads (for example, the test
stdlib/test-cxa_atexit2 in the Glibc test suite) triggers an RCU
stall and hang the system with a high probably (4 hangs out of 5
tests).
Revert this commit to use jiffie on Loongson MIPS systems and fix these
issues for now. The root cause may need more investigation.
Cc: stable@vger.kernel.org # 6.11+
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Make a minor change to eliminate a static checker warning. The type
of cpu is unsigned int, so the correct format specifier should be
%u instead of %d.
Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add a blank line after __HEAD in the text section.
The __HEAD symbol was added in commit 52892ed6b0 ("MIPS:
Place __kernel_entry at the beginning of text section").
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
---Changelog--
Added suggested-by.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
CP0 register 26 is used as ECC register for legacy cores, but newer
cores (MIPS32/MIPS64) use it as an ErrCtl register. Since the kernel only
uses CP0 26 as ErrCtl register rename the access functions to the more
fitting name.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The keywrap (kw) algorithm has no in-tree user. It has never had an
in-tree user, and the patch that added it provided no justification for
its inclusion. Even use of it via AF_ALG is impossible, as it uses a
weird calling convention where part of the ciphertext is returned via
the IV buffer, which is not returned to userspace in AF_ALG.
It's also unclear whether any new code in the kernel that does key
wrapping would actually use this algorithm. It is controversial in the
cryptographic community due to having no clearly stated security goal,
no security proof, poor performance, and only a 64-bit auth tag. Later
work (https://eprint.iacr.org/2006/221) suggested that the goal is
deterministic authenticated encryption. But there are now more modern
algorithms for this, and this is not the same as key wrapping, for which
a regular AEAD such as AES-GCM usually can be (and is) used instead.
Therefore, remove this unused code.
There were several special cases for this algorithm in the self-tests,
due to its weird calling convention. Remove those too.
Cc: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the vmac64 template, as it has no known users. It also continues
to have longstanding bugs such as alignment violations (see
https://lore.kernel.org/r/20241226134847.6690-1-evepolonium@gmail.com/).
This code was added in 2009 by commit f1939f7c56 ("crypto: vmac - New
hash algorithm for intel_txt support"). Based on the mention of
intel_txt support in the commit title, it seems it was added as a
prerequisite for the contemporaneous patch
"intel_txt: add s3 userspace memory integrity verification"
(https://lore.kernel.org/r/4ABF2B50.6070106@intel.com/). In the design
proposed by that patch, when an Intel Trusted Execution Technology (TXT)
enabled system resumed from suspend, the "tboot" trusted executable
launched the Linux kernel without verifying userspace memory, and then
the Linux kernel used VMAC to verify userspace memory.
However, that patch was never merged, as reviewers had objected to the
design. It was later reworked into commit 4bd96a7a81 ("x86, tboot:
Add support for S3 memory integrity protection") which made tboot verify
the memory instead. Thus the VMAC support in Linux was never used.
No in-tree user has appeared since then, other than potentially the
usual components that allow specifying arbitrary hash algorithms by
name, namely AF_ALG and dm-integrity. However there are no indications
that VMAC is being used with these components. Debian Code Search and
web searches for "vmac64" (the actual algorithm name) do not return any
results other than the kernel itself, suggesting that it does not appear
in any other code or documentation. Explicitly grepping the source code
of the usual suspects (libell, iwd, cryptsetup) finds no matches either.
Before 2018, the vmac code was also completely broken due to using a
hardcoded nonce and the wrong endianness for the MAC. It was then fixed
by commit ed331adab3 ("crypto: vmac - add nonced version with big
endian digest") and commit 0917b87312 ("crypto: vmac - remove insecure
version with hardcoded nonce"). These were intentionally breaking
changes that changed all the computed MAC values as well as the
algorithm name ("vmac" to "vmac64"). No complaints were ever received
about these breaking changes, strongly suggesting the absence of users.
The reason I had put some effort into fixing this code in 2018 is
because it was used by an out-of-tree driver. But if it is still needed
in that particular out-of-tree driver, the code can be carried in that
driver instead. There is no need to carry it upstream.
Cc: Atharva Tiwari <evepolonium@gmail.com>
Cc: Shane Wang <shane.wang@intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a new reserved-memory node, containing bootloader config with MAC
addresses for both ethernet instances of the SoC.
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20241230143035.265518-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the
ancillary data returned by recvmsg().
This is analogous to the existing support for SO_RCVMARK,
as implemented in commit 6fd1d51cfa ("net: SO_RCVMARK socket option
for SO_MARK with recvmsg()").
Reviewed-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Link: https://patch.msgid.link/20241213084457.45120-5-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the mips CRC32 assembly code into the lib directory and wire it up
to the library interface. This allows it to be used without going
through the crypto API. It remains usable via the crypto API too via
the shash algorithms that use the library interface. Thus all the
arch-specific "shash" code becomes unnecessary and is removed.
Note: to see the diff from arch/mips/crypto/crc32-mips.c to
arch/mips/lib/crc32-mips.c, view this commit with 'git show -M10'.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add generic support for built-in boot DTB files
- Enable TAB cycling for dialog buttons in nconfig
- Fix issues in streamline_config.pl
- Refactor Kconfig
- Add support for Clang's AutoFDO (Automatic Feedback-Directed
Optimization)
- Add support for Clang's Propeller, a profile-guided optimization.
- Change the working directory to the external module directory for M=
builds
- Support building external modules in a separate output directory
- Enable objtool for *.mod.o and additional kernel objects
- Use lz4 instead of deprecated lz4c
- Work around a performance issue with "git describe"
- Refactor modpost
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmdKGgEVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGrFoQAIgioJPRG+HC6bGmjy4tL4bq1RAm
78nbD12grrAa+NvQGRZYRs264rWxBGwrNfGGS9BDvlWJZ3fmKEuPlfCIxC0nkKk8
LVLNxSVvgpHE47RQ+E4V+yYhrlZKb4aWZjH3ZICn7vxRgbQ5Veq61aatluVHyn9c
I8g+APYN/S1A4JkFzaLe8GV7v5OM3+zGSn3M9n7/DxVkoiNrMOXJm5hRdRgYfEv/
kMppheY2PPshZsaL+yLAdrJccY5au5vYE/v8wHkMtvM/LF6YwjgqPVDRFQ30BuLM
sAMMd6AUoopiDZQOpqmXYukU0b0MQPswg3jmB+PWUBrlsuydRa8kkyPwUaFrDd+w
9d0jZRc8/O/lxUdD1AefRkNcA/dIJ4lTPr+2NpxwHuj2UFo0gLQmtjBggMFHaWvs
0NQRBPlxfOE4+Htl09gyg230kHuWq+rh7xqbyJCX+hBOaZ6kI2lryl6QkgpAoS+x
KDOcUKnsgGMGARQRrvCOAXnQs+rjkW8fEm6t7eSBFPuWJMK85k4LmxOog8GVYEdM
JcwCnCHt9TtcHlSxLRnVXj2aqGTFNLJXE1aLyCp9u8MxZ7qcx01xOuCmwp6FRzNq
ghal7ngA58Y/S4K/oJ+CW2KupOb6CFne8mpyotpYeWI7MR64t0YWs4voZkuK46E2
CEBfA4PDehA4BxQe
=GDKD
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add generic support for built-in boot DTB files
- Enable TAB cycling for dialog buttons in nconfig
- Fix issues in streamline_config.pl
- Refactor Kconfig
- Add support for Clang's AutoFDO (Automatic Feedback-Directed
Optimization)
- Add support for Clang's Propeller, a profile-guided optimization.
- Change the working directory to the external module directory for M=
builds
- Support building external modules in a separate output directory
- Enable objtool for *.mod.o and additional kernel objects
- Use lz4 instead of deprecated lz4c
- Work around a performance issue with "git describe"
- Refactor modpost
* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
gitignore: Don't ignore 'tags' directory
kbuild: add dependency from vmlinux to resolve_btfids
modpost: replace tdb_hash() with hash_str()
kbuild: deb-pkg: add python3:native to build dependency
genksyms: reduce indentation in export_symbol()
modpost: improve error messages in device_id_check()
modpost: rename alias symbol for MODULE_DEVICE_TABLE()
modpost: rename variables in handle_moddevtable()
modpost: move strstarts() to modpost.h
modpost: convert do_usb_table() to a generic handler
modpost: convert do_of_table() to a generic handler
modpost: convert do_pnp_device_entry() to a generic handler
modpost: convert do_pnp_card_entries() to a generic handler
modpost: call module_alias_printf() from all do_*_entry() functions
modpost: pass (struct module *) to do_*_entry() functions
modpost: remove DEF_FIELD_ADDR_VAR() macro
modpost: deduplicate MODULE_ALIAS() for all drivers
modpost: introduce module_alias_printf() helper
modpost: remove unnecessary check in do_acpi_entry()
...
- add SPI nand to realtek device tree
- change clock tree for mobileye
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmdJe7oaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDtFA/9HizYQrLv1qoK+e/7OTkE
XT0S1jcV534x+yQlMops+aGHIif8ghrf4nHq3i2JNPI0hzasS8ao4jO8qFTONeoa
b9yh7EchGSPptR5Qc3d4OxXQ9rWdeAXe4ByVavW+odgzRRoIuH6oFrUs/KXIo5Zn
3d1bpcNLultZKfLVryrT9Q74cQ4TNwlpwfjuGLsUO2xSzQ0PBlZSY2bokrxdUg9t
xH0IR3Nd9v0e6P5zUdpRcHI1Hu2GXzz0bTnkxDsmETlDYU/VI51BYWHmNGUscYAz
iUqrI4FZD/ncTmrqgZOI1SG3jTKh8dx9HZDU+nN2P35OPClMgKQYLBpMg0/Kffgf
hX/qet0aPwSOYPfzb0Ao1CQO3FNbZ7oipo3DkuM3dwihJA0pbPHDuTmKpS11Agec
5XSYPmNJg253TC0cPBKty/TgzVlxNi9+ss7MreO3uRY6HGo2f+kX7fO4+7iDkCF0
nrTJ/PLdH4WTn+pykk4f8uA32hGmjiJN0/rxAkbIZ78noVnPStGBXNCbbqPzUShV
FmNgyo+ob5H6iZF8o+8u1CRk8aAIEwsNIysJmHNNEVp3ox8GGb5LnXWv9bosxwOs
72HLQOxAiaiWirfxcXpIjKUT6nGxivOdRkmptfTMAi6u2EwlynHyj4z+Cs5mGBfa
tNc9Px3EPsatzG0MUji7f00=
=f5du
-----END PGP SIGNATURE-----
Merge tag 'mips_6.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- fix for loongson64 device tree
- add SPI nand to realtek device tree
- change clock tree for mobileye
* tag 'mips_6.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Loongson64: DTS: Really fix PCIe port nodes for ls7a
mips: dts: realtek: Add SPI NAND controller
MIPS: mobileye: eyeq6h: add OLB nodes OLB and remove fixed clocks
MIPS: mobileye: eyeq5: use OLB as provider for fixed factor clocks
Mark __kernel_entry as ".head.text" and place HEAD_TEXT before
TEXT_TEXT in the linker script. This ensures that __kernel_entry
will be placed at the beginning of text section.
Drop mips from scripts/head-object-list.txt.
Signed-off-by: Rong Xu <xur@google.com>
Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Closes: https://lore.kernel.org/lkml/c6719149-8531-4174-824e-a3caf4bc6d0e@alliedtelesis.co.nz/T/
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
essentially guessing which pfns are refcounted pages. The reason to
do so was that KVM needs to map both non-refcounted pages (for example
BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
refcounted pages. However, the result was security issues in the past,
and more recently the inability to map VM_IO and VM_PFNMAP memory
that _is_ backed by struct page but is not refcounted. In particular
this broke virtio-gpu blob resources (which directly map host graphics
buffers into the guest as "vram" for the virtio-gpu device) with the
amdgpu driver, because amdgpu allocates non-compound higher order pages
and the tail pages could not be mapped into KVM.
This requires adjusting all uses of struct page in the per-architecture
code, to always work on the pfn whenever possible. The large series that
did this, from David Stevens and Sean Christopherson, also cleaned up
substantially the set of functions that provided arch code with the
pfn for a host virtual addresses. The previous maze of twisty little
passages, all different, is replaced by five functions (__gfn_to_page,
__kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
saving almost 200 lines of code.
ARM:
* Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
* Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
was introduced in PSCIv1.3 as a mechanism to request hibernation,
similar to the S4 state in ACPI
* Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
* PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
* Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
* Avoid emulated MMIO completion if userspace has requested synchronous
external abort injection
* Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
LoongArch:
* Add iocsr and mmio bus simulation in kernel.
* Add in-kernel interrupt controller emulation.
* Add support for virtualization extensions to the eiointc irqchip.
PPC:
* Drop lingering and utterly obsolete references to PPC970 KVM, which was
removed 10 years ago.
* Fix incorrect documentation references to non-existing ioctls
RISC-V:
* Accelerate KVM RISC-V when running as a guest
* Perf support to collect KVM guest statistics from host side
s390:
* New selftests: more ucontrol selftests and CPU model sanity checks
* Support for the gen17 CPU model
* List registers supported by KVM_GET/SET_ONE_REG in the documentation
x86:
* Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
documentation, harden against unexpected changes. Even if the hardware
A/D tracking is disabled, it is possible to use the hardware-defined A/D
bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
of special cases.
* Elide TLB flushes when aging secondary PTEs, as has been done in x86's
primary MMU for over 10 years.
* Recover huge pages in-place in the TDP MMU when dirty page logging is
toggled off, instead of zapping them and waiting until the page is
re-accessed to create a huge mapping. This reduces vCPU jitter.
* Batch TLB flushes when dirty page logging is toggled off. This reduces
the time it takes to disable dirty logging by ~3x.
* Remove the shrinker that was (poorly) attempting to reclaim shadow page
tables in low-memory situations.
* Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
* Advertise CPUIDs for new instructions in Clearwater Forest
* Quirk KVM's misguided behavior of initialized certain feature MSRs to
their maximum supported feature set, which can result in KVM creating
invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero
value results in the vCPU having invalid state if userspace hides PDCM
from the guest, which in turn can lead to save/restore failures.
* Fix KVM's handling of non-canonical checks for vCPUs that support LA57
to better follow the "architecture", in quotes because the actual
behavior is poorly documented. E.g. most MSR writes and descriptor
table loads ignore CR4.LA57 and operate purely on whether the CPU
supports LA57.
* Bypass the register cache when querying CPL from kvm_sched_out(), as
filling the cache from IRQ context is generally unsafe; harden the
cache accessors to try to prevent similar issues from occuring in the
future. The issue that triggered this change was already fixed in 6.12,
but was still kinda latent.
* Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
* Minor cleanups
* Switch hugepage recovery thread to use vhost_task. These kthreads can
consume significant amounts of CPU time on behalf of a VM or in response
to how the VM behaves (for example how it accesses its memory); therefore
KVM tried to place the thread in the VM's cgroups and charge the CPU
time consumed by that work to the VM's container. However the kthreads
did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
instances inside could not complete freezing. Fix this by replacing the
kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
Another 100+ lines removed, with generally better behavior too like
having these threads properly parented in the process tree.
* Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
really work; there was really nothing to work around anyway: the broken
patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
MSR is virtualized and therefore unaffected by the erratum.
* Fix 6.12 regression where CONFIG_KVM will be built as a module even
if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
x86 selftests:
* x86 selftests can now use AVX.
Documentation:
* Use rST internal links
* Reorganize the introduction to the API document
Generic:
* Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
of RCU, so that running a vCPU on a different task doesn't encounter long
due to having to wait for all CPUs become quiescent. In general both reads
and writes are rare, but userspace that supports confidential computing is
introducing the use of "helper" vCPUs that may jump from one host processor
to another. Those will be very happy to trigger a synchronize_rcu(), and
the effect on performance is quite the disaster.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
=lWpE
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"The biggest change here is eliminating the awful idea that KVM had of
essentially guessing which pfns are refcounted pages.
The reason to do so was that KVM needs to map both non-refcounted
pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
VMAs that contain refcounted pages.
However, the result was security issues in the past, and more recently
the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
struct page but is not refcounted. In particular this broke virtio-gpu
blob resources (which directly map host graphics buffers into the
guest as "vram" for the virtio-gpu device) with the amdgpu driver,
because amdgpu allocates non-compound higher order pages and the tail
pages could not be mapped into KVM.
This requires adjusting all uses of struct page in the
per-architecture code, to always work on the pfn whenever possible.
The large series that did this, from David Stevens and Sean
Christopherson, also cleaned up substantially the set of functions
that provided arch code with the pfn for a host virtual addresses.
The previous maze of twisty little passages, all different, is
replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
non-__ versions of these two, and kvm_prefetch_pages) saving almost
200 lines of code.
ARM:
- Support for stage-1 permission indirection (FEAT_S1PIE) and
permission overlays (FEAT_S1POE), including nested virt + the
emulated page table walker
- Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
call was introduced in PSCIv1.3 as a mechanism to request
hibernation, similar to the S4 state in ACPI
- Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
part of it, introduce trivial initialization of the host's MPAM
context so KVM can use the corresponding traps
- PMU support under nested virtualization, honoring the guest
hypervisor's trap configuration and event filtering when running a
nested guest
- Fixes to vgic ITS serialization where stale device/interrupt table
entries are not zeroed when the mapping is invalidated by the VM
- Avoid emulated MMIO completion if userspace has requested
synchronous external abort injection
- Various fixes and cleanups affecting pKVM, vCPU initialization, and
selftests
LoongArch:
- Add iocsr and mmio bus simulation in kernel.
- Add in-kernel interrupt controller emulation.
- Add support for virtualization extensions to the eiointc irqchip.
PPC:
- Drop lingering and utterly obsolete references to PPC970 KVM, which
was removed 10 years ago.
- Fix incorrect documentation references to non-existing ioctls
RISC-V:
- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side
s390:
- New selftests: more ucontrol selftests and CPU model sanity checks
- Support for the gen17 CPU model
- List registers supported by KVM_GET/SET_ONE_REG in the
documentation
x86:
- Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
improve documentation, harden against unexpected changes.
Even if the hardware A/D tracking is disabled, it is possible to
use the hardware-defined A/D bits to track if a PFN is Accessed
and/or Dirty, and that removes a lot of special cases.
- Elide TLB flushes when aging secondary PTEs, as has been done in
x86's primary MMU for over 10 years.
- Recover huge pages in-place in the TDP MMU when dirty page logging
is toggled off, instead of zapping them and waiting until the page
is re-accessed to create a huge mapping. This reduces vCPU jitter.
- Batch TLB flushes when dirty page logging is toggled off. This
reduces the time it takes to disable dirty logging by ~3x.
- Remove the shrinker that was (poorly) attempting to reclaim shadow
page tables in low-memory situations.
- Clean up and optimize KVM's handling of writes to
MSR_IA32_APICBASE.
- Advertise CPUIDs for new instructions in Clearwater Forest
- Quirk KVM's misguided behavior of initialized certain feature MSRs
to their maximum supported feature set, which can result in KVM
creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
a non-zero value results in the vCPU having invalid state if
userspace hides PDCM from the guest, which in turn can lead to
save/restore failures.
- Fix KVM's handling of non-canonical checks for vCPUs that support
LA57 to better follow the "architecture", in quotes because the
actual behavior is poorly documented. E.g. most MSR writes and
descriptor table loads ignore CR4.LA57 and operate purely on
whether the CPU supports LA57.
- Bypass the register cache when querying CPL from kvm_sched_out(),
as filling the cache from IRQ context is generally unsafe; harden
the cache accessors to try to prevent similar issues from occuring
in the future. The issue that triggered this change was already
fixed in 6.12, but was still kinda latent.
- Advertise AMD_IBPB_RET to userspace, and fix a related bug where
KVM over-advertises SPEC_CTRL when trying to support cross-vendor
VMs.
- Minor cleanups
- Switch hugepage recovery thread to use vhost_task.
These kthreads can consume significant amounts of CPU time on
behalf of a VM or in response to how the VM behaves (for example
how it accesses its memory); therefore KVM tried to place the
thread in the VM's cgroups and charge the CPU time consumed by that
work to the VM's container.
However the kthreads did not process SIGSTOP/SIGCONT, and therefore
cgroups which had KVM instances inside could not complete freezing.
Fix this by replacing the kthread with a PF_USER_WORKER thread, via
the vhost_task abstraction. Another 100+ lines removed, with
generally better behavior too like having these threads properly
parented in the process tree.
- Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
didn't really work; there was really nothing to work around anyway:
the broken patch was meant to fix nested virtualization, but the
PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
erratum.
- Fix 6.12 regression where CONFIG_KVM will be built as a module even
if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
'y'.
x86 selftests:
- x86 selftests can now use AVX.
Documentation:
- Use rST internal links
- Reorganize the introduction to the API document
Generic:
- Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
instead of RCU, so that running a vCPU on a different task doesn't
encounter long due to having to wait for all CPUs become quiescent.
In general both reads and writes are rare, but userspace that
supports confidential computing is introducing the use of "helper"
vCPUs that may jump from one host processor to another. Those will
be very happy to trigger a synchronize_rcu(), and the effect on
performance is quite the disaster"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
KVM: x86: add back X86_LOCAL_APIC dependency
Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
KVM: x86: switch hugepage recovery thread to vhost_task
KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
Documentation: KVM: fix malformed table
irqchip/loongson-eiointc: Add virt extension support
LoongArch: KVM: Add irqfd support
LoongArch: KVM: Add PCHPIC user mode read and write functions
LoongArch: KVM: Add PCHPIC read and write functions
LoongArch: KVM: Add PCHPIC device support
LoongArch: KVM: Add EIOINTC user mode read and write functions
LoongArch: KVM: Add EIOINTC read and write functions
LoongArch: KVM: Add EIOINTC device support
LoongArch: KVM: Add IPI user mode read and write function
LoongArch: KVM: Add IPI read and write function
LoongArch: KVM: Add IPI device support
LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
KVM: arm64: Pass on SVE mapping failures
...
Sergey Senozhatsky improves zram's post-processing selection algorithm.
This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of shadow
entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in the
hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page into
small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio size
rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations" from
Mike Rapoport teaches x86 to use large pages for read-only-execute
module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a single
VMA, rather than requiring that multiple VMAs be created for this.
Improved efficiencies for userspace memory allocators are expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP from
the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep is
enabled.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
=JfWR
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection
algorithm. This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping
code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of
shadow entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in
the hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page
into small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to
do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio
size rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel
Butt removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations"
from Mike Rapoport teaches x86 to use large pages for
read-only-execute module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
tests over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a
single VMA, rather than requiring that multiple VMAs be created for
this. Improved efficiencies for userspace memory allocators are
expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP
from the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep
is enabled.
* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
mm/kfence: add a new kunit test test_use_after_free_read_nofault()
zram: fix NULL pointer in comp_algorithm_show()
memcg/hugetlb: add hugeTLB counters to memcg
vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
zram: ZRAM_DEF_COMP should depend on ZRAM
MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
Docs/mm/damon: recommend academic papers to read and/or cite
mm: define general function pXd_init()
kmemleak: iommu/iova: fix transient kmemleak false positive
mm/list_lru: simplify the list_lru walk callback function
mm/list_lru: split the lock to per-cgroup scope
mm/list_lru: simplify reparenting and initial allocation
mm/list_lru: code clean up for reparenting
mm/list_lru: don't export list_lru_add
mm/list_lru: don't pass unnecessary key parameters
kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
...
Fix the dtc warnings:
arch/mips/boot/dts/loongson/ls7a-pch.dtsi:68.16-416.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider
arch/mips/boot/dts/loongson/ls7a-pch.dtsi:68.16-416.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider
arch/mips/boot/dts/loongson/loongson64g_4core_ls7a.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'
And a runtime warning introduced in commit 045b14ca5c ("of: WARN on
deprecated #address-cells/#size-cells handling"):
WARNING: CPU: 0 PID: 1 at drivers/of/base.c:106 of_bus_n_addr_cells+0x9c/0xe0
Missing '#address-cells' in /bus@10000000/pci@1a000000/pci_bridge@9,0
The fix is similar to commit d89a415ff8 ("MIPS: Loongson64: DTS: Fix PCIe
port nodes for ls7a"), which has fixed the issue for ls2k (despite its
subject mentions ls7a).
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core
----
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read
access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code
--------------------------------------------
- Add small file operations for debugfs, to reduce the struct ops size.
- Refactoring and optimization for the implementation of page_frag API,
This is a preparatory work to consolidate the page_frag
implementation.
Netfilter
---------
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users
the option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent
CI improvements.
BPF
---
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols
---------
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after close,
the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per device
neigh lists.
Driver API
----------
- Introduce a unified interface to configure transmission H/W shaping,
and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling
-----------------
- forwarding: introduce deferred commands, to simplify
the cleanup phase
Drivers
-------
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- adds support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Adds representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- adds support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implements page pool support
- macsec:
- inherit lower device's features and TSO limits when offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- Add the dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- adds clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
THcvVM+bupEZ
=GzPr
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
These are a number of unrelated cleanups, generally simplifying the
architecture specific header files:
- A series from Al Viro simplifies asm/vga.h, after it turns out that
most of it can be generalized.
- A series from Julian Vetter adds a common version of
memcpy_{to,from}io() and memset_io() and changes most architectures
to use that instead of their own implementation
- A series from Niklas Schnelle concludes his work to make PC
style inb()/outb() optional
- Nicolas Pitre contributes improvements for the generic do_div()
helper
- Christoph Hellwig adds a generic version of page_to_phys()
and phys_to_page(), replacing the slightly different architecture
specific definitions.
- Uwe Kleine-Koenig has a minor cleanup for ioctl definitions
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmc+Z0gACgkQYKtH/8kJ
UicqzA/8CcqVdcWKlFAyiFI62DCkd3iYm/joNK3/JhvUIvVFvY+HI0+XpTeOEN1r
dfYBNg/KTVSbia5MEEy28Lk5WdoA3X7p9E8NuYC1ik/qvH3Y0kXDU2NiRcJDwalq
u56tGUwDITFUzRo47a4Z53JpV60FlGaUVjuKp1jJiOQkcs/iussVYuti8mNVb1ud
1tf21TEAIywq43IC8CxevIRsBkJBqMhalaGWYgKw3ZTwXdiKaXed6RH7IjPodanN
6b7R6aFEqlT7usFX9vLOYNRGzd3HIueXOT1iqiiGI1lm5u/iutxKH+8eS4q381oN
WJL0jQdo4sv2MxtSHYrjpzPRQpSp/qrin29h3PVjwBjZF3i5WvFeTYgfjQEEkqe0
fpTXjUsr5n1F1pGV90DtJHwaD5TxKD4VYFLDRCDGUiAnWPkZ7EYUBL3SA6GqEkXB
1lVRPsEBo0y867/WQcoCZA/x7ANZDI6bDZ6fjumwx8OCZOHZeN6FGtqQJHcVZR5O
+nu/j3I8YH1tZGKbA+wliyQwt/T60Oxs62HHcFzFLGakARwUEDYO53IGCJUByFwk
kCrgNVvzFklwWpqqyTADqb5lkQKpZr5gIdpst185qttCQkb+EFWiCi9w2inXTjHl
2oCc7Uf0cvoxnhVlJAw73eGTtpqS37KCWK+iNyrQbOfy+hgIv+w=
=zEHk
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are a number of unrelated cleanups, generally simplifying the
architecture specific header files:
- A series from Al Viro simplifies asm/vga.h, after it turns out that
most of it can be generalized.
- A series from Julian Vetter adds a common version of
memcpy_{to,from}io() and memset_io() and changes most architectures
to use that instead of their own implementation
- A series from Niklas Schnelle concludes his work to make PC style
inb()/outb() optional
- Nicolas Pitre contributes improvements for the generic do_div()
helper
- Christoph Hellwig adds a generic version of page_to_phys() and
phys_to_page(), replacing the slightly different architecture
specific definitions.
- Uwe Kleine-Koenig has a minor cleanup for ioctl definitions"
* tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits)
empty include/asm-generic/vga.h
sparc: get rid of asm/vga.h
asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to
vt_buffer.h: get rid of dead code in default scr_...() instances
tty: serial: export serial_8250_warn_need_ioport
lib/iomem_copy: fix kerneldoc format style
hexagon: simplify asm/io.h for !HAS_IOPORT
loongarch: Use new fallback IO memcpy/memset
csky: Use new fallback IO memcpy/memset
arm64: Use new fallback IO memcpy/memset
New implementation for IO memcpy and IO memset
watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240
__arch_xprod64(): make __always_inline when optimizing for performance
ARM: div64: improve __arch_xprod_64()
asm-generic/div64: optimize/simplify __div64_const32()
lib/math/test_div64: add some edge cases relevant to __div64_const32()
asm-generic: add an optional pfn_valid check to page_to_phys
asm-generic: provide generic page_to_phys and phys_to_page implementations
asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n
tty: serial: handle HAS_IOPORT dependencies
...
Add the SPI-NAND controller on the RTL9300 family of devices. This
supports serial/dual/quad data width and DMA for read/program
operations.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples.
Fix the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys address
of the DTB as __pa() is not always valid to use. This fixes a warning
for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports and
endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever possible
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmc7qaoACgkQ+vtdtY28
YcN54g/+Ifz4hQTSWV+VBhihovMMPiQUdxZ+MfJfPnPcZ7NJzaTf+zqhZyS4wQou
v0pdtyR0B1fCM/EvKaYD+1aTTAQFEIT5Dqac+9ePwqaYqSk+yCTxyzW9m+P3rTPV
THo8SGRss7T+Rs+2WaUGxphTJItMGIRdbBvoqK+82EdKFXXKw2BSD8tlJTWwbTam
9xkrpUzw7f4FvVY8vVhRyOd5i8/v+FH8D65DMIT6ME9zRn4MzKVzCg6udgYeCBld
C2XbV+wnyewtjrN2IX+2uQ2mheb7yJu3AEI3iFR5x/sRrsSLpisxrUl38xOOpxrM
XxYtHgE3omjagQ+y+L2PMthlKvhFrXVXIvhUH8xxje5z1Vyq3VMfiABkHlMpAnys
5LY4xEhvqDkPNo65UmjMiHxGW/xtcKsmAZBOp+HLerZfCJIFvl380fi8mNg/Sjvz
7ExCSpzCPsHASZg7QCTplU3BUtg+067Ch/k8Hsn/Og73Pqm3xH4IezQZKwweN9ZT
LC6OQBI7C3Yt1hom9qgUcA4H4/aaPxTVV7i0DGuAKh8Lon6SaoX2yFpweUBgbsL/
c9DIW4vbYBIGASxxUbHlNMKvPCKACKmpFXhsnH5Waj+VWSOwsJ8bjGpH8PfMKdFW
dyJB/r94GqCGpCW7+FC1qGmXiQJGkCo89pKBVjSf4Kj45ht/76o=
=NCYS
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples. Fix
the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys
address of the DTB as __pa() is not always valid to use. This fixes
a warning for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports
and endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever
possible"
* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
of/address: Rework bus matching to avoid warnings
of: WARN on deprecated #address-cells/#size-cells handling
of/fdt: Don't use default address cell sizes for address translation
dt-bindings: Enable dtc "interrupt_provider" warnings
of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
media: xilinx-tpg: use new of_graph functions
fbdev: omapfb: use new of_graph functions
gpu: drm: omapdrm: use new of_graph functions
ASoC: audio-graph-card2: use new of_graph functions
ASoC: audio-graph-card: use new of_graph functions
ASoC: test-component: use new of_graph functions
of: property: use new of_graph functions
of: property: add of_graph_get_next_port_endpoint()
of: property: add of_graph_get_next_port()
of: module: remove strlen() call in of_modalias()
...
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the signal
of the timer is ignored so that the timer signal can be delivered once
the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small intervals
and keeps the system pointlessly out of low power states for no value.
This is a long standing non-trivial problem due to the lock order of
posix-timer lock and the sighand lock along with life time issues as
the timer and the sigqueue have different life time rules.
Cure this by:
* Embedding the sigqueue into the timer struct to have the same life
time rules. Aside of that this also avoids the lookup of the timer
in the signal delivery and rearm path as it's just a always valid
container_of() now.
* Queuing ignored timer signals onto a seperate ignored list.
* Moving queued timer signals onto the ignored list when the signal is
switched to SIG_IGN before it could be delivered.
* Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal delivery
code to rearm the timer.
This also required to consolidate the signal delivery rules so they are
consistent across all situations. With that all self test scenarios
finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time stamps
by default and switch to fine grained time stamps when inode attributes
are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that the
VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
* Move all sleep and timeout functions into one file
* Rework udelay() and ndelay() into proper documented inline functions
and replace the hardcoded magic numbers by proper defines.
* Rework the fsleep() implementation to take the reality of the timer
wheel granularity on different HZ values into account. Right now the
boundaries are hard coded time ranges which fail to provide the
requested accuracy on different HZ settings.
* Update documentation for all sleep/timeout related functions and fix
up stale documentation links all over the place
* Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as that's
the clock which drives the timekeeping adjustments via the various user
space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file descriptor
based posix clocks, but their usability is very limited. They can't be
accessed fast as they always go all the way out to the hardware and
they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the kernel
provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework converts
timekeeping and adjtimex(2) into generic functionality which operates
on pointers to data structures instead of using static variables.
This allows to provide time accessors and adjtimex(2) functionality for
the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less straight
forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the core
code and a few usage sites of the less frequently used interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is already
prepared and scheduled for the next merge window.
- Drivers:
* Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with other
clusters.
* Mostly boring cleanups, fixes, improvements and code movement
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kPITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZKkD/9OUL6fOJrDUmOYBa4QVeMyfTef4EaL
tvwIMM/29XQFeiq3xxCIn+EMnHjXn2lvIhYGQ7GKsbKYwvJ7ZBDpQb+UMhZ2nKI9
6D6BP6WomZohKeH2fZbJQAdqOi3KRYdvQdIsVZUexkqiaVPphRvOH9wOr45gHtZM
EyMRSotPlQTDqcrbUejDMEO94GyjDCYXRsyATLxjmTzL/N4xD4NRIiotjM2vL/a9
8MuCgIhrKUEyYlFoOxxeokBsF3kk3/ez2jlG9b/N8VLH3SYIc2zgL58FBgWxlmgG
bY71nVG3nUgEjxBd2dcXAVVqvb+5widk8p6O7xxOAQKTLMcJ4H0tQDkMnzBtUzvB
DGAJDHAmAr0g+ja9O35Pkhunkh4HYFIbq0Il4d1HMKObhJV0JumcKuQVxrXycdm3
UZfq3seqHsZJQbPgCAhlFU0/2WWScocbee9bNebGT33KVwSp5FoVv89C/6Vjb+vV
Gusc3thqrQuMAZW5zV8g4UcBAA/xH4PB0I+vHib+9XPZ4UQ7/6xKl2jE0kd5hX7n
AAUeZvFNFqIsY+B6vz+Jx/yzyM7u5cuXq87pof5EHVFzv56lyTp4ToGcOGYRgKH5
JXeYV1OxGziSDrd5vbf9CzdWMzqMvTefXrHbWrjkjhNOe8E1A8O88RZ5uRKZhmSw
hZZ4hdM9+3T7cg==
=2VC6
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
The VDSO data page handling is architecture specific for historical
reasons, but there is no real technical reason to do so.
Aside of that VDSO data has become a dump ground for various mechanisms
and fail to provide a clear separation of the functionalities.
Clean this up by:
* consolidating the VDSO page data by getting rid of architecture
specific warts especially in x86 and PowerPC.
* removing the last includes of header files which are pulling in other
headers outside of the VDSO namespace.
* seperating timekeeping and other VDSO data accordingly.
Further consolidation of the VDSO page handling is done in subsequent
changes scheduled for the next merge window.
This also lays the ground for expanding the VDSO time getters for
independent PTP clocks in a generic way without making every architecture
add support seperately.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kyoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVBjD/9awdN2YeCGIM9rlHIktUdNRmRSL2SL
6av1CPffN5DenONYTXWrDYPkC4yfjUwIs8H57uzFo10yA7RQ/Qfq+O68k5GnuFew
jvpmmYSZ6TT21AmAaCIhn+kdl9YbEJFvN2AWH85Bl29k9FGB04VzJlQMMjfEZ1a5
Mhwv+cfYNuPSZmU570jcxW2XgbyTWlLZBByXX/Tuz9bwpmtszba507bvo45x6gIP
twaWNzrsyJpdXfMrfUnRiChN8jHlDN7I6fgQvpsoRH5FOiVwIFo0Ip2rKbk+ONfD
W/rcU5oeqRIxRVDHzf2Sv8WPHMCLRv01ZHBcbJOtgvZC3YiKgKYoeEKabu9ZL1BH
6VmrxjYOBBFQHOYAKPqBuS7BgH5PmtMbDdSZXDfRaAKaCzhCRysdlWW7z48r2R//
zPufb7J6Tle23AkuZWhFjvlGgSBl4zxnTFn31HYOyQps3TMI4y50Z2DhE/EeU8a6
DRl8/k1KQVDUZ6udJogS5kOr1J8pFtUPrA2uhR8UyLdx7YKiCzcdO1qWAjtXlVe8
oNpzinU+H9bQqGe9IyS7kCG9xNaCRZNkln5Q1WfnkTzg5f6ihfaCvIku3l4bgVpw
3HmcxYiC6RxQB+ozwN7hzCCKT4L9aMhr/457TNOqRkj2Elw3nvJ02L4aI86XAKLE
jwO9Fkp9qcCxCw==
=q5eD
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vdso data page handling updates from Thomas Gleixner:
"First steps of consolidating the VDSO data page handling.
The VDSO data page handling is architecture specific for historical
reasons, but there is no real technical reason to do so.
Aside of that VDSO data has become a dump ground for various
mechanisms and fail to provide a clear separation of the
functionalities.
Clean this up by:
- consolidating the VDSO page data by getting rid of architecture
specific warts especially in x86 and PowerPC.
- removing the last includes of header files which are pulling in
other headers outside of the VDSO namespace.
- seperating timekeeping and other VDSO data accordingly.
Further consolidation of the VDSO page handling is done in subsequent
changes scheduled for the next merge window.
This also lays the ground for expanding the VDSO time getters for
independent PTP clocks in a generic way without making every
architecture add support seperately"
* tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/vdso: Add missing brackets in switch case
vdso: Rename struct arch_vdso_data to arch_vdso_time_data
powerpc: Split systemcfg struct definitions out from vdso
powerpc: Split systemcfg data out of vdso data page
powerpc: Add kconfig option for the systemcfg page
powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors
powerpc/pseries/lparcfg: Fix printing of system_active_processors
powerpc/procfs: Propagate error of remap_pfn_range()
powerpc/vdso: Remove offset comment from 32bit vdso_arch_data
x86/vdso: Split virtual clock pages into dedicated mapping
x86/vdso: Delete vvar.h
x86/vdso: Access vdso data without vvar.h
x86/vdso: Move the rng offset to vsyscall.h
x86/vdso: Access rng vdso data without vvar.h
x86/vdso: Access timens vdso data without vvar.h
x86/vdso: Allocate vvar page from C code
x86/vdso: Access rng data from kernel without vvar
x86/vdso: Place vdso_data at beginning of vvar page
x86/vdso: Use __arch_get_vdso_data() to access vdso data
x86/mm/mmap: Remove arch_vma_name()
...
add *xattrat() syscalls, sanitize struct filename handling in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdj4gAKCRBZ7Krx/gZQ
6/02AQC8ndn9i1wLGRb5DdZYGNWUDhXCdPrZCF2nyvU2swCIPwEAm1H5F/bxBXeT
6qCLHThVw4KTJOT2aDY03ELrxbi8Vg4=
=35Oj
-----END PGP SIGNATURE-----
Merge tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull xattr updates from Al Viro:
"Sanitize xattr and io_uring interactions with it, add *xattrat()
syscalls, sanitize struct filename handling in there"
* tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr: remove redundant check on variable err
fs/xattr: add *at family syscalls
new helpers: file_removexattr(), filename_removexattr()
new helpers: file_listxattr(), filename_listxattr()
replace do_getxattr() with saner helpers.
replace do_setxattr() with saner helpers.
new helper: import_xattr_name()
fs: rename struct xattr_ctx to kernel_xattr_ctx
xattr: switch to CLASS(fd)
io_[gs]etxattr_prep(): just use getname()
io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE
getname_maybe_null() - the third variant of pathname copy-in
teach filename_lookup() to treat NULL filename as ""
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzkr6AAKCRDdBJ7gKXxA
jsb2AP9HCOI4w9rQTmBdnaefXytS7fiiPq+LVNpjJ0NGXX2FSgD/e1NM0wi8KevQ
npcvlqTcXtRSJvYNF904aTNyDn+Kuw0=
=KFGY
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. All singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
ocfs2: uncache inode which has failed entering the group
mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
mm, doc: update read_ahead_kb for MADV_HUGEPAGE
fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
mm/mremap: fix address wraparound in move_page_tables()
tools/mm: fix compile error
mm, swap: fix allocation and scanning race with swapoff
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmccju4ACgkQxycdCkmx
i6db/BAAyqJ3joT62295GBRcgFMn+nZe/1Tq4jkFAj7saoCzFRn70SWu0nq3Zaed
xaQNl5BePD57ySUtsGm80qTISxOSE3zJ3c1S2vSisO6pjryMGuUFI1nPynD5sFz8
O+GMQB85qtZMOQZUtNQ6kt3gugGdoIvdgM0D/mG4qhRa5H6EeagV7rL+h89AAQ1M
c+ggOxpX2RKFFNsw+7xBjlIEMnYtl+Qq5ifgp5wUGetbQwUhWBF9lwBm7OV06Dp5
pnwQt2ctw2D12ZFyG25ncHHjf3ZzEuNU75SWyJQT+MuAO+4eSAplAr+MwvOytwqc
bxeWz5T4qfV3RqFCVJMpTzYrBG6jgUZrvUA1s4NNCXi07YhakZIlKcCql0tGAlkN
xg2MQxlwp0vmlayzq6PcT0zem8/pr0hvEUGl6FOTq1yCuRnUpre/kAIJY5QdQ/P7
TJ1KczyOJL56VEYwFEWmIFWKdz/XDmJxehiQVvFHOBA89zxsNv3/+oH6knMXEFXl
nmRwCDIjtZTv0QORL3h/odULwSxa6cN56+MrooIlrsDBDAajCXAj8lwBatgmI1QT
sCo1z2GMWQZ2j9R5nfyZRdrwIn8LXMozzRd/cfBkVi7hn7Mp8bz3xDOtK8TFik7l
iYmekMXH7itafm8yhDstLbk2dmymmk3+Gk41qr83vrAVZXk3YjY=
=GphV
-----END PGP SIGNATURE-----
Merge tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in the MIPS CRC32C code"
* tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: mips/crc32 - fix the CRC32C implementation
Change the declaration of clocks: remove all fixed clocks and declare
system-controllers (OLB) as clock providers.
Remove eyeq6h-fixed-clocks.dtsi and move the crystal clock to the main
eyeq6h.dtsi file.
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Change the structure of the clock tree: rather than individual
devicetree nodes registering each fixed factor clock derived from OLB
PLLs, have the OLB node provide the necessary clocks.
Remove eyeq5-clocks.dtsi and move the three remaining "fixed-clock"s to
the main eyeq5.dtsi file.
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware.
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.
Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them. Users booting via some other mechanism can still
turn it on explicitly.
Does not change the default on any other architectures for the
time being.
Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes: 75bc255a74 ("crash: clean up kdump related config items")
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
System Tick Counter is present on Ralink SoCs RT3352 and MT7620. This
driver has been in 'arch/mips/ralink' directory since the beggining of
Ralink architecture support. However, it can be moved into a more proper
place in 'drivers/clocksource'. Hence add it here adding also support for
compile test targets and reducing LOC in architecture code folder.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20241028203643.191268-2-sergio.paracuellos@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
- Drop obsolete references to PPC970 KVM, which was removed 10 years ago.
- Fix incorrect references to non-existing ioctls
- List registers supported by KVM_GET/SET_ONE_REG on s390
- Use rST internal links
- Reorganize the introduction to the API document
Add the I2C controllers that are part of the RTL9300 SoC.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The board level reset on systems using the RTL9302 can be driven via the
switch. Use a syscon-reboot node to represent this.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
A significant number of 3A4000 machines come with NVMe drives
pre-installed, so we should support it in its defconfig.
Tested-by: Erpeng Xu <xuerpeng@uniontech.com>
Tested-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Due to long-term changes in kernel build configurations,
run 'make savedefconfig' to update the build configuration
dependencies.
This commit does not affect the actual .config file content,
in preparation for future modifications to loongson3_defconfig.
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
pud_init(), pmd_init() and kernel_pte_init() are duplicated defined in
file kasan.c and sparse-vmemmap.c as weak functions. Move them to generic
header file pgtable.h, architecture can redefine them.
Link: https://lkml.kernel.org/r/20241104070712.52902-1-maobibo@loongson.cn
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
... if they are identical to fallbacks, just leave them alone.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Implement a new lightweight guard page feature, that is regions of
userland virtual memory that, when accessed, cause a fatal signal to
arise.
Currently users must establish PROT_NONE ranges to achieve this.
However this is very costly memory-wise - we need a VMA for each and every
one of these regions AND they become unmergeable with surrounding VMAs.
In addition repeated mmap() calls require repeated kernel context switches
and contention of the mmap lock to install these ranges, potentially also
having to unmap memory if installed over existing ranges.
The lightweight guard approach eliminates the VMA cost altogether - rather
than establishing a PROT_NONE VMA, it operates at the level of page table
entries - establishing PTE markers such that accesses to them cause a
fault followed by a SIGSGEV signal being raised.
This is achieved through the PTE marker mechanism, which we have already
extended to provide PTE_MARKER_GUARD, which we installed via the generic
page walking logic which we have extended for this purpose.
These guard ranges are established with MADV_GUARD_INSTALL. If the range
in which they are installed contain any existing mappings, they will be
zapped, i.e. free the range and unmap memory (thus mimicking the
behaviour of MADV_DONTNEED in this respect).
Any existing guard entries will be left untouched. There is therefore no
nesting of guarded pages.
Guarded ranges are NOT cleared by MADV_DONTNEED nor MADV_FREE (in both
instances the memory range may be reused at which point a user would
expect guards to still be in place), but they are cleared via
MADV_GUARD_REMOVE, process teardown or unmapping of memory ranges.
The guard property can be removed from ranges via MADV_GUARD_REMOVE. The
ranges over which this is applied, should they contain non-guard entries,
will be untouched, with only guard entries being cleared.
We permit this operation on anonymous memory only, and only VMAs which are
non-special, non-huge and not mlock()'d (if we permitted this we'd have to
drop locked pages which would be rather counterintuitive).
Racing page faults can cause repeated attempts to install guard pages that
are interrupted, result in a zap, and this process can end up being
repeated. If this happens more than would be expected in normal
operation, we rescind locks and retry the whole thing, which avoids lock
contention in this scenario.
Link: https://lkml.kernel.org/r/6aafb5821bf209f277dfae0787abb2ef87a37542.1730123433.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmct20AACgkQrUjsVaLH
LAdyWw/+Os6fIFhThsur7z26E/+7Edxk/QEfUMEEvQPwKfwiRdHYLU1dWGTCw2nA
xr8yR9MQFkUMTHZBPi+zki47uE6gW1qMb1tl8AuZ4cjZCMSg0pBObdB/axT5+517
boC74A1r04dlvW3sEvcMaHen0B75Hwm703jjOIiThiZFlB519HcattTPAJo8nTRD
D4hhpTY/v8QyFlMg0m7KnOOa3MtmR0Q704LaAdEQ5S9kdMlUtMzljHvWIzYdTfjJ
C7v7EAo+iRi7ecBdg9ZxhOkcKUB2xF+lmLYBRoiOeaECJ4RIaseY63SqLyzlGeNZ
Dx7DuF0mljC3nmHPH/60pOxDKYWGxjx5IQF1ORBvcsux4uhwH8KkyjcxYody/m82
HXalAwmN4mA2asrt0xTpIoOUkwOGy7LEKx6083hbrSgzk9E/bnoi+YDe+Gm4z+YR
ofqsj4G6x0fS85RCKB4iQGRcHgW2IPHYeKWXYYP0m1COe9kj9E3POBpAjj3qMxZr
R+eveYhLZ7A3B/DAVYQQVoCcQWECXC0M4FOOVh4vEV2JeEB55dlI77/p6X5y7RGj
YA7njv9eC5hrxOgUMOLFIvMqFKe7dcuYmE3V11+TD5lHBfqWfKE7vA1V1rOA8e8R
Qf5DXLVBbsNj2yirbj1LI3x7jg5aXw5IhylJNn/c1vezqw8RUfg=
=sI+4
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.13-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.13
- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side
Several architectures support text patching, but they name the header
files that declare patching functions differently.
Make all such headers consistently named text-patching.h and add an empty
header in asm-generic for architectures that do not support text patching.
Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
prepare_hugepage_range() performs almost the same checks for all
architectures that define it, with the exception of mips and loongarch
that also check for overflows.
The rest checks for the addr and len to be properly aligned, so we can
move that to hugetlb_get_unmapped_area() and get rid of a fair amount of
duplicated code.
[akpm@linux-foundation.org: remove now-unused local]
Link: https://lore.kernel.org/oe-kbuild-all/202410081210.uNLbf3Jk-lkp@intel.com/
Link: https://lkml.kernel.org/r/20241007075037.267650-10-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The generic vdso/datapage.h declares a symbol named "vdso_data".
Avoid a conflict by renaming the identically named variable in genvdso.c.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-9-b64f0842d512@linutronix.de
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmck8eQACgkQxWXV+ddt
WDu05g/6AwrnvPkivC4iVOv4Wkzrpk4gm76smx91Y9B8tSDLI1pHaS27CvJz9iWl
vBKXPN3PQVQHwo6SPn+NjsFOSMkXlbBOVKpPU+MlZwH9Tuw66qcC+EnUCK2wEuAy
3TN7cUGIA4r/j+SkhgIz+Irlr5pjdb1KkPIMBEVGcVFqDIuvDaTEGBqTn2i/V5aa
dMn+gK+9rfngTOJ68t/pEFaX7SEWCvgMIcBpBB4/vs1gHm3ve2bcc1sBAdMxb1Se
SrxgZfq+Rc5tkMn540JaWGwkb0rLzwXlurK6ygTKDKCpH0IMX+pBvDkexh9Zj0ux
jejlRxiuDzTx3z2a7FjHDyp2sdZWMpq3sPsowpJ1Dsgi5EtSxTy4irmQuSAZY1Uj
/uo6YwV9aTGeiNDwZeKqKc/wOuAttaMZLr14s37pro9KxndFJ/XZBxeyB+euUCOw
B8AvAQVVIJAYQLyWINWruNKppqlgiO2RaN15RvvT2pX01d0TOx1KX1XFQku7YFxb
M/8ZNXzJ96XtkeyHL3euo3zj7N5jWtnCvPINugUG1ADQa+bc8aX336gld1neD6fs
QqIFIgzZG0l4N95viJilACrI6tW9zFnBqMyNFRhucKiX9aP9glOvhSfxfjcpDuQ/
i/LIyxVLwp8M3hPNvv8tC345+1C2ug9AD0OyhWjjIYPuiOxtTWs=
=alpB
-----END PGP SIGNATURE-----
Merge tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stability fixes. There's one patch adding export of MIPS
cmpxchg helper, used in the error propagation fix.
- fix error propagation from split bios to the original btrfs bio
- fix merging of adjacent extents (normal operation, defragmentation)
- fix potential use after free after freeing btrfs device structures"
* tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix defrag not merging contiguous extents due to merged extent maps
btrfs: fix extent map merging not happening for adjacent extents
btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()
btrfs: fix error propagation of split bios
MIPS: export __cmpxchg_small()
There is currently no eyeq6 compatible string defined in the binding
documentation. Only eyeq6h version is defined, so let's use it.
Note that there are actually no codes relying on eyeq6h; the purpose
of this patch is mainly to be coherent with the documentation.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
For the bcm6358/6368 SoC the brcm,bmips-cbr-reg due to bootloader
misconfiguration or HW bug from running the system from TP1.
A workaround is now present to handle broken system that suffer from
this bug hence add the now required property.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409251520.pE12GzHd-lkp@intel.com/
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
While most MIPS64 CPUs use 32-bit values for their VP Local Reset
Exception Base registers, some I6500 CPUs can utilize a 64-bit value,
allowing addressing up to 47 bits of physical memory.
For the EyeQ6H CPU, where physical memory addresses exceed the 4GB
limit, utilizing this feature is mandatory to enable SMP support.
Unfortunately, there is no way to detect this capability based solely
on the ID of the CPU. According to Imagination, which designed the
CPU, the only reliable method is to fill the reset base field with
0xFF and then read back its value. If the upper part of the read-back
value is zero, it indicates that the address space is limited to 32
bits.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
When MIPS_FP_SUPPORT is disabled, __sanitize_fcr31() is defined as
nothing, which triggers a gcc warning:
In file included from kernel/sched/core.c:79:
kernel/sched/core.c: In function 'context_switch':
./arch/mips/include/asm/switch_to.h:114:39: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
114 | __sanitize_fcr31(next); \
| ^
kernel/sched/core.c:5316:9: note: in expansion of macro 'switch_to'
5316 | switch_to(prev, next, prev);
| ^~~~~~~~~
Fix this by providing an empty body for __sanitize_fcr31() like one is
defined for __mips_mt_fpaff_switch_to().
Fixes: 36a498035b ("MIPS: Avoid FCSR sanitization when CONFIG_MIPS_FP_SUPPORT=n")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Replace open-coded pieces with sysfs_emit() helper in sysfs .show()
callbacks.
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
__pa() is only intended to be used for linear map addresses and using
it for initial_boot_params which is in fixmap for arm64 will give an
incorrect value. Hence save the physical address when it is known at
boot time when calling early_init_dt_scan for arm64 and use it at kexec
time instead of converting the virtual address using __pa().
Note that arm64 doesn't need the FDT region reserved in the DT as the
kernel explicitly reserves the passed in FDT. Therefore, only a debug
warning is fixed with this change.
Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Fixes: ac10be5cdb ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
page_to_phys is duplicated by all architectures, and from some strange
reason placed in <asm/io.h> where it doesn't fit at all.
phys_to_page is only provided by a few architectures despite having a lot
of open coded users.
Provide generic versions in <asm-generic/memory_model.h> to make these
helpers more easily usable.
Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid
check. It will be added back in a generic version later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit ca459e5f82 ("crypto: mips/crc32 - Clean up useless assignment
operations") changed crc32c_mips_le_hw() to use the instructions that
use the "regular" CRC32 polynomial instead of the Castagnoli polynomial.
Therefore it can't be computing CRC32C values correctly anymore.
I haven't been successful in running a MIPS kernel in QEMU, but based on
code review this is the fix that is needed.
Fixes: ca459e5f82 ("crypto: mips/crc32 - Clean up useless assignment operations")
Cc: Guan Wentao <guanwentao@uniontech.com>
Cc: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Wentao Guan <guanwentao@uniontech.com>
Acked-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-73-seanjc@google.com>
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that MIPS can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-72-seanjc@google.com>
Mark pages accessed only in the slow page fault path in order to remove
an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages
accessed in the primary MMU during KVM page fault handling isn't harmful,
but it's largely pointless and likely a waste of a cycles since the
primary MMU will call into KVM via mmu_notifiers when aging pages. I.e.
KVM participates in a "pull" model, so there's no need to also "push"
updates.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-71-seanjc@google.com>
Mark pages/folios dirty only the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
See the link below for details.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-70-seanjc@google.com>
The VDSO implementation includes headers from outside of the
vdso/ namespace.
Introduce vdso/page.h to make sure that the generic library
uses only the allowed namespace.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20241014151340.1639555-3-vincenzo.frascino@arm.com
Since the generic VDSO clock mode storage is used, this header file is
unused and can be removed.
This avoids including a non-VDSO header while building the VDSO,
which can lead to compilation errors.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-arch_update_vsyscall-v1-9-7fe5a3ea4382@linutronix.de
Konstantin reports the maintainer's address bounces.
There is no other maintainer and the driver is quite old.
There is a good chance nobody is using this driver any more.
Let's try to remove it completely, we can revert it back in
if someone complains.
Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>