Patch series "kasan: migrate the last module test to kunit", v4.
copy_user_test() is the last KUnit-incompatible test and still requires
CONFIG_KASAN_MODULE_TEST. This series migrates it to the KUnit framework
and deletes the former test and its Kconfig option.
In this patch series:
- [1/3] move kasan_check_write() and check_object_size() to
do_strncpy_from_user() so that the KASAN checks cover the multiple
conditions in strncpy_from_user().
- [2/3] migrate copy_user_test() to KUnit, where we can also test
strncpy_from_user() thanks to [1/3].
KUnits have been tested on:
- x86_64 with CONFIG_KASAN_GENERIC. Passed
- arm64 with CONFIG_KASAN_SW_TAGS. 1 fail. See [1]
- arm64 with CONFIG_KASAN_HW_TAGS. 1 fail. See [1]
[1] https://lore.kernel.org/linux-mm/CACzwLxj21h7nCcS2-KA_q7ybe+5pxH0uCDwu64q_9pPsydneWQ@mail.gmail.com/
- [3/3] delete CONFIG_KASAN_MODULE_TEST and documentation occurrences.
This patch (of 3):
Since commit 2865baf54077 ("x86: support user address masking instead of
non-speculative conditional"), do_strncpy_from_user() is called from
multiple places, so we should sanitize the kernel *dst memory and size
there, which was previously done in strncpy_from_user().
Link: https://lkml.kernel.org/r/20241016131802.3115788-1-snovitoll@gmail.com
Link: https://lkml.kernel.org/r/20241016131802.3115788-2-snovitoll@gmail.com
Fixes: 2865baf540 ("x86: support user address masking instead of non-speculative conditional")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pick up e7ac4daeed ("mm: count zeromap read and set for swapout and
swapin") in order to move
mm: define obj_cgroup_get() if CONFIG_MEMCG is not defined
mm: zswap: modify zswap_compress() to accept a page instead of a folio
mm: zswap: rename zswap_pool_get() to zswap_pool_tryget()
mm: zswap: modify zswap_stored_pages to be atomic_long_t
mm: zswap: support large folios in zswap_store()
mm: swap: count successful large folio zswap stores in hugepage zswpout stats
mm: zswap: zswap_store_page() will initialize entry after adding to xarray.
mm: add per-order mTHP swpin counters
from mm-unstable into mm-stable.
pgalloc_tag_copy() and pgalloc_tag_split() are sizable and outside of any
performance-critical paths, so it should be fine to uninline them. Also
move their declarations into pgalloc_tag.h which seems like a more
appropriate place for them. No functional changes other than uninlining.
Link: https://lkml.kernel.org/r/20241024162318.1640781-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Sourav Panda <souravpanda@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement support for storing page allocation tag references directly in
the page flags instead of page extensions. sysctl.vm.mem_profiling boot
parameter is extended to provide a way for a user to request this mode.
Enabling compression eliminates memory overhead caused by page_ext and
results in better performance for page allocations. However this mode
will not work if the number of available page flag bits is insufficient to
address all kernel allocations. Such a condition can happen during boot or
when loading a module. If this condition is detected, memory allocation
profiling gets disabled with an appropriate warning. By default
compression mode is disabled.
Link: https://lkml.kernel.org/r/20241023170759.999909-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiongwei Song <xiongwei.song@windriver.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The memory reserved for module tags does not need to be backed by physical
pages until there are tags to store there. Change the way we reserve this
memory to allocate only virtual area for the tags and populate it with
physical pages as needed when we load a module.
[surenb@google.com: avoid execmem_vmap() when !MMU]
Link: https://lkml.kernel.org/r/20241031233611.3833002-1-surenb@google.com
Link: https://lkml.kernel.org/r/20241023170759.999909-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiongwei Song <xiongwei.song@windriver.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a module gets unloaded there is a possibility that some of the
allocations it made are still used and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and module's data section is not being unloaded. To
handle this situation without keeping module's data in memory, allow
codetags with longer lifespan than the module to be loaded into their own
separate memory. The in-use memory areas and gaps after module unloading
in this separate memory are tracked using maple trees. Allocation tags
arrange their separate memory so that it is virtually contiguous and that
will allow simple allocation tag indexing later on in this patchset. The
size of this virtually contiguous memory is set to store up to 100000
allocation tags.
[surenb@google.com: fix empty codetag module section handling]
Link: https://lkml.kernel.org/r/20241101000017.3856204-1-surenb@google.com
[akpm@linux-foundation.org: update comment, per Dan]
Link: https://lkml.kernel.org/r/20241023170759.999909-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiongwei Song <xiongwei.song@windriver.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement a helper function to disable memory allocation profiling and use
it when creation of /proc/allocinfo fails. Ensure /proc/allocinfo does
not get created when memory allocation profiling is disabled.
Link: https://lkml.kernel.org/r/20241023170759.999909-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiongwei Song <xiongwei.song@windriver.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since (gfp & GFP_ATOMIC) == GFP_ATOMIC is also true for
GFP_KERNEL | __GFP_HIGH, kmalloc is used if the user specifies that
combination. The reason for combining __vmalloc_node() and kmalloc_node()
is that vmalloc does not support all GFP flags, especially GFP_ATOMIC. So
we should first check whether
(gfp & (GFP_ATOMIC | GFP_KERNEL)) != GFP_ATOMIC before using vmalloc. This
ensures the caller can sleep. For robustness, even if vmalloc fails, the
allocation should be retried with kmalloc.
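The difference between the two flag tests can be sketched in userspace C. This is only an illustration: the DEMO_* bit values are made up, and the composition of the two masks is assumed to mirror the kernel's GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) and GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS). The function names are hypothetical.

```c
#include <stdbool.h>

/* Made-up bit values; only the composition of the masks matters here. */
#define DEMO_GFP_HIGH            0x01u
#define DEMO_GFP_DIRECT_RECLAIM  0x02u
#define DEMO_GFP_KSWAPD_RECLAIM  0x04u
#define DEMO_GFP_IO              0x08u
#define DEMO_GFP_FS              0x10u

#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM | \
			 DEMO_GFP_IO | DEMO_GFP_FS)

/* Old test: any mask that contains all GFP_ATOMIC bits looks atomic. */
bool old_check_is_atomic(unsigned int gfp)
{
	return (gfp & DEMO_GFP_ATOMIC) == DEMO_GFP_ATOMIC;
}

/*
 * New test: vmalloc is tried unless the mask, restricted to the bits of
 * GFP_ATOMIC and GFP_KERNEL, is exactly GFP_ATOMIC.
 */
bool new_check_may_vmalloc(unsigned int gfp)
{
	return (gfp & (DEMO_GFP_ATOMIC | DEMO_GFP_KERNEL)) != DEMO_GFP_ATOMIC;
}
```

With these definitions, DEMO_GFP_KERNEL | DEMO_GFP_HIGH is classified as atomic by the old test, while the new test lets it reach the vmalloc path and still keeps plain DEMO_GFP_ATOMIC away from it.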
Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com
Fixes: aff1871bfc ("objpool: fix choosing allocation for percpu slots")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/
Cc: Leo Yan <leo.yan@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wu <wuqiang.matt@bytedance.com>
Cc: Mikel Rychliski <mikel@mikelr.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
virtio-mem currently depends on !DEVMEM | STRICT_DEVMEM. Let's default
STRICT_DEVMEM to "y" just like we do for arm64 and x86.
There could be ways in the future to filter access to virtio-mem device
memory even without STRICT_DEVMEM, but for now let's just keep it
simple.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-6-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
After commit 5d659bbb52 ("maple_tree: introduce mas_wr_store_type()"),
the check here is redundant.
Let's remove it.
Link: https://lkml.kernel.org/r/20241017015809.23392-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Following cleanup after introduce mas_wr_store_type()", v2.
Patch 1 postpones the new_end calculation until it is needed.
Patch 2 removes an unnecessary sanity check in mas_wr_slot_store().
This patch (of 2):
For wr_exact_fit/wr_new_root, we don't need to calculate new_end.
Let's postpone it until necessary.
Link: https://lkml.kernel.org/r/20241017015809.23392-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20241017015809.23392-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It might be a corner case when we add UINT_MAX as a 64-bit unsigned value to
the percpu variable as it's not the same as -1 (ULONG_LONG_MAX). Add a
test case for that.
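Why the two values differ can be shown with a small userspace sketch. It is not the percpu test itself: the helper names are hypothetical, and the sign-extending variant only models what a buggy 64-bit add could do if it treated the 32-bit operand as signed (the uint32_t to int32_t conversion is implementation-defined before C23, and wraps on common compilers).

```c
#include <stdint.h>

/* Correct widening: an unsigned 32-bit value is zero-extended. */
uint64_t widen_zero_extend(uint32_t v)
{
	return (uint64_t)v;
}

/* Faulty widening: the operand is treated as signed and sign-extended. */
uint64_t widen_sign_extend(uint32_t v)
{
	return (uint64_t)(int64_t)(int32_t)v;
}

/* Stand-in for adding a 64-bit delta to a 64-bit percpu counter. */
uint64_t add_u64(uint64_t counter, uint64_t delta)
{
	return counter + delta;
}
```

Adding the zero-extended UINT32_MAX to a counter holding 1 carries into bit 32, whereas adding the sign-extended value behaves like adding -1 and wraps the counter to 0.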
Link: https://lkml.kernel.org/r/20241016182635.1156168-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When count is not 0, we know head is valid. So we can put the assignment
in if (count) instead of checking the head pointer again.
Also, since count represents the current total, we can assign the new
total by increasing the count by one.
Link: https://lkml.kernel.org/r/20241015120746.15850-4-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If it jumps to nomem_one, the total allocated number is not changed. So
we don't need to adjust it.
For the nomem_bulk case, we know there is a valid mas->alloc. So we don't
need to do the check.
Link: https://lkml.kernel.org/r/20241015120746.15850-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "maple_tree: simplify mas_push_node()", v2.
When count is not 0, we know head is valid. So we can put the assignment
in if (count) instead of checking the head pointer again.
Also, since count represents the current total, we can assign the new
total by increasing the count by one.
This patch (of 3):
If this is not a newly allocated one, the request_count has already been
cleared in mas_set_alloc_req().
Link: https://lkml.kernel.org/r/20241015120746.15850-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20241015120746.15850-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For a root node, mte_parent_slot() returns 0, which exactly fits the
following !p_slot check.
So we can remove the special handling for root node.
Link: https://lkml.kernel.org/r/20240913063128.27391-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the following code, the second call to mas_node_count() will return
-ENOMEM:
	mas_node_count(mas, MAPLE_ALLOC_SLOTS + 1);
	mas_node_count(mas, MAPLE_ALLOC_SLOTS * 2 + 2);
This is because there may be some full maple_alloc node in the current
maple state. Using a full maple_alloc node makes max_req equal to 0, which
leads to mt_alloc_bulk() returning 0. As a result, mas_node_count() sets
mas.node to MA_ERROR(-ENOMEM).
Find a non-full maple_alloc node, and if necessary, use this non-full node
in the next while loop.
Link: https://lkml.kernel.org/r/20240626160631.3636515-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In mas_wr_store_type(), we check if new_end < mt_slots[wr_mas->type]. If
this check fails, we know that, after this, new_end is >= mt_min_slots.
Checking this again when we detect a wr_node_store later in the function
is redundant. Because this check is part of an OR statement, the
statement will always evaluate to true, therefore we can just get rid of
it.
We also refactor mas_wr_store_type() to return the store type rather than
set it directly as it greatly cleans up the function.
Link: https://lkml.kernel.org/r/20241011214451.7286-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Suggested-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Many maple tree values output when an mt_validate() or equivalent hits an
issue utilise tagged pointers, most notably parent nodes. Also some
pivots/slots contain meaningful values, output as pointers, such as the
index of the last entry with data for example.
All pointer values such as this are destroyed by kernel pointer hashing
rendering the debug output obtained from CONFIG_DEBUG_VM_MAPLE_TREE
considerably less usable.
Update this code to output the raw pointers using %px rather than %p when
CONFIG_DEBUG_VM_MAPLE_TREE is defined. This is justified, as the use of
this configuration flag indicates that this is a test environment.
Userland does not understand %px, so use %p there.
In an abundance of caution, if CONFIG_DEBUG_VM_MAPLE_TREE is not set, also
use %p to avoid exposing raw kernel pointers except when we are positive a
testing mode is enabled.
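The selection rule described above can be sketched as a compile-time format choice. This is a hypothetical illustration, not the patch itself: MT_PTR_FMT and format_node() are made-up names, and the sketch assumes the usual __KERNEL__ macro distinguishes kernel from userland builds.

```c
#include <stdio.h>

/*
 * Raw pointers only when this is a kernel build with the maple tree
 * debug option set; hashed or userland-compatible %p otherwise.
 */
#if defined(__KERNEL__) && defined(CONFIG_DEBUG_VM_MAPLE_TREE)
#define MT_PTR_FMT "%px"
#else
#define MT_PTR_FMT "%p"
#endif

/* Format one dump line for a node; returns what snprintf() returns. */
int format_node(char *buf, size_t size, void *node)
{
	return snprintf(buf, size, "node " MT_PTR_FMT "\n", node);
}
```

Built in userland, the macro expands to "%p", so the same dump code compiles in both environments.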
This was inspired by the investigation performed in recent debugging
efforts around a maple tree regression [0] where kernel pointer tagging had
to be disabled in order to obtain truly meaningful and useful data.
[0]:https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Link: https://lkml.kernel.org/r/20241007115335.90104-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace the swp function pointer in the min_heap_callbacks of
test_min_heap with NULL, allowing direct usage of the default builtin swap
implementation. This modification simplifies the code and improves
performance by removing unnecessary function indirection.
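What a NULL swap callback falling back to a size-based builtin swap might look like is sketched below in userspace C. The names heap_swap() and swap_fn_t are hypothetical and the size classes are an assumption; the in-kernel helper may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*swap_fn_t)(void *lhs, void *rhs);

/* Generic fallback: swap two elements one byte at a time. */
static void swap_bytes(void *a, void *b, size_t size)
{
	unsigned char *pa = a, *pb = b;

	while (size--) {
		unsigned char tmp = *pa;

		*pa++ = *pb;
		*pb++ = tmp;
	}
}

/*
 * Use the caller's callback when one is given; otherwise pick a builtin
 * swap from the element size, avoiding an indirect call.
 */
void heap_swap(void *a, void *b, size_t size, swap_fn_t swp)
{
	if (swp) {
		swp(a, b);
		return;
	}
	if (size == sizeof(uint64_t)) {
		uint64_t x, y;

		/* memcpy() keeps the access safe for any alignment. */
		memcpy(&x, a, sizeof(x));
		memcpy(&y, b, sizeof(y));
		memcpy(a, &y, sizeof(y));
		memcpy(b, &x, sizeof(x));
		return;
	}
	swap_bytes(a, b, size);
}
```

Passing NULL for swp selects the builtin path, which is the behavior the test now relies on.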
Link: https://lkml.kernel.org/r/20241020040200.939973-5-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Enhance min heap API with non-inline functions and
optimizations", v2.
Add non-inline versions of the min heap API functions in lib/min_heap.c
and update all users outside of kernel/events/core.c to use these
non-inline versions. To mitigate the performance impact of indirect
function calls caused by the non-inline versions of the swap and compare
functions, a builtin swap has been introduced that swaps elements based on
their size. Additionally, it micro-optimizes the efficiency of the min
heap by pre-scaling the counter, following the same approach as in
lib/sort.c. Documentation for the min heap API has also been added to the
core-api section.
This patch (of 10):
All current min heap API functions are marked with '__always_inline'.
However, as the number of users increases, inlining these functions
everywhere leads to an increase in kernel size.
In performance-critical paths, such as when perf events are enabled and
min heap functions are called on every context switch, it is important to
retain the inline versions for optimal performance. To balance this, the
original inline functions are kept, and additional non-inline versions of
the functions have been added in lib/min_heap.c.
Link: https://lkml.kernel.org/r/20241020040200.939973-1-visitorckw@gmail.com
Link: https://lore.kernel.org/20240522161048.8d8bbc7b153b4ecd92c50666@linux-foundation.org
Link: https://lkml.kernel.org/r/20241020040200.939973-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Remove unnecessary header includes from
{tools/}lib/list_sort.c".
Remove outdated and unnecessary header includes from lib/list_sort.c and
tools/lib/list_sort.c. Additionally, update the hunk exceptions checked
by check_headers.sh to reflect these changes.
This patch (of 3):
After commit 043b3f7b63 ("lib/list_sort: simplify and remove
MAX_LIST_LENGTH_BITS"), list_sort.c no longer uses ARRAY_SIZE() (which
required kernel.h and bug.h for BUILD_BUG_ON_ZERO via __must_be_array) or
memset() (which required string.h). As these headers are no longer
needed, remove them.
There are no changes to the generated code, as confirmed by 'objdump -d'.
Additionally, 'wc -l' shows that the size of lib/.list_sort.o.cmd is
reduced from 259 lines to 101 lines.
Link: https://lkml.kernel.org/r/20241012042828.471614-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20241012042828.471614-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, cpuset is the only user of the union-find implementation.
Compiling union-find in all configurations unnecessarily increases the
code size when building the kernel without cgroup support. Modify the
build system to compile union-find only when CONFIG_CPUSETS is enabled.
Link: https://lore.kernel.org/lkml/1ccd6411-5002-4574-bb8e-3e64bba6a757@redhat.com/
Link: https://lkml.kernel.org/r/20241011141214.87096-1-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Waiman Long <llong@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Xavier <xavier_qy@163.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add Kunit tests for the kernel's implementation of the standard CRC-16
algorithm (<linux/crc16.h>). The test data consists of 100
randomly-generated test cases, validated against a naive CRC-16
implementation.
This test follows roughly the same logic as lib/crc32test.c, but without
the performance measurements.
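A naive reference of the kind used to validate such test data can be sketched as a bit-by-bit loop. This is an assumption-laden sketch, not the test's actual generator: crc16_naive() is a hypothetical name, and it assumes the CRC-16 variant with polynomial x^16 + x^15 + x^2 + 1 processed least-significant bit first (0xA001 in reflected form).

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, LSB first, reflected polynomial 0xA001. */
uint16_t crc16_naive(uint16_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
	}
	return crc;
}
```

Each test case would then compare the table-driven result against this reference for the same seed and buffer.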
Link: https://lkml.kernel.org/r/20241012-crc16-kunit-v3-1-0ca75cb58ca9@lkcamp.dev
Signed-off-by: Vinicius Peixoto <vpeixoto@lkcamp.dev>
Co-developed-by: Enzo Bertoloti <ebertoloti@lkcamp.dev>
Signed-off-by: Enzo Bertoloti <ebertoloti@lkcamp.dev>
Co-developed-by: Fabricio Gasperin <fgasperin@lkcamp.dev>
Signed-off-by: Fabricio Gasperin <fgasperin@lkcamp.dev>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Check that the total number of elements in both resultant lists is correct
within list_cut_position*(). Previously, only the first list's size was
checked, so additional elements in the second list would not have been
caught.
Link: https://lkml.kernel.org/r/20241008065253.26673-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When executing 'make menuconfig' with KUNIT enabled, the int_pow test
option appears on the first page of the main menu instead of under the
runtime testing section. Relocate the int_pow test configuration to the
appropriate runtime testing submenu, ensuring a more organized and logical
structure in the menu configuration.
Link: https://lkml.kernel.org/r/20241005222221.2154393-1-visitorckw@gmail.com
Fixes: 7fcc9b5321 ("lib/math: Add int_pow test suite")
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: David Gow <davidgow@google.com>
Cc: Luis Felipe Hernandez <luis.hernandez093@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In mast_fill_bnode(), we first clear some fields of maple_big_node and set
the 'type' unconditionally before return. This means we won't leverage
any information in maple_big_node and it is safe to clear the whole
structure.
In maple_big_node, we define slot and padding/gap in a union. Based on the
current definition of MAPLE_BIG_NODE_SLOTS/GAPS, padding is always smaller
than slot, so part of gap overlaps slot.
For example on a 64-bit system:
MAPLE_BIG_NODE_SLOTS is 34
MAPLE_BIG_NODE_GAPS is 21
With this knowledge, the current code may clear some space twice. This
can be avoided by clearing the structure as a whole.
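The overlap can be made concrete with a userspace mock of the layout. This is a sketch under assumptions: struct big_node is a simplified stand-in for maple_big_node, the constants are the 64-bit values quoted above, and slot is declared as unsigned long (rather than a pointer type) so that all array elements have the same size on any ABI.

```c
#include <stddef.h>
#include <string.h>

#define BIG_SLOTS 34
#define BIG_GAPS  21

/* Simplified mock: slot shares storage with padding followed by gap. */
struct big_node {
	unsigned long pivot[BIG_SLOTS - 1];
	union {
		unsigned long slot[BIG_SLOTS];
		struct {
			unsigned long padding[BIG_GAPS];
			unsigned long gap[BIG_GAPS];
		};
	};
	unsigned char b_end;
	int type;
};

/* Number of array entries that belong to both slot[] and gap[]. */
size_t overlapping_entries(void)
{
	size_t slot_end = offsetof(struct big_node, slot) +
			  BIG_SLOTS * sizeof(unsigned long);
	size_t gap_start = offsetof(struct big_node, gap);

	return slot_end > gap_start ?
	       (slot_end - gap_start) / sizeof(unsigned long) : 0;
}

/* Clearing the whole structure touches every byte exactly once. */
void clear_whole(struct big_node *b)
{
	memset(b, 0, sizeof(*b));
}
```

With 34 slots and 21 gaps, 13 entries are shared, so clearing slot and gap separately would write those 13 entries twice.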
Link: https://lkml.kernel.org/r/20240908140554.20378-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Reduce the space to be cleared for maple_big_node", v2.
The current code may clear parts of maple_big_node redundantly.
First we define a field parent, which is never used. After removing this,
we reduce the size of memory to be cleared by memset.
Then mast_fill_bnode() clears part of the structure twice, since slot and
gap share some space. By clearing the whole structure, we can avoid this.
This patch (of 2):
The member parent of maple_big_node is never used.
Let's remove it, which reduces the amount of space to be cleared by
memset.
Link: https://lkml.kernel.org/r/20240908140554.20378-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20240908140554.20378-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When we break out of the loop after assigning a pivot, the index i/j is
not changed. The following code then assigns the pivot again, which means
we repeat the assignment with the same i/j via mas_safe_pivot().
Since the loop condition is (i < piv_end), we know that i is less than
mt_pivots[mt]. This implies mas_safe_pivot() returns pivot[i], which is
the same value we got in the loop.
Now we can conclude that it does a redundant assignment of a pivot of 0.
Let's just go to complete to avoid it.
Link: https://lkml.kernel.org/r/20240911142759.20989-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "refine mas_mab_cp()".
By analysis of the code, one condition check can be removed and one case
would hit a redundant assignment.
This patch (of 2):
mas_mab_cp() copies the range [mas_start, mas_end] inclusively from a
maple_node to a maple_big_node. This implies mas_start <= mas_end.
Based on the relationship of mas_start and mas_end, we can have the
following four cases:
               | mas_start == mas_end | mas_start < mas_end
---------------+----------------------+----------------------
mas_start == 0 |          1           |          2
---------------+----------------------+----------------------
mas_start != 0 |          3           |          4
We can see that in all four cases, i is always less than or equal to
mas_end whenever the check after the loop is reached:
Case 1: After assigning pivot 0, i is set to 1, which is bigger than
mas_end 0. So it jumps to complete and skips the check.
Case 2: After assigning pivot 0, i is set to 1.
∵ (mas_start < mas_end) && (mas_start == 0)
==> (1 <= mas_end)
∵ (i == 1) && (1 <= mas_end)
==> (i <= mas_end)
∴ Before the loop, we have (i <= mas_end). This still holds
if the loop is skipped, for example when (i == mas_end).
Now let's see what happens in the loop:
∵ piv_end = min(mas_end, mt_pivots[mt])
==> (piv_end <= mas_end)
∵ loop condition is (i < piv_end)
==> (i <= piv_end) when the loop finishes, either normally or via break
∵ (i <= piv_end) && (piv_end <= mas_end)
==> (i <= mas_end)
∴ After the loop, we still have (i <= mas_end) in this case
Case 3: This case skips both the if clause and the loop. So when it comes
to the check, i is still mas_start, which equals mas_end.
Case 4: This case skips the if clause.
∵ (mas_start < mas_end) && (i == mas_start)
==> (i < mas_end)
∴ Before the loop, we have (i < mas_end).
The loop behaves as in Case 2, so we get the same
result.
Now we can conclude that in all cases we have (i <= mas_end) when doing
the check, so the check is not necessary.
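The case analysis above can be checked mechanically with a small user-space model. This is only a sketch: model_index_at_check(), its parameters and the break_at knob are hypothetical names that imitate the index handling described in this message; it is not the kernel's mas_mab_cp().

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the index handling described above.
 * Returns the value of i when the final check would be reached, or -1 if
 * the code jumps straight to "complete" (Case 1).
 * break_at: loop index at which a zero pivot forces a break (-1 = never).
 */
static int model_index_at_check(int mas_start, int mas_end, int max_pivots,
				int break_at)
{
	int i = mas_start;
	int piv_end;

	if (i == 0) {
		i++;			/* pivot 0 has been assigned */
		if (i > mas_end)
			return -1;	/* goto complete, check is skipped */
	}

	piv_end = mas_end < max_pivots ? mas_end : max_pivots;
	for (; i < piv_end; i++) {
		if (i == break_at)
			break;		/* index is left unchanged on break */
	}

	return i;
}

/* Exhaustively verify (i <= mas_end) for every small input combination. */
static bool model_check_never_fails(int limit)
{
	for (int end = 0; end < limit; end++)
		for (int start = 0; start <= end; start++)
			for (int pivots = 1; pivots < limit; pivots++)
				for (int brk = -1; brk < limit; brk++) {
					int i = model_index_at_check(start, end,
								     pivots, brk);

					if (i != -1 && i > end)
						return false;
				}
	return true;
}
```

Calling model_check_never_fails() with a small limit walks all four cases from the table, including early breaks out of the loop.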
Link: https://lkml.kernel.org/r/20240911142759.20989-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20240911142759.20989-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K
DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN
=pDzC
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-31
We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).
The main changes are:
1) Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it, from Puranjay Mohan.
2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
into test_progs so that it can be run in BPF CI, from Alexis Lothoré.
3) Two BPF sockmap selftest fixes, from Zijian Zhang.
4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
from Vincent Li.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add a selftest for bpf_csum_diff()
selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
bpf: bpf_csum_diff: Optimize and homogenize for all archs
net: checksum: Move from32to16() to generic header
selftests/bpf: remove xdp_synproxy IP_DF check
selftests/bpf: remove test_tcp_check_syncookie
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
selftests/bpf: Fix msg_verify_data in test_sockmap
====================
Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater than 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either writing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
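The difference between the two calling styles can be sketched in plain C. The struct below is a hypothetical stand-in with an illustrative layout, and sum_by_value()/sum_by_ref() are made-up helpers, not the net_dim() API.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct dim_sample; the layout is illustrative. */
struct sample {
	int64_t time;
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
	uint32_t comp_ctr;
};

/* By value: the caller has to materialise a second copy of the struct. */
static uint64_t sum_by_value(struct sample s)
{
	return (uint64_t)s.pkt_ctr + s.byte_ctr + s.event_ctr;
}

/* By reference: only a pointer is passed and the caller's copy is reused. */
static uint64_t sum_by_ref(const struct sample *s)
{
	return (uint64_t)s->pkt_ctr + s->byte_ctr + s->event_ctr;
}
```

Both helpers compute the same result; only the by-value variant forces the caller to duplicate the structure for the call.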
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make the start and end arguments to dim_calc_stats() const pointers
to clarify that the function does not modify their values.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20241031002326.3426181-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The output of ".%03u" with the unsigned int in range [0, 4294966295] may
get truncated if the target buffer is not 12 bytes. This can't really
happen here as the 'remainder' variable cannot exceed 999 but the
compiler doesn't know it. To make it happy just increase the buffer to
where the warning goes away.
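The 12-byte figure can be reproduced in user space: one byte for '.', up to ten digits for a 32-bit unsigned int, and one byte for the terminating NUL. The helper name below is hypothetical and assumes a 32-bit unsigned int.

```c
#include <stdio.h>

/*
 * Hypothetical helper: number of bytes (including the trailing NUL) needed
 * to format ".%03u" for a given value.  snprintf() with a zero-sized buffer
 * only reports the length that would have been written.
 */
static int frac_bytes_needed(unsigned int remainder)
{
	return snprintf(NULL, 0, ".%03u", remainder) + 1;
}
```

For values up to 999 the result is 5 bytes; the compiler has to assume the full range of unsigned int, hence the larger worst case.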
Fixes: 3c9f3681d0 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Kees Cook <kees@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20241101205453.9353-1-brgl@bgdev.pl
Signed-off-by: Kees Cook <kees@kernel.org>
Since 135225a363 timekeeping_cycles_to_ns() handles large offsets which
would lead to 64bit multiplication overflows correctly. It's also protected
against negative motion of the clocksource unconditionally, which was
exclusive to x86 before.
timekeeping_advance() already handles large offsets correctly.
That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases
is very close to zero. Remove all of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de
.bi_size of the bvec iterator should be initialized to the real max size
for walking, and .bi_bvec_done just counts how many bytes need to be
skipped in the 1st bvec, so .bi_size isn't related to .bi_bvec_done.
This patch fixes bvec iterator initialization, and the inner `size`
check isn't needed any more, so revert Eric Dumazet's commit
7bc802acf193 ("iov-iter: do not return more bytes than requested in
iov_iter_extract_bvec_pages()").
Cc: Eric Dumazet <edumazet@google.com>
Fixes: e4e535bff2 ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages")
Reported-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com
Tested-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that of
x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmcjr8wQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNL2DB/4tNl7feCA2V4fW/Eu3RzXrHTdJbZvTjLDl
JjeXPZr4WdGQQMgQ0DPZtpnmeBzd5nswx9WHG9VSsUxc5g+rzWxwvMnUeplDvEXo
Y/QMUq4JZN3eqDZWPs0mEN4fMI+QOihInErVHvFXaJLcbxYrU5BvfwExgfY53AjT
ZJEPmF291OL6V4UCWVWggk44BQaTBeWmc4itJcYm6z6mIgAgh84MZGK5M0e582ip
CRAImDiAPqLxRO9kzKcYthI3FDyyVi1HtiSL1CiNktOXMNz19qPelq1XAnDEyvBt
TEUitTLTwbUJ0nqi4u7ve09aebneAq8nsGucteYTrBU4U/PRjvQO
=LTB9
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGAQAKCRCRxhvAZXjc
opd6AQCal4omyfS8FYe4VRRZ/0XHouagq99I0U0TAmKkvoKAsgD/XrdE+pSTEkPX
Pv4T9phh1cZRxcyKVu77UoYkuHJEDAg=
=Lu9R
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
"VFS:
- Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set
- Add a get_tree_bdev_flags() helper that allows modifying, e.g.,
whether errors are logged into the filesystem context during
superblock creation. This is used by erofs to fix a userspace
regression where an error is currently logged when it's used on a
regular file, which is a newly allowed mode in erofs.
netfs:
- Fix the sysfs debug path in the documentation.
- Fix iov_iter_get_pages*() for folio queues by skipping the page
extraction if we're at the end of a folio.
afs:
- Fix moving subdirectories to a different parent directory.
autofs:
- Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
validate_dev_ioctl(). The actual ioctl number, not the ioctl
command, needs to be checked for autofs"
* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
autofs: fix thinko in validate_dev_ioctl()
iov_iter: Fix iov_iter_get_pages*() for folio_queue
afs: Fix missing subdir edit when renamed between parent dirs
doc: correcting the debug path for cachefiles
erofs: use get_tree_bdev_flags() to avoid misleading messages
fs/super.c: introduce get_tree_bdev_flags()
dql->last_obj_cnt is read/written from different contexts,
without any lock synchronization.
Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing.
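A user-space sketch of the idea follows. The two macros are simplified stand-ins for the kernel ones (assuming GCC or Clang for __typeof__), and struct queue_state with its helpers is hypothetical. The volatile access asks the compiler to emit a single access to the object instead of splitting, merging or repeating it; it adds no ordering and no locking.

```c
/*
 * Simplified user-space stand-ins for the kernel macros (assumption: GCC or
 * Clang, for __typeof__).
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical structure holding a single locklessly shared counter. */
struct queue_state {
	unsigned int last_obj_cnt;
};

/* Writer side: store the counter with one volatile access. */
static void record_count(struct queue_state *q, unsigned int count)
{
	WRITE_ONCE(q->last_obj_cnt, count);
}

/* Reader side: take one snapshot and work on the local copy afterwards. */
static unsigned int snapshot_count(struct queue_state *q)
{
	return READ_ONCE(q->last_obj_cnt);
}
```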
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241029191425.2519085-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Initialize bi.bi_idx as 0 before iterating over bvec, otherwise
garbage data can be used as ->bi_idx.
Cc: Christoph Hellwig <hch@lst.de>
Reported-and-tested-by: Klara Modin <klarasmodin@gmail.com>
Fixes: e4e535bff2 ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
from32to16() is used by lib/checksum.c and also by
arch/parisc/lib/checksum.c. The next patch will use it in the
bpf_csum_diff helper.
Move from32to16() to the include/net/checksum.h as csum_from32to16() and
remove other implementations.
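For readers unfamiliar with the helper, folding a 32-bit one's-complement accumulator into 16 bits can be sketched as follows. fold32to16() is a hypothetical name used for illustration; the body is not copied from the kernel header and the kernel helper may be written differently.

```c
#include <stdint.h>

/*
 * Sketch of folding a 32-bit one's-complement accumulator into 16 bits.
 * The name fold32to16() is hypothetical; it is not the kernel helper.
 */
static uint16_t fold32to16(uint32_t sum)
{
	/* Add the high half to the low half: the result fits in 17 bits. */
	sum = (sum & 0xffff) + (sum >> 16);
	/* Add the possible carry back in (end-around carry). */
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```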
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-2-puranjay@kernel.org
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmcgrxcACgkQu+CwddJF
iJrq9ggAiZ/2c7p23s52LdVhT9GTyV5omVOh2kDztVx4w6RM3RbkhkLWdqt0XUag
uf1TJe6kOvnCeHEFEEo3sqPj820XebxKDf0GGCdI6a9f4n30ipKH+vWSQ0iutKO/
dOBdArxr0FGOV5VZR9i3xQ6sUqZXXUbJdte0c0ovp6Q6HDHTeQeKNhOQ2fv33TG/
7jBh5HVyhI6JE/+TOxrMaklH0IqYBb6z49wdbaN7XBvXVXlb5MtOZy109gfUHDwe
tfktifyE45VtmF0WdHfxDbCnqyDSG1Jm3wsLDbMq+voJ1BQlUvIZ5Dv4kucYqffm
VN5HkH6uQ09aoounBoU4g50UYeNpiQ==
=xAw8
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Fix for a slub_kunit test warning with MEM_ALLOC_PROFILING_DEBUG (Pei
Xiao)
- Fix for a MTE-based KASAN BUG in krealloc() (Qun-Wei Lin)
* tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: krealloc: Fix MTE false alarm in __do_krealloc
slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof
The iov_iter_extract_pages interface allows returning physically
discontiguous pages, as long as all but the first and last page
in the array are page aligned and page size. Rewrite
iov_iter_extract_bvec_pages to take advantage of that instead of only
returning ranges of physically contiguous pages.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: minor cleanups, new commit log]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241024050021.627350-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The newly added file did not quite get the punctuation right:
lib/iomem_copy.c:14: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410290907.0mDZVYPK-lkp@intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The newly added test script creates modules that lack the
description line needed to build cleanly:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_a.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_b.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_c.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_d.o
Fixes: 84b4a51fce ("selftests: add new kallsyms selftests")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The IO memcpy and IO memset functions in asm-generic/io.h simply call
memcpy and memset. This can lead to alignment problems or faults on
architectures that do not define their own version and fall back to
these defaults.
This patch introduces new implementations for IO memcpy and IO memset,
that use read{l,q} accessor functions, align accesses to machine word
size, and resort to byte accesses when the target memory is not aligned.
For new architectures and existing ones that were using the old
fallbacks, these functions are safe to use, because IO memory constraints
are taken into account. Moreover, architectures with similar
implementations can now use these new versions, not needing to implement
their own.
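The copy strategy described above can be sketched in user space. io_copy_from() is a hypothetical name, and plain volatile loads stand in for the read{b,l,q}() accessors, which are not available outside the kernel, so this only illustrates the alignment handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * User-space sketch of the copy strategy: byte accesses until the I/O
 * address is word aligned, word accesses in the middle, bytes for the tail.
 * io_copy_from() is a hypothetical name, not a kernel function.
 */
static void io_copy_from(void *dst, const volatile void *src, size_t count)
{
	unsigned char *d = dst;
	const volatile unsigned char *s = src;

	/* Byte accesses until the I/O side is aligned to the machine word. */
	while (count && ((uintptr_t)s % sizeof(unsigned long))) {
		*d++ = *s++;
		count--;
	}

	/* Word-sized accesses on the now aligned I/O address. */
	while (count >= sizeof(unsigned long)) {
		unsigned long word = *(const volatile unsigned long *)s;

		/* dst may be unaligned, so store through memcpy(). */
		memcpy(d, &word, sizeof(word));
		d += sizeof(word);
		s += sizeof(word);
		count -= sizeof(word);
	}

	/* Remaining tail bytes. */
	while (count--)
		*d++ = *s++;
}
```

Only the I/O side is ever accessed with word-sized loads, and only at aligned addresses; the normal-memory side is written with memcpy() so it may have any alignment.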
Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com>
Signed-off-by: Julian Vetter <jvetter@kalrayinc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Be sure to test the extreme cases with and without bias.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The use of struct range in the CXL subsystem is growing. In particular,
the addition of Dynamic Capacity devices uses struct range in a number
of places which are reported in debug and error messages.
To wit, requiring the printing of the start/end fields in each print
became cumbersome. Dan Williams mentions in [1] that it might be time
to have a print specifier for struct range similar to struct resource.
A few alternatives were considered including '%par', '%r', and '%pn'.
'%pra' reflects that struct range is similar to struct resource (%p[rR])
but needs to be different. Based on discussions with Petr and Andy
'%pra' was chosen.[2]
Andy also suggested keeping the range prints similar to struct resource
through combined code. Add hex_range() to handle printing for both
pointer types.
Finally introduce DEFINE_RANGE() as a parallel to DEFINE_RES_*() and use
it in the tests.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: open list <linux-kernel@vger.kernel.org>
Link: https://lore.kernel.org/all/663922b475e50_d54d72945b@dwillia2-xfh.jf.intel.com.notmuch/ [1]
Link: https://lore.kernel.org/all/66cea3bf3332f_f937b29424@iweiny-mobl.notmuch/ [2]
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20241025-cxl-pra-v2-3-123a825daba2@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
The printf tests for struct resource were stubbed out. struct range
printing will leverage the struct resource implementation.
To prevent regressions, add some basic sanity tests for struct resource.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20241007-dcd-type2-upstream-v4-1-c261ee6eeded@intel.com
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20241025-cxl-pra-v2-1-123a825daba2@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
generic/077 on x86_32 CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y with highmem,
on huge=always tmpfs, issues a warning and then hangs (interruptibly):
WARNING: CPU: 5 PID: 3517 at mm/highmem.c:622 kunmap_local_indexed+0x62/0xc9
CPU: 5 UID: 0 PID: 3517 Comm: cp Not tainted 6.12.0-rc4 #2
...
copy_page_from_iter_atomic+0xa6/0x5ec
generic_perform_write+0xf6/0x1b4
shmem_file_write_iter+0x54/0x67
Fix copy_page_from_iter_atomic() by limiting it in that case
(include/linux/skbuff.h skb_frag_must_loop() does similar).
But going forward, perhaps CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is too
surprising, has outlived its usefulness, and should just be removed?
Fixes: 908a1ad894 ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/dd5f0c89-186e-18e1-4f43-19a60f5a9774@google.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Move crypto_simd_disabled_for_test to lib/ so that crypto_simd_usable()
can be used by library code.
This was discussed previously
(https://lore.kernel.org/linux-crypto/20220716062920.210381-4-ebiggers@kernel.org/)
but was not done because there was no use case yet. However, this is
now needed for the arm64 CRC32 library code.
Tested with:
export ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
echo CONFIG_CRC32=y > .config
echo CONFIG_MODULES=y >> .config
echo CONFIG_CRYPTO=m >> .config
echo CONFIG_DEBUG_KERNEL=y >> .config
echo CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n >> .config
echo CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> .config
make olddefconfig
make -j$(nproc)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32c-generic is currently backed by the architecture's CRC-32c library
code, which may offer a variety of implementations depending on the
capabilities of the platform. These are not covered by the crypto
subsystem's fuzz testing capabilities because crc32c-generic is the
reference driver that the fuzzing logic uses as a source of truth.
Fix this by providing a crc32c-arch implementation which is based on the
arch library code if available, and modify crc32c-generic so it is
always based on the generic C implementation. If the arch has no CRC-32c
library code, this change does nothing.
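Why an independent reference matters can be seen in a small user-space sketch: a slow bit-at-a-time CRC-32C acts as the source of truth, and a second implementation is checked against it. Both function names are hypothetical, and the table-driven variant only stands in for an optimised arch implementation; neither is the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82f63b78u	/* reflected Castagnoli polynomial */

/* Bit-at-a-time reference: slow, but simple enough to trust. */
static uint32_t crc32c_ref(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
	}
	return ~crc;
}

/* Table-driven variant standing in for an "optimised" implementation. */
static uint32_t crc32c_table(const unsigned char *p, size_t len)
{
	uint32_t table[256];
	uint32_t crc = 0xffffffffu;

	/* Build the byte-wise lookup table from the same polynomial. */
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;

		for (int bit = 0; bit < 8; bit++)
			c = (c >> 1) ^ ((c & 1) ? CRC32C_POLY_LE : 0);
		table[n] = c;
	}

	while (len--)
		crc = (crc >> 8) ^ table[(crc ^ *p++) & 0xff];
	return ~crc;
}
```

If the reference were itself backed by the optimised code, comparing the two would only compare that code against itself.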
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32-generic is currently backed by the architecture's CRC-32 library
code, which may offer a variety of implementations depending on the
capabilities of the platform. These are not covered by the crypto
subsystem's fuzz testing capabilities because crc32-generic is the
reference driver that the fuzzing logic uses as a source of truth.
Fix this by providing a crc32-arch implementation which is based on the
arch library code if available, and modify crc32-generic so it is
always based on the generic C implementation. If the arch has no CRC-32
library code, this change does nothing.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- objpool: Fix choosing allocation for percpu slots
Fix the allocation of objpool's percpu slots so that it follows the
GFP flag. The code checks whether "any bit" of GFP_ATOMIC is set to choose
the vmalloc source, but it should check that "all bits" of GFP_ATOMIC
are set, because GFP_ATOMIC is a combined flag.
- tracing/probes: Fix MAX_TRACE_ARGS limit handling
If more than MAX_TRACE_ARGS are passed for creating a probe event, the
entries over MAX_TRACE_ARGS in the trace_arg array are not initialized.
Thus if the kernel accesses those entries, it crashes. This rejects
creating the event if the number of arguments is over MAX_TRACE_ARGS.
- tracing: Consider the NULL character when validating the event length
strlen() is used when parsing the event name, and the original code
does not consider the terminating NUL byte. Thus it can accept a name
1 byte longer than the buffer. This fixes the check.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmcZBJ0ACgkQ2/sHvwUr
Pxu4qAgAm+mIiCaBGyolsT1oB5EF+9gztbwRtcAOY1811RJZ0XiQPuOwtZfijpBr
1Pl+SjubRKhLg+lLHEuCQHxkqlTSp+zrjkF+A0hFlB38nJ5P3pIw+b5pM5FCvhY+
w0tBTwkjiRBS9h1z88c74ciKYA/XR4apcMMUrPQZUCHq8P73Wu/Fo2lhnCVGBs6q
nYESyrTcOCDR0c6HP9D2GWxQFtbbCyAfotUjX37EIooTcl7ufAr8IPm8jBx7EzCa
WM841FwbuIgGbFCGYlG1/lOR+Qf7FszKAY5SBJMV/BiyFbxJqZfA5DWfJcrZ9YpW
pl86oKWyEkidwx8OIiB3Y1enPzUUJQ==
=8oUB
-----END PGP SIGNATURE-----
Merge tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix choosing allocation for percpu slots
Fix the allocation of objpool's percpu slots so that it follows the
GFP flag. The code checks whether "any bit" of GFP_ATOMIC is set to choose
the vmalloc source, but it should check that "all bits" of GFP_ATOMIC
are set, because GFP_ATOMIC is a combined flag.
- tracing/probes: Fix MAX_TRACE_ARGS limit handling
If more than MAX_TRACE_ARGS are passed for creating a probe event,
the entries over MAX_TRACE_ARGS in the trace_arg array are not
initialized. Thus if the kernel accesses those entries, it crashes.
This rejects creating the event if the number of arguments is over
MAX_TRACE_ARGS.
- tracing: Consider the NUL character when validating the event length
strlen() is used when parsing the event name, and the original code
does not consider the terminating NUL byte. Thus it can accept a name
one byte longer than the buffer. This fixes the check.
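The off-by-one in the last item can be illustrated with a minimal sketch. The buffer size and both helper names are hypothetical and chosen only for illustration; they are not taken from the tracing code.

```c
#include <stdbool.h>
#include <string.h>

#define EVENT_NAME_BUF_LEN 8	/* hypothetical buffer size for illustration */

/* Off-by-one variant: accepts a name whose NUL no longer fits. */
static bool name_fits_buggy(const char *name)
{
	return strlen(name) <= EVENT_NAME_BUF_LEN;
}

/* Corrected variant: strlen() excludes the NUL, so reserve one byte. */
static bool name_fits(const char *name)
{
	return strlen(name) < EVENT_NAME_BUF_LEN;
}
```

With an 8-byte buffer, a 7-character name passes both checks, while an 8-character name passes only the off-by-one variant even though its terminator would land outside the buffer.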
* tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Consider the NULL character when validating the event length
tracing/probes: Fix MAX_TRACE_ARGS limit handling
objpool: fix choosing allocation for percpu slots
We lack find_symbol() selftests, so add one. This lets us stress test
improvements easily on find_symbol() or optimizations. It also inherently
allows us to test the limits of kallsyms on Linux today.
We test a pathological use case for kallsyms by introducing modules
which are automatically written for us with a larger number of symbols.
We have 4 kallsyms test modules:
A: has KALLSYSMS_NUMSYMS exported symbols
B: uses one of A's symbols
C: adds KALLSYMS_SCALE_FACTOR * KALLSYSMS_NUMSYMS exported
D: adds twice as many symbols as C
Using anything much larger than a KALLSYSMS_NUMSYMS of 10,000 and a
KALLSYMS_SCALE_FACTOR of 8 segfaults today. So we're capped at
around 160000 symbols somehow today. We can inspect that issue at
our leisure later, but for now the real value of this test is that
it will easily allow us to test improvements on find_symbol().
We want to enable this test on allyesmodconfig builds so we can't
use this combination, so instead just use a safe value for now and
be informative on the Kconfig symbol documentation about where our
thresholds are for testers. We default then to KALLSYSMS_NUMSYMS of
just 100 and KALLSYMS_SCALE_FACTOR of 8.
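The symbol counts per module follow directly from the two knobs. The sketch below uses a hypothetical struct and helper to show the arithmetic; with 10,000 and 8, module D alone reaches 160000 symbols, which appears to be where the figure above comes from.

```c
/* Hypothetical helper mirroring the A/C/D symbol counts described above. */
struct test_symbol_counts {
	unsigned long a;	/* module A: numsyms */
	unsigned long c;	/* module C: scale_factor * numsyms */
	unsigned long d;	/* module D: twice as many as C */
};

static struct test_symbol_counts count_test_symbols(unsigned long numsyms,
						    unsigned long scale_factor)
{
	struct test_symbol_counts n;

	n.a = numsyms;
	n.c = scale_factor * numsyms;
	n.d = 2 * n.c;
	return n;
}
```

With the defaults of 100 and 8 the same arithmetic gives 800 symbols for C and 1600 for D.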
On x86_64 we can use perf, for other architectures we just use 'time'
and allow for customizations. For example, a future enhancement could
be made for parisc to check for unaligned accesses, which trigger
special exception handler assembler code inside the kernel.
The negative impact on performance is so large on parisc that it
keeps track of its accesses in /proc/interrupts as UAH:
IRQ:       CPU0       CPU1
  3:       1332          0   SuperIO  ttyS0
  7:    1270013          0   SuperIO  pata_ns87415
 64:  320023012  320021431   CPU      timer
 65:   17080507   20624423   CPU      IPI
UAH:   10948640      58104   Unaligned access handler traps
While at it, this tidies up lib/ test modules to allow us to have
a new directory for them. The amount of test modules under lib/
is insane.
This should also hopefully showcase how to start doing basic
self module writing code, which may be more useful for more complex
cases later in the future.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
p9_get_mapped_pages() uses iov_iter_get_pages_alloc2() to extract pages
from an iterator when performing a zero-copy request and under some
circumstances, this crashes with odd page errors[1], for example, I see:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbcf0
flags: 0x2000000000000000(zone=1)
...
page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1444!
This is because, unlike in iov_iter_extract_folioq_pages(), the
iter_folioq_get_pages() helper function doesn't skip the current folio
when iov_offset points to the end of it, but rather extracts the next
page beyond the end of the folio and adds it to the list. Reading will
then clobber the contents of this page, leading to system corruption,
and if the page is not in use, put_page() may try to clean up the unused
page.
This can be worked around by copying the iterator before each
extraction[2] and using iov_iter_advance() on the original as the
advance function steps over the page we're at the end of.
Fix this by skipping the page extraction if we're at the end of the
folio.
This was reproduced in the ktest environment[3] by forcing 9p to use the
fscache caching mode and then reading a file through 9p.
Fixes: db0aa2e956 ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Reported-by: Antony Antony <antony@phenome.org>
Closes: https://lore.kernel.org/r/ZxFQw4OI9rrc7UYc@Antony2201.local/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/ZxFEi1Tod43pD6JC@moon.secunet.de/ [1]
Link: https://lore.kernel.org/r/2299159.1729543103@warthog.procyon.org.uk/ [2]
Link: https://github.com/koverstreet/ktest.git [3]
Tested-by: Antony Antony <antony.antony@secunet.com>
Link: https://lore.kernel.org/r/3327438.1729678025@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 7aed6a2c51.
Now that __no_sanitize_address attribute is fixed for KASAN_SW_TAGS with
GCC, allow re-enabling KASAN_SW_TAGS with GCC.
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20241021120013.3209481-2-elver@google.com
Signed-off-by: Will Deacon <will@kernel.org>
'modprobe slub_kunit' will have a warning as shown below. The root cause
is that __kmalloc_cache_noprof was directly used, which resulted in no
alloc_tag being allocated. This caused current->alloc_tag to be null,
leading to a warning in alloc_tag_add_check.
Let's add an alloc_hook layer to __kmalloc_cache_noprof specifically
within lib/slub_kunit.c, which is the only user of this internal slub
function outside kmalloc implementation itself.
[58162.947016] WARNING: CPU: 2 PID: 6210 at ./include/linux/alloc_tag.h:125 alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.957721] Call trace:
[58162.957919] alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.958286] __kmalloc_cache_noprof+0x14c/0x344
[58162.958615] test_kmalloc_redzone_access+0x50/0x10c [slub_kunit]
[58162.959045] kunit_try_run_case+0x74/0x184 [kunit]
[58162.959401] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[58162.959841] kthread+0x10c/0x118
[58162.960093] ret_from_fork+0x10/0x20
[58162.960363] ---[ end trace 0000000000000000 ]---
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: a0a44d9175 ("mm, slab: don't wrap internal functions with alloc_hooks()")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
objpool intends to use vmalloc for default (non-atomic) allocations of
percpu slots and objects. However, the condition checking if GFP flags
set any bit of GFP_ATOMIC is wrong because GFP_ATOMIC is a combination of bits
(__GFP_HIGH|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will
be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the
___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL
is specified, i.e. in all current usages of objpool.
This may lead to unexpected OOM errors since kmalloc cannot allocate
large amounts of memory.
For instance, objpool is used by fprobe rethook which in turn is used by
BPF kretprobe.multi and kprobe.session probe types. Trying to attach
these to all kernel functions with libbpf using
SEC("kprobe.session/*")
int kprobe(struct pt_regs *ctx)
{
[...]
}
fails on objpool slot allocation with ENOMEM.
Fix the condition to truly use vmalloc by default.
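The difference between the two checks can be sketched in user-space C. The
bit values and the SK_ names below are placeholders made up for
illustration and are not the kernel's real GFP encodings; only the
relationship described above is modelled (GFP_ATOMIC is two bits, and
GFP_KERNEL shares one of them). The strict comparison is one way to express
"all GFP_ATOMIC bits are set", not necessarily the exact form used in the
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, NOT the kernel's real encodings. */
#define SK_GFP_HIGH		0x01u
#define SK_GFP_KSWAPD_RECLAIM	0x02u
#define SK_GFP_DIRECT_RECLAIM	0x04u
#define SK_GFP_IO		0x08u
#define SK_GFP_FS		0x10u

#define SK_GFP_ATOMIC	(SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL	(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM | \
			 SK_GFP_IO | SK_GFP_FS)

/* The two masks overlap in exactly one bit, as described above. */
static_assert((SK_GFP_ATOMIC & SK_GFP_KERNEL) == SK_GFP_KSWAPD_RECLAIM,
	      "model must share only the kswapd reclaim bit");

/* Old check: true as soon as ANY GFP_ATOMIC bit is set. */
static bool wants_kmalloc_any_bit(unsigned int gfp)
{
	return (gfp & SK_GFP_ATOMIC) != 0;
}

/* Strict check: true only if ALL GFP_ATOMIC bits are set. */
static bool wants_kmalloc_all_bits(unsigned int gfp)
{
	return (gfp & SK_GFP_ATOMIC) == SK_GFP_ATOMIC;
}
```

With these definitions a GFP_KERNEL-style mask selects kmalloc under the
any-bit check but not under the all-bits check, which is the behaviour
change described above.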
Link: https://lore.kernel.org/all/20240826060718.267261-1-vmalik@redhat.com/
Fixes: b4edb8d2d4 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Toolchain and infrastructure:
- Fix several issues with the 'rustc-option' macro. It includes a
refactor from Masahiro of three '{cc,rust}-*' macros, which is not
a fix but avoids repeating the same commands (which would be several
lines in the case of 'rustc-option').
- Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not a
fix but is needed for the actual fix.
And a trivial grammar fix.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmcS5LkACgkQGXyLc2ht
IW07ghAAxP94zqWzf8bQ4IIgTYrV9WSqR9vMpd31VAPknRJjGUq5dehFxiQxDJ5X
ibMcpyja8V1CGeOh4qthLJAD/OGw+ANafjLfHM/l9cQRx1uwLEac3h4/YR1x52Ep
al3ISewhbs3cjko2aa6Gnym3hdYizqkKY9Bca6kvo7k4ZRRmWT3sKAsle6rV93Hw
q9AjC40XC8iy2VYv/JPvP1zcr3T7ZzCrs3ELG8sLSeR0gZZEmI3e3FOWWHcRlVRa
uig4SSPvhHVssG8k64CHmzUtVQCApuJuzQGG72Ozs4V5Xxk86ZRE0XzyMXaw15nu
Mm8s+hDxsFXfESQg0GMCVQ7wnGFSuvRwK3sWALltXmqtGQxkYgcJ3mYtu0sP8p51
VIzDIomdUfGLxk+sDn7Lnl5PrSLaetUd94nr5qCMmfb2/7/kSaB4aHmML+8ZHCn5
I4TQONL/pVmmRm97HFaAFOzCaGRWfVoIzQ/cRaQhqK+qrTfRjyFcsMzN+Flp5A58
c3AgnTVlm4pPqtlLQ1z9BiGYT50dI0fHBOQiisogGsZwwMUqzEMOnbZjbhS/HKSp
FG8hu/OyzIsNnNqOfQZN4DSTyf4qfIuyTmFM1OAel8zllCwlxy5F2hVp/opwH3/y
On6CW0lunUBzCXZZ+byWudo7Vg8YpMVHATLqp9FHZpJb8JK688w=
=Y7fL
-----END PGP SIGNATURE-----
Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix several issues with the 'rustc-option' macro. It includes a
refactor from Masahiro of three '{cc,rust}-*' macros, which is not
a fix but avoids repeating the same commands (which would be
several lines in the case of 'rustc-option').
- Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
a fix but is needed for the actual fix.
And a trivial grammar fix"
* tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
kbuild: fix issues with rustc-option
kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
- Fix BPF verifier to not affect subreg_def marks in its range
propagation, from Eduard Zingerman.
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx, from Dimitar Kanaliev.
- Fix the BPF verifier's delta propagation between linked
registers under 32-bit addition, from Daniel Borkmann.
- Fix a NULL pointer dereference in BPF devmap due to missing
rxq information, from Florian Kauer.
- Fix a memory leak in bpf_core_apply, from Jiri Olsa.
- Fix an UBSAN-reported array-index-out-of-bounds in BTF
parsing for arrays of nested structs, from Hou Tao.
- Fix build ID fetching where memory areas backing the file
were created with memfd_secret, from Andrii Nakryiko.
- Fix BPF task iterator tid filtering which was incorrectly
using pid instead of tid, from Jordan Rome.
- Several fixes for BPF sockmap and BPF sockhash redirection
in combination with vsocks, from Michal Luczaj.
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered,
from Andrea Parri.
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the
possibility of an infinite BPF tailcall, from Pu Lehui.
- Fix a build warning from resolve_btfids that bpf_lsm_key_free
cannot be resolved, from Thomas Weißschuh.
- Fix a bug in kfunc BTF caching for modules where the wrong
BTF object was returned, from Toke Høiland-Jørgensen.
- Fix a BPF selftest compilation error in cgroup-related tests
with musl libc, from Tony Ambardar.
- Several fixes to BPF link info dumps to fill missing fields,
from Tyrone Wu.
- Add BPF selftests for kfuncs from multiple modules, checking
that the correct kfuncs are called, from Simon Sundberg.
- Ensure that internal and user-facing bpf_redirect flags
don't overlap, also from Toke Høiland-Jørgensen.
- Switch to use kvzalloc to allocate BPF verifier environment,
from Rik van Riel.
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic
splat under RT, from Wander Lairson Costa.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZxK4OhUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIOCrwEAib2kC5EEQn5+wKVE/bnZryVX2leT
YXdfItDCBU6zCYUA+wTU5hGGn9lcDUcZx72l/KZPDyPw7HdzNJ+6iR1zQqoM
=f9kv
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
With the printk issues solved, the last known splat created by
PROVE_RAW_LOCK_NESTING is gone.
Enable PROVE_RAW_LOCK_NESTING by default as part of PROVE_LOCKING. Keep
the defines around in case something serious pops up and it needs to be
disabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20241009161041.1018375-2-bigeasy@linutronix.de
Add a test case to ensure that no new name string literal will be
created in lockdep_set_subclass(), otherwise a warning will be triggered
in look_up_lock_class(). Add this to catch the problem in the future.
[boqun: Reword the title, replace #if with #ifdef and rename functions
and variables]
Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/20240905011220.356973-1-bottaawesome633@gmail.com/
It is the usual shower of unrelated singletons - please see the individual
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZxGY5wAKCRDdBJ7gKXxA
js6RAQC16zQ7WRV091i79cEi1C5648NbZjMCU626hZjuyfbzKgEA2v8PYtjj9w2e
UGLxMY+PYZki2XNEh75Sikdkiyl9Vgg=
=xcWT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"28 hotfixes. 13 are cc:stable. 23 are MM.
It is the usual shower of unrelated singletons - please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: add regression test for spanning store bug
maple_tree: correct tree corruption on spanning store
mm/mglru: only clear kswapd_failures if reclaimable
mm/swapfile: skip HugeTLB pages for unuse_vma
selftests: mm: fix the incorrect usage() info of khugepaged
MAINTAINERS: add Jann as memory mapping/VMA reviewer
mm: swap: prevent possible data-race in __try_to_reclaim_swap
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
MAINTAINERS: kasan, kcov: add bugzilla links
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
mm: remove unused stub for can_swapin_thp()
mailmap: add an entry for Andy Chiu
MAINTAINERS: add memory mapping/VMA co-maintainers
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
...
From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai@intel.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
- Disable software tag-based KASAN when compiling with GCC, as functions
are incorrectly instrumented leading to a crash early during boot.
- Fix pkey configuration for kernel threads when POE is enabled.
- Fix invalid memory accesses in uprobes when targeting load-literal
instructions.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmcPrzQQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNIr6B/wN+o1xI7Fv/QdlaTuKYLvOOg/XTl6sbUDj
YssxtjhpKuaFVG4zJHNsWvgUqO+YCM7m3F1L8LVPMF7l2xoKtRTIB1Ye315hTjYm
dW5Te6xBMVKF8SVxE8sBbZobdokIW1JNPBrvGvHO3d5ujmofzwHU8RNMXuTUItRw
z85Qy75FkEDTEbsWhS3VL5HOgEr+k0TYDRa8SXwKWVj7/rYna3tO39kIdS5dt9VX
wDJbnxtWJMhiHmDnevFFhBkSZrips12P1Rb6HUSmhpUJh0Rk4TAZntSl2f/lr+jA
PuboBbSG68UOCwAHoNmTcLdFhkiNaiyw4w2F7hk2A6aNRtme+bT0
=M/ug
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Disable software tag-based KASAN when compiling with GCC, as
functions are incorrectly instrumented leading to a crash early
during boot
- Fix pkey configuration for kernel threads when POE is enabled
- Fix invalid memory accesses in uprobes when targeting load-literal
instructions
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kasan: Disable Software Tag-Based KASAN with GCC
Documentation/protection-keys: add AArch64 to documentation
arm64: set POR_EL0 for kernel threads
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Remove broken LDR (literal) uprobe support
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit f8d112a4e6
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the rightmost node.
This results in mas_wr_spanning_store() mistakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, mas_wr_walk() returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf nodes' worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
  0                              Root Node
                                  /      \
                  pivot = 0xffff /        \ pivot = ULONG_MAX
                                /          \
  1                       A [-----]       ...
                             /   \
             pivot = 0x4fff /     \ pivot = 0xffff
                           /       \
  2 (LEAVES)          B [-----]  [-----] C
                                      ^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range by checking r_mas.offset <= r_mas.end. With r_mas set up
at the entry at pivot 0xffff, this check passes, and so we DUPLICATE the
entry at pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
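The walk in step 2 can be sketched with a simplified user-space model.
sketch_node_walk() is a hypothetical helper, not the kernel function, and
it assumes every slot has an explicit pivot, which is a simplification of
the real node layout. It only shows why searching for 0x10000 ends up at
the final entry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the walk described in step 2: return the offset of
 * the first pivot that equals or exceeds index, or the final offset if no
 * pivot qualifies. Hypothetical helper, not the kernel implementation.
 */
static size_t sketch_node_walk(const unsigned long *pivots, size_t count,
			       unsigned long index)
{
	size_t offset = 0;

	assert(count > 0);
	/* Stop early on a match, otherwise fall through to the last entry. */
	while (offset < count - 1 && pivots[offset] < index)
		offset++;
	return offset;
}
```

For node A in the example, the pivots are { 0x4fff, 0xffff }. A search for
0x10000 finds no matching pivot and so settles on the last entry, the one
with pivot 0xffff, which is the entry that is later duplicated.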
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node. In other words, it takes a spanning store across two leaf nodes with a
range that ends at that shared pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
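The two ways of deciding whether to copy from the right can be compared in
a small user-space sketch. struct sketch_r_state and both helpers are
hypothetical stand-ins for the relevant parts of the right-hand walk state;
the field names follow the prose above, and the exact expression used in
the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for the right-hand walk state. */
struct sketch_r_state {
	unsigned char offset;	/* slot at which the right-hand walk stopped */
	unsigned char end;	/* last occupied slot in that node */
	unsigned long max;	/* largest index covered by that node */
	unsigned long last;	/* last index of the range being stored */
};

/* Position-based check: passes even if the store consumed the node. */
static bool copy_right_by_offset(const struct sketch_r_state *r)
{
	assert(r);
	return r->offset <= r->end;
}

/* Value-based check: passes only if something lies beyond the store. */
static bool copy_right_by_max(const struct sketch_r_state *r)
{
	assert(r);
	return r->max > r->last;
}
```

For the store of [0x4000, 0xffff] in the example, the walk stops at the
last slot of node C with max == last == 0xffff. The position-based check
still passes there, while the value-based check correctly reports that
nothing is left to copy.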
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit f8d112a4e6 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is possible for a bulk operation (MA_STATE_BULK is set) to enter the
new_end < mt_min_slots[type] case and set wr_rebalance as a store type.
This is incorrect as bulk stores do not rebalance per write, but rather
after all of the writes are done through the mas_bulk_rebalance()
path. Therefore, add a check to make sure MA_STATE_BULK is not set before
we return wr_rebalance as the store type.
Also add a test to make sure wr_rebalance is never the store type when
doing bulk operations via mas_expected_entries().
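The added check can be sketched as follows. The enum, the flag value and
sketch_pick_store_type() are hypothetical names for illustration; the real
function considers more store types and conditions, and the fallback
returned here is only a placeholder for "anything but a rebalance".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical store types and flag value, for illustration only. */
enum sketch_store { SK_WR_OTHER, SK_WR_REBALANCE };
#define SK_STATE_BULK	0x1u

static enum sketch_store sketch_pick_store_type(unsigned int new_end,
						unsigned int min_slots,
						unsigned int flags)
{
	bool bulk = (flags & SK_STATE_BULK) != 0;

	assert(min_slots > 0);
	/* Too few entries would remain: rebalance, unless in bulk mode. */
	if (new_end < min_slots && !bulk)
		return SK_WR_REBALANCE;
	return SK_WR_OTHER;
}
```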
This is a hotfix for this rc; however, it has no userspace effects as there
are no users of the bulk insertion mode.
Link: https://lkml.kernel.org/r/20241011214451.7286-1-sidhartha.kumar@oracle.com
Fixes: 5d659bbb52 ("maple_tree: introduce mas_wr_store_type()")
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The "err" variable may be returned without an initialized value.
Fixes: 8e3a67f2de ("crypto: lib/mpi - Add error checks to extension")
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The freelist is freed at a constant rate independent of the actual usage
requirements. That's bad in scenarios where usage comes in bursts. The end
of a burst puts the objects on the free list and freeing proceeds even when
the next burst, which requires objects, has already started.
Keep track of the usage with an exponentially weighted moving average and
take that into account in the worker function which frees objects from the
free list.
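A generic integer exponentially weighted moving average can be sketched as
below. The smoothing factor, the helper names and in particular the policy
in sketch_free_budget() are assumptions made for illustration; they are
not taken from the patch, which may weight samples and limit freeing
differently.

```c
#include <assert.h>

/* Hypothetical smoothing factor: a new sample gets a weight of 1/2^SHIFT. */
#define SK_EWMA_SHIFT	3

static_assert(SK_EWMA_SHIFT > 0 && SK_EWMA_SHIFT < 16, "shift out of range");

/*
 * One EWMA step in integer arithmetic:
 *   avg' = avg + (sample - avg) / 2^SHIFT
 * written so that no signed intermediate is needed.
 */
static unsigned long sketch_ewma(unsigned long avg, unsigned long sample)
{
	if (sample >= avg)
		return avg + ((sample - avg) >> SK_EWMA_SHIFT);
	return avg - ((avg - sample) >> SK_EWMA_SHIFT);
}

/*
 * Assumed policy: free only what exceeds the smoothed usage, so that
 * objects likely to be needed by the next burst stay on the free list.
 */
static unsigned long sketch_free_budget(unsigned long on_free_list,
					unsigned long avg_usage)
{
	return on_free_list > avg_usage ? on_free_list - avg_usage : 0;
}
```

Because the average decays slowly after a burst ends, the budget stays at
zero for a while instead of draining the free list at a constant rate.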
This further reduces the kmem_cache allocation/free rate for a full kernel
compile:
               kmem_cache_alloc()  kmem_cache_free()
Baseline:      225k                173k
Usage:         170k                117k
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/87bjznhme2.ffs@tglx
Right now the per CPU pools are only refilled when they become
empty. That's suboptimal especially when there are still non-freed objects
in the to free list.
Check whether an allocation from the per CPU pool emptied a batch and try
to allocate from the free pool if that still has objects available.
               kmem_cache_alloc()  kmem_cache_free()
Baseline:      295k                245k
Refill:        225k                173k
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.439053085@linutronix.de
In situations where objects are rapidly allocated from the pool and handed
back, the size of the per CPU pool turns out to be too small.
Double the size of the per CPU pool.
This reduces the kmem cache allocation and free operations during a kernel compile:
               alloc     free
Baseline:      380k      330k
Double size:   295k      245k
Especially the reduction of allocations is important because that happens
in the hot path when objects are initialized.
The maximum increase in per CPU pool memory consumption is about 2.5K per
online CPU, which is acceptable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.378676302@linutronix.de
Keep it along with the pool as that's a hot cache line anyway and it makes
the code more comprehensible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.318776207@linutronix.de
Adding and removing single objects in a loop is bad in terms of lock
contention and cache line accesses.
To implement batching, record the last object in a batch in the object
itself. This is trivially possible as hlists are strictly stacks. At a batch
boundary, when the first object is added to the list the object stores a
pointer to itself in debug_obj::batch_last. When the next object is added
to the list then the batch_last pointer is retrieved from the first object
in the list and stored in the to be added one.
That means for batch processing the first object always has a pointer to
the last object in a batch, which allows moving batches in a cache line
efficient way and reduces the lock held time.
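The bookkeeping described above can be sketched with a plain singly linked
stack in user-space C. The sk_ types, the batch size and the helper names
are hypothetical, and a bare next pointer stands in for the kernel's hlist;
the sketch only demonstrates that a whole batch can be moved without
walking the list.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BATCH	4	/* hypothetical batch size */

struct sk_obj {
	struct sk_obj *next;
	struct sk_obj *batch_last;	/* deepest object of this batch */
};

struct sk_pool {
	struct sk_obj *head;
	unsigned int cnt;
};

/* Push one object, keeping batch_last of the new head up to date. */
static void sk_push(struct sk_pool *p, struct sk_obj *obj)
{
	if (p->cnt % SK_BATCH == 0)
		obj->batch_last = obj;			/* starts a new batch */
	else
		obj->batch_last = p->head->batch_last;	/* inherit from head */
	obj->next = p->head;
	p->head = obj;
	p->cnt++;
}

/* Move the top batch from src to dst without walking the list. */
static void sk_move_batch(struct sk_pool *dst, struct sk_pool *src)
{
	struct sk_obj *first = src->head, *last;

	/* Both pools must be strictly batch sized for this to be valid. */
	assert(src->cnt >= SK_BATCH && src->cnt % SK_BATCH == 0);
	assert(dst->cnt % SK_BATCH == 0);

	last = first->batch_last;
	src->head = last->next;
	last->next = dst->head;
	dst->head = first;
	src->cnt -= SK_BATCH;
	dst->cnt += SK_BATCH;
}
```

Moving a batch touches only the first and last object of the batch plus
the two list heads, regardless of the batch size.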
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.258995000@linutronix.de
Move the debug_obj::object pointer into a union and add a pointer to the
last node in a batch. That allows batch processing to be implemented efficiently
by utilizing the stack property of hlist:
When the first object of a batch is added to the list, then the batch
pointer is set to the hlist node of the object itself. Any subsequent add
retrieves the pointer to the last node from the first object in the list
and uses that for storing the last node pointer in the newly added object.
Add the pointer to the data structure and ensure that all relevant pool
sizes are strictly batch sized. The actual batching implementation follows
in subsequent changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.139204961@linutronix.de
Convert it to batch processing with intermediate helper functions. This
reduces the final changes for batch processing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164914.015906394@linutronix.de
__free_object() is incomprehensibly complex. The same can be achieved by:
1) Adding the object to the per CPU pool
2) If that pool is full, move a batch of objects into the global pool
or if the global pool is full into the to free pool
This also prepares for batch processing.
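The two steps can be sketched with counters only. The pool limits, the
batch size and the sk_ names are hypothetical values chosen for
illustration, and locking as well as the actual object lists are left out.

```c
#include <assert.h>

#define SK_BATCH	4	/* hypothetical sizes */
#define SK_PCPU_MAX	8
#define SK_GLOBAL_MAX	8

static_assert(SK_PCPU_MAX % SK_BATCH == 0, "pool must be batch sized");
static_assert(SK_GLOBAL_MAX % SK_BATCH == 0, "pool must be batch sized");

struct sk_counts {
	unsigned int pcpu;	/* objects in the per CPU pool */
	unsigned int global;	/* objects in the global pool */
	unsigned int to_free;	/* objects in the to free pool */
};

static void sk_free_object(struct sk_counts *c)
{
	/* 1) The object always goes into the per CPU pool first. */
	c->pcpu++;
	if (c->pcpu < SK_PCPU_MAX)
		return;

	/* 2) Pool full: move one batch to the global or the to free pool. */
	c->pcpu -= SK_BATCH;
	if (c->global + SK_BATCH <= SK_GLOBAL_MAX)
		c->global += SK_BATCH;
	else
		c->to_free += SK_BATCH;
}
```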
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.955542307@linutronix.de
The current allocation scheme tries to allocate from the per CPU pool
first. If that fails it allocates one object from the global pool and then
refills the per CPU pool from the global pool.
That is in the way of switching the pool management to batch mode as the
global pool needs to be a strict stack of batches, which does not allow
allocating single objects.
Rework the code to refill the per CPU pool first and then allocate the
object from the refilled batch. Also try to allocate from the to free pool
first to avoid freeing and reallocating objects.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.893554162@linutronix.de
Having the accounting in the datastructure is better in terms of cache
lines and allows more optimizations later on.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.831908427@linutronix.de
No point in having a separate data structure. Reuse struct obj_pool and
tidy up the code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.770595795@linutronix.de
There is no point in handling the statically allocated objects during early
boot in the actual pool list. This phase does not require accounting, so
all of the related complexity can be avoided.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.708939081@linutronix.de
The contention on the global pool lock can be reduced by strict batch
processing where batches of objects are moved from one list head to another
instead of moving them object by object. This also reduces the cache
footprint because it avoids the list walk and dirties at maximum three
cache lines instead of potentially up to eighteen.
To prepare for that, move the hlist head and related counters into a
struct.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.646171170@linutronix.de
The contention on the global pool_lock can be massive when the global pool
needs to be refilled and many CPUs try to handle this.
Address this by:
- splitting the refill from free list and allocation.
Refill from free list has no constraints vs. the context on RT, so
it can be tried outside of the RT specific preemptible() guard
- Let only one CPU handle the free list
- Let only one CPU do allocations unless the pool level is below
half of the minimum fill level.
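The "only one CPU" idea can be sketched in user-space C11 with an atomic
flag. This is a generic pattern under stated assumptions, not the
mechanism used in the patch: the sk_ names are hypothetical, and the
exception for a pool level below half of the minimum fill level is not
modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set while one caller is busy refilling; hypothetical name. */
static atomic_flag sk_refill_busy = ATOMIC_FLAG_INIT;

/*
 * Run refill() unless another caller is already doing so. Returns true
 * if this caller did the work, false if it backed off without waiting.
 */
static bool sk_try_refill(void (*refill)(void))
{
	assert(refill);
	if (atomic_flag_test_and_set(&sk_refill_busy))
		return false;	/* someone else is refilling */
	refill();
	atomic_flag_clear(&sk_refill_busy);
	return true;
}

/* Trivial refill callback used to observe how often work was done. */
static unsigned int sk_refills;

static void sk_count_refill(void)
{
	sk_refills++;
}
```

Callers that lose the race return immediately instead of queueing up on
the pool lock, which is where the contention came from.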
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240911083521.2257-4-thunder.leizhen@huawei.com-
Link: https://lore.kernel.org/all/20241007164913.582118421@linutronix.de
Freeing the per CPU pool of the unplugged CPU directly is suboptimal as the
objects can be reused in the real pool if there is room. Aside from that this
gets the accounting wrong.
Use the regular free path, which allows reuse and has the accounting correct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/all/20241007164913.263960570@linutronix.de
debug_objects_mem_init() is invoked from mm_core_init() before work queues
are available. If debug_objects_mem_init() destroys the kmem cache in the
error path it causes an Oops in __queue_work():
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
RIP: 0010:__queue_work+0x35/0x6a0
queue_work_on+0x66/0x70
flush_all_cpus_locked+0xdf/0x1a0
__kmem_cache_shutdown+0x2f/0x340
kmem_cache_destroy+0x4e/0x150
mm_core_init+0x9e/0x120
start_kernel+0x298/0x800
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0xc5/0xe0
common_startup_64+0x12c/0x138
Further the object cache pointer is used in various places to check for
early boot operation. It is exposed before the replacements for the static
boot time objects are allocated and the self test operates on it.
This can be avoided by:
1) Running the self test with the static boot objects
2) Exposing it only after the replacement objects have been added to
the pool.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241007164913.137021337@linutronix.de
The statically allocated objects are all located in obj_static_pool[], and
the whole memory of obj_static_pool[] will be reclaimed later. Therefore,
there is no need to split the remaining statically allocated nodes in list
obj_pool into isolated ones, as no one will use them anymore. Writing
INIT_HLIST_HEAD(&obj_pool) is enough. Since hlist_move_list() directly
discards the old list, even this can be omitted.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240911083521.2257-2-thunder.leizhen@huawei.com
Link: https://lore.kernel.org/all/20241007164913.009849239@linutronix.de
Syzbot reports a KASAN failure early during boot on arm64 when building
with GCC 12.2.0 and using the Software Tag-Based KASAN mode:
| BUG: KASAN: invalid-access in smp_build_mpidr_hash arch/arm64/kernel/setup.c:133 [inline]
| BUG: KASAN: invalid-access in setup_arch+0x984/0xd60 arch/arm64/kernel/setup.c:356
| Write of size 4 at addr 03ff800086867e00 by task swapper/0
| Pointer tag: [03], memory tag: [fe]
Initial triage indicates that the report is a false positive and a
thorough investigation of the crash by Mark Rutland revealed the root
cause to be a bug in GCC:
> When GCC is passed `-fsanitize=hwaddress` or
> `-fsanitize=kernel-hwaddress` it ignores
> `__attribute__((no_sanitize_address))`, and instruments functions
> we require are not instrumented.
>
> [...]
>
> All versions [of GCC] I tried were broken, from 11.3.0 to 14.2.0
> inclusive.
>
> I think we have to disable KASAN_SW_TAGS with GCC until this is
> fixed
Disable Software Tag-Based KASAN when building with GCC by making
CC_HAS_KASAN_SW_TAGS depend on !CC_IS_GCC.
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com
Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241014161100.18034-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The fwnode_handle passed into find_io_range_by_fwnode() and
logic_pio_trans_hwaddr() are not modified, so make them const.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20241010-dt-const-v1-2-87a51f558425@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Simplify the devm_ioport_unmap() implementation by using the dedicated API
devres_release(). Compared with the current solution, namely
ioport_unmap() + devres_destroy(), devres_release() has these advantages:
- it is simpler if devm_ioport_unmap()'s parameter @addr was
returned by devm_ioport_map().
- it avoids an unnecessary ioport_unmap(@addr) if @addr was never
returned by devm_ioport_map().
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20240918-fix_lib_devres-v1-2-e696ab5486e6@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simplify the devm_iounmap() implementation by using the dedicated API
devres_release(). Compared with the current solution, namely
devres_destroy() + iounmap(), devres_release() has these advantages:
- it is simpler if devm_iounmap()'s parameter @addr is valid, namely
@addr was returned by one of the devm_ioremap() variants.
- it avoids an unnecessary iounmap(@addr) if @addr is not valid.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20240918-fix_lib_devres-v1-1-e696ab5486e6@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kunit_kzalloc() may fail. Other call sites check for this, either using a
direct comparison with the NULL pointer, or with
KUNIT_ASSERT_NOT_NULL() or KUNIT_ASSERT_NOT_ERR_OR_NULL().
Pick KUNIT_ASSERT_NOT_NULL() as the error handling method that made most
sense to me. It's an unlikely thing to happen, but at least we call
__kunit_abort() instead of dereferencing this NULL pointer.
Fixes: e9502ea6db ("lib: packing: add KUnit tests adapted from selftests")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241004110012.1323427-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "err" variable may be returned without an initialized value.
Fixes: 8e3a67f2de ("crypto: lib/mpi - Add error checks to extension")
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge tag 'slab-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
"Fixes for issues introduced in this merge window: kobject memory leak,
unsuppressed warning and possible lockup in new slub_kunit tests,
misleading code in kvfree_rcu_queue_batch()"
* tag 'slab-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
mm, slab: suppress warnings in test_leak_destroy kunit test
rcu/kvfree: Refactor kvfree_rcu_queue_batch()
mm, slab: fix use of SLAB_SUPPORTS_SYSFS in kmem_cache_release()
The QUIRK_MSB_ON_THE_RIGHT quirk is intended to modify pack() and unpack()
so that the most significant bit of each byte in the packed layout is on
the right.
The way the quirk is currently implemented is broken whenever the packing
code packs or unpacks any value that is not exactly a full byte.
The broken behavior can occur when packing any values smaller than one
byte, when packing any value that is not exactly a whole number of bytes,
or when the packing is not aligned to a byte boundary.
This quirk is documented in the following way:
1. Normally (no quirks), we would do it like this:
::
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
7                       6                       5                       4
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
3                       2                       1                       0
<snip>
2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this:
::
56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
7                       6                       5                       4
24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
3                       2                       1                       0
That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but
inverts bit offsets inside a byte.
Essentially, the mapping for physical bit offsets should be reversed for a
given byte within the payload. This reversal should be fixed to the bytes
in the packing layout.
The logic to implement this quirk is handled within the
adjust_for_msb_right_quirk() function. This function does not work properly
when dealing with the bytes that contain only a partial amount of data.
In particular, consider trying to pack or unpack the range 53-44. We should
always be mapping the bits from the logical ordering to their physical
ordering in the same way, regardless of what sequence of bits we are
unpacking.
Thus, we should grab the following logical bits:
Logical: 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40
                ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
And pack them into the physical bits:
Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
Logical:  48 49 50 51 52 53                   44 45 46 47
           ^  ^  ^  ^  ^  ^                    ^  ^  ^  ^
The current logic in adjust_for_msb_right_quirk is broken. I believe it is
intending to map according to the following:
Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
Logical:        48 49 50 51 52 53 44 45 46 47
                 ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
That is, it tries to keep the bits at the start and end of a packing
together. This is wrong, as it makes the packing change what bit is being
mapped to what based on which bits you're currently packing or unpacking.
Worse, the actual calculations within adjust_for_msb_right_quirk don't make
sense.
Consider the case when packing the last byte of an unaligned packing. It
might have a start bit of 7 and an end bit of 5. This would have a width of
3 bits. The new_start_bit will be calculated as the width - the box_end_bit
- 1. This will underflow and produce a negative value, which will
ultimately result in generating a new box_mask of all 0s.
For any other values, the result of the calculations of the
new_box_end_bit, new_box_start_bit, and the new box_mask will result in the
exact same values for the box_end_bit, box_start_bit, and box_mask. This
makes the calculations completely irrelevant.
If box_end_bit is 0, and box_start_bit is 7, then the entire function of
adjust_for_msb_right_quirk will boil down to just:
*to_write = bitrev8(*to_write)
The other adjustments are attempting (incorrectly) to keep the bits in the
same place but just reversed. This is not the right behavior even if
implemented correctly, as it leaves the mapping dependent on the bit values
being packed or unpacked.
Remove adjust_for_msb_right_quirk() and just use bitrev8 to reverse the
bit order within each byte when interacting with the packed data.
In particular, for packing, we need to reverse both the box_mask and the
physical value being packed. This is done after shifting the value by
box_end_bit so that the reversed mapping is always aligned to the physical
buffer byte boundary. The box_mask is reversed as we're about to use it to
clear any stale bits in the physical buffer at this block.
For unpacking, we need to reverse the contents of the physical buffer
*before* masking with the box_mask. This is critical, as the box_mask is a
logical mask of the bit layout before handling the QUIRK_MSB_ON_THE_RIGHT.
Add several new tests which cover this behavior. These tests will fail
without the fix and pass afterwards. Note that no current drivers make use
of QUIRK_MSB_ON_THE_RIGHT. I suspect this is why there have been no reports
of this inconsistency before.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-8-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While reviewing the initial KUnit tests for lib/packing, Przemek pointed
out that the test values have duplicate bytes in the input sequence.
In addition, I noticed that the unit tests pack and unpack on a byte
boundary, instead of crossing bytes. Thus, we lack good coverage of the
corner cases of the API.
Add additional unit tests to cover packing and unpacking byte buffers which
do not have duplicate bytes in the unpacked value, and which pack and
unpack to an unaligned offset.
A careful reviewer may note the lack of tests for QUIRK_MSB_ON_THE_RIGHT. This
is because I found issues with that quirk during test implementation. This
quirk will be fixed and the tests will be included in a future change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-7-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add 24 simple KUnit tests for the lib/packing.c pack() and unpack() APIs.
The first 16 tests exercise all combinations of quirks with a simple magic
number value on a 16-byte buffer. The remaining 8 tests cover
non-multiple-of-4 buffer sizes.
These tests were originally written by Vladimir as simple selftest
functions. I adapted them to KUnit, refactoring them into a table driven
approach. This will aid in adding additional tests in the future.
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-6-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
packing() is now used in some hot paths, and it would be good to get rid
of some ifs and buts that depend on "op", to speed things up a little bit.
With the main implementations now taking size_t endbit, we no longer
have to check for negative values. Update the local integer variables to
also be size_t to match.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-5-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven described packing() as "really bad API" because of
not being able to enforce const correctness. The same function is used
both when "pbuf" is input and "uval" is output, and the other way
around.
Create 2 wrapper functions where const correctness can be ensured.
Do ugly type casts inside, to be able to reuse packing() as currently
implemented - which will _not_ modify the input argument.
Also, take the opportunity to change the type of startbit and endbit to
size_t - an unsigned type - in these new function prototypes. When int,
an extra check for negative values is necessary. Hopefully, when
packing() goes away completely, that check can be dropped.
My concern is that code which does rely on the conditional directionality
of packing() is harder to refactor without blowing up in size. So it may
take a while to completely eliminate packing(). But let's make alternatives
available for those who do not need that.
Link: https://lore.kernel.org/netdev/20210223112003.2223332-1-geert+renesas@glider.be/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-4-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller has a use case for packing() in the intel/ice networking
driver, but it cannot be used as-is.
Simply put, the API quirks for LSW32_IS_FIRST and LITTLE_ENDIAN are
naively implemented with the undocumented assumption that the buffer
length must be a multiple of 4. All calculations of group offsets and
offsets of bytes within groups assume that this is the case. But in the
ice case, this does not hold true. For example, packing into a buffer
of 22 bytes would yield wrong results, but pretending it was a 24 byte
buffer would work.
Rather than requiring such hacks, and leaving a big question mark when
it comes to discontinuities in the accessible bit fields of such buffer,
we should extend the packing API to support this use case.
It turns out that we can keep the design in terms of groups of 4 bytes,
but also make it work if the total length is not a multiple of 4.
Just like before, imagine the buffer as a big number, and its most
significant bytes (the ones that would make up to a multiple of 4) are
missing. Thus, with a big endian (no quirks) interpretation of the
buffer, those most significant bytes would be absent from the beginning
of the buffer, and with a LSW32_IS_FIRST interpretation, they would be
absent from the end of the buffer. The LITTLE_ENDIAN quirk, in the
packing() API world, only affects byte ordering within groups of 4.
Thus, it does not change which bytes are missing. Only the significance
of the remaining bytes within the (smaller) group.
No change intended for buffer sizes which are multiples of 4. Tested
with the sja1105 driver and with downstream unit tests.
Link: https://lore.kernel.org/netdev/a0338310-e66c-497c-bc1f-a597e50aa3ff@intel.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-2-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While reworking the implementation, it became apparent that a check
rejecting bit indices beyond the end of the buffer
(startbit >= 8 * pbuflen) does not exist.
There is no functional issue yet, because at call sites, "startbit" and
"endbit" are always hardcoded to correct values, and never come from the
user.
Even with the upcoming support of arbitrary buffer lengths, the
"startbit >= 8 * pbuflen" check will remain correct. This is because
we intend to always interpret the packed buffer in a way that avoids
discontinuities in the available bit indices.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-1-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"vfs:
- Ensure that iter_folioq_get_pages() advances to the next slot
otherwise it will end up using the same folio with an out-of-bound
offset.
iomap:
- Don't unshare delalloc extents which can't be reflinked, and thus
can't be shared.
- Constrain the file range passed to iomap_file_unshare() directly in
iomap instead of requiring the callers to do it.
netfs:
- Use folioq_count instead of folioq_nr_slot to prevent an
uninitialized value warning in netfs_clear_buffer().
- Fix missing wakeup after issuing writes by scheduling the write
collector only if all the subrequest queues are empty and thus no
writes are pending.
- Fix two minor documentation bugs"
* tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: constrain the file range passed to iomap_file_unshare
iomap: don't bother unsharing delalloc extents
netfs: Fix missing wakeup after issuing writes
Documentation: add missing folio_queue entry
folio_queue: fix documentation
netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer
iov_iter: fix advancing slot in iter_folioq_get_pages()
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Include <linux/prandom.h> header to allow the removal of legacy
inclusion of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Substitute the inclusion of <linux/random.h> header with
<linux/prandom.h> to allow the removal of legacy inclusion
of <linux/prandom.h> from <linux/random.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Guenter Roeck reports that the new slub kunit tests added by commit
4e1c44b3db ("kunit, slub: add test_kfree_rcu() and
test_leak_destroy()") cause a lockup on boot on several architectures
when the kunit tests are configured to be built-in and not modules.
The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
showed the runner for built-in kunit tests kunit_run_all_tests() is
called before setting system_state to SYSTEM_RUNNING and calling
rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
was unable to reproduce the problem myself, skipping the test when the
slub_kunit module is built-in should avoid the issue.
An alternative fix, moving the call to kunit_run_all_tests() a bit later
in the boot, was tried, but it broke tests with functions marked as
__init because free_initmem() had already been done.
Fixes: 4e1c44b3db ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: rcu@vger.kernel.org
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The test_leak_destroy kunit test intends to test the detection of stray
objects in kmem_cache_destroy(), which normally produces a warning. The
other slab kunit tests suppress the warnings in the kunit test context,
so suppress warnings and related printk output in this test as well.
Automated test running environments then don't need to learn to filter
the warnings.
Also rename the test's kmem_cache, the name was wrongly copy-pasted from
test_kfree_rcu.
Fixes: 4e1c44b3db ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408251723.42f3d902-oliver.sang@intel.com
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Closes: https://lore.kernel.org/all/CAB=+i9RHHbfSkmUuLshXGY_ifEZg9vCZi3fqr99+kmmnpDus7Q@mail.gmail.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
iter_folioq_get_pages() decides to advance to the next folioq slot when
it has reached the end of the current folio. However, it is checking
offset, which is the beginning of the current part, instead of
iov_offset, which is adjusted to the end of the current part, so it
doesn't advance the slot when it's supposed to. As a result, on the next
iteration, we'll use the same folio with an out-of-bounds offset and
return an unrelated page.
This manifested as various crashes and other failures in 9pfs in drgn's
VM testing setup and BPF CI.
Fixes: db0aa2e956 ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Link: https://lore.kernel.org/linux-fsdevel/20240923183432.1876750-1-chantr4@gmail.com/
Tested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/r/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
- switch all bitmap APIs from inline to __always_inline from Brian Norris;
- introduce GENMASK_U128() macro from Anshuman Khandual;
Merge tag 'bitmap-for-6.12' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- switch all bitmap APIs from inline to __always_inline (Brian Norris)
The __always_inline series improves code generation, and with the
latest compiler versions it is now required to avoid compilation
warnings. It spent long enough in my backlog, and I'm thankful to Brian
Norris for taking over and moving it forward.
- introduce GENMASK_U128() macro (Anshuman Khandual)
GENMASK_U128() is a prerequisite needed for arm64 development
* tag 'bitmap-for-6.12' of https://github.com/norov/linux:
lib/test_bits.c: Add tests for GENMASK_U128()
uapi: Define GENMASK_U128
nodemask: Switch from inline to __always_inline
cpumask: Switch from inline to __always_inline
bitmap: Switch from inline to __always_inline
find: Switch from inline to __always_inline
There's a focus on fixes for the memfd_pin_folios() work which was added
into 6.11. Apart from that, the usual shower of singleton fixes.
Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"19 hotfixes. 13 are cc:stable.
There's a focus on fixes for the memfd_pin_folios() work which was
added into 6.11. Apart from that, the usual shower of singleton fixes"
* tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
ocfs2: fix uninit-value in ocfs2_get_block()
zram: don't free statically defined names
memory tiers: use default_dram_perf_ref_source in log message
Revert "list: test: fix tests for list_cut_position()"
kselftests: mm: fix wrong __NR_userfaultfd value
compiler.h: specify correct attribute for .rodata..c_jump_table
mm/damon/Kconfig: update DAMON doc URL
mm: kfence: fix elapsed time for allocated/freed track
ocfs2: fix deadlock in ocfs2_get_system_file_inode
ocfs2: reserve space for inline xattr before attaching reflink tree
mm: migrate: annotate data-race in migrate_folio_unmap()
mm/hugetlb: simplify refs in memfd_alloc_folio
mm/gup: fix memfd_pin_folios alloc race panic
mm/gup: fix memfd_pin_folios hugetlb page allocation
mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
mm/filemap: fix filemap_get_folios_contig THP panic
mm: make SPLIT_PTE_PTLOCKS depend on SMP
tools: fix shared radix-tree build
This reverts commit e620799c41.
The commit introduces unit test failures.
Expected cur == &entries[i], but
cur == 0000037fffadfd80
&entries[i] == 0000037fffadfd60
# list_test_list_cut_position: pass:0 fail:1 skip:0 total:1
not ok 21 list_test_list_cut_position
# list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
Expected cur == &entries[i], but
cur == 0000037fffa9fd70
&entries[i] == 0000037fffa9fd60
# list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
Expected cur == &entries[i], but
cur == 0000037fffa9fd80
&entries[i] == 0000037fffa9fd70
Revert it.
Link: https://lkml.kernel.org/r/20240922150507.553814-1-linux@roeck-us.net
Fixes: e620799c41 ("list: test: fix tests for list_cut_position()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: I Hsin Cheng <richard120310@gmail.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- Improve blk-integrity segment counting and merging (Keith)
- NVMe pull request via Keith:
- Multipath fixes (Hannes)
- Sysfs attribute list NULL terminate fix (Shin'ichiro)
- Remove problematic read-back (Keith)
- Fix for a regression with the IO scheduler switching freezing from
6.11 (Damien)
- Use a raw spinlock for sbitmap, as it may get called from preempt
disabled context (Ming)
- Cleanup for bd_claiming waiting, using var_waitqueue() rather than
the bit waitqueues, as that more accurately describes what it does
(Neil)
- Various cleanups (Kanchan, Qiu-ji, David)
* tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux:
nvme: remove CC register read-back during enabling
nvme: null terminate nvme_tls_attrs
nvme-multipath: avoid hang on inaccessible namespaces
nvme-multipath: system fails to create generic nvme device
lib/sbitmap: define swap_lock as raw_spinlock_t
block: Remove unused blk_limits_io_{min,opt}
drbd: Fix atomicity violation in drbd_uuid_set_bm()
block: Fix elv_iosched_local_module handling of "none" scheduler
block: remove bogus union
block: change wait on bd_claiming to use a var_waitqueue
blk-integrity: improved sg segment mapping
block: unexport blk_rq_count_integrity_sg
nvme-rdma: use request to get integrity segments
scsi: use request to get integrity segments
block: provide a request helper for user integrity segments
blk-integrity: consider entire bio list for merging
blk-integrity: properly account for segments
blk-mq: set the nr_integrity_segments from bio
blk-mq: unconditional nr_integrity_segments
Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support cross-compiling linux-headers Debian package and kernel-devel
RPM package
- Add support for the linux-debug Pacman package
- Improve module rebuilding speed by factoring out the common code to
scripts/module-common.c
- Separate device tree build rules into scripts/Makefile.dtbs
- Add a new script to generate modules.builtin.ranges, which is useful
for tracing tools to find symbols in built-in modules
- Refactor Kconfig and misc tools
- Update Kbuild and Kconfig documentation
* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
kbuild: doc: replace "gcc" in external module description
kbuild: doc: describe the -C option precisely for external module builds
kbuild: doc: remove the description about shipped files
kbuild: doc: drop section numbering, use references in modules.rst
kbuild: doc: throw out the local table of contents in modules.rst
kbuild: doc: remove outdated description of the limitation on -I usage
kbuild: doc: remove description about grepping CONFIG options
kbuild: doc: update the description about Kbuild/Makefile split
kbuild: remove unnecessary export of RUST_LIB_SRC
kbuild: remove append operation on cmd_ld_ko_o
kconfig: cache expression values
kconfig: use hash table to reuse expressions
kconfig: refactor expr_eliminate_dups()
kconfig: add comments to expression transformations
kconfig: change some expr_*() functions to bool
scripts: move hash function from scripts/kconfig/ to scripts/include/
kallsyms: change overflow variable to bool type
kallsyms: squash output_address()
kbuild: add install target for modules.builtin.ranges
scripts: add verifier script for builtin module range data
...
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core clk framework is left largely untouched this time around
except for support for the newly ratified DT property
'assigned-clock-rates-u64'.
I'm much more excited about the support for loading DT overlays from
KUnit tests so that we can test how the clk framework parses DT nodes
during clk registration. The clk framework has some places that are
highly DeviceTree dependent so this charts the path to extend the
KUnit tests to cover even more framework code in the future. I've got
some more tests on the list that use the DT overlay support, but they
uncovered issues with clk unregistration that I'm still working on
fixing.
Outside the core, the clk driver update pile is dominated by Qualcomm
and Renesas SoCs, making it fairly usual. Looking closer, there are
fixes for things all over the place, like adding missing clk
frequencies or moving defines for the number of clks out of DT binding
headers into the drivers. There are even conversions of DT bindings to
YAML and migration away from strings to describe clk topology. Overall
it doesn't look unusual so I expect the new drivers to be where we'll
have fixes in the coming weeks.
Core:
- KUnit tests for clk registration and fixed rate basic clk type
- A couple more devm helpers, one consumer and one provider
- Support for assigned-clock-rates-u64
New Drivers:
- Camera, display and GPU clocks on Qualcomm SM4450
- Camera clocks on Qualcomm SM8150
- Rockchip rk3576 clks
- Microchip SAM9X7 clks
- Renesas RZ/V2H(P) (R9A09G057) clks
Updates:
- Mark a bunch of struct freq_tbl const to reduce .data usage
- Add Qualcomm MSM8226 A7PLL and Regera PLL support
- Fix the Qualcomm Lucid 5LPE PLL configuration sequence to not reuse
Trion, as they do differ
- A number of fixes to the Qualcomm SM8550 display clock driver
- Fold Qualcomm SM8650 display clock driver into SM8550 one
- Add missing clocks and GDSCs needed for audio on Qualcomm MSM8998
- Add missing USB MP resets, GPLL9, and QUPv3 DFS to Qualcomm SC8180X
- Fix sdcc clk frequency tables on Qualcomm SC8180X
- Drop the Qualcomm SM8150 gcc_cpuss_ahb_clk_src
- Mark Qualcomm PCIe GDSCs as RET_ON on sm8250 and sm8540 to avoid
them turning off during suspend
- Use the HW_CTRL mechanism on Qualcomm SM8550 video clock controller
GDSCs
- Get rid of CLK_NR_CLKS defines in Rockchip DT binding headers
- Some fixes for Rockchip rk3228 and rk3588
- Exynos850: Add clock for Thermal Management Unit
- Exynos7885: Fix duplicated ID in the header, add missing TOP PLLs
and add clocks for USB block in the FSYS clock controller
- ExynosAutov9: Add DPUM clock controller
- ExynosAutov920: Add new (first) clock controllers: TOP and PERIC0
(and a bit more complete bindings)
- Use clk_hw pointer instead of fw_name for acm_aud_clk[0-1]_sel
clocks on i.MX8Q as parents in ACM provider
- Add i.MX95 NETCMIX support to the block control provider
- Fix parents for ENETx_REF_SEL clocks on i.MX6UL
- Add USB clocks, resets and power domains on Renesas RZ/G3S
- Add Generic Timer (GTM), I2C Bus Interface (RIIC), SD/MMC Host
Interface (SDHI) and Watchdog Timer (WDT) clocks and resets on
Renesas RZ/V2H
- Add PCIe, PWM, and CAN-FD clocks on Renesas R-Car V4M
- Add LCD controller clocks and resets on Renesas RZ/G2UL
- Add DMA clocks and resets on Renesas RZ/G3S
- Add fractional multiplication PLL support on Renesas R-Car Gen4
- Document support for the Renesas RZ/G2M v3.0 (r8a774a3) SoC
- Support for the Microchip SAM9X7 SoC as follows:
- Updates for the Microchip PLL drivers
- DT binding documentation updates (for the new clock driver and for
the slow clock controller that SAM9X7 is using)
- A fix for the Microchip SAMA7G5 clock driver to avoid allocating
more memory than necessary
- Constify some Amlogic structs
- Add SM1 eARC clocks for Amlogic
- Introduce a symbol namespace for Amlogic clock specific symbols
- Add reset controller support to audiomix block control on i.MX
- Add CLK_SET_RATE_PARENT flag to all audiomix clocks and to i.MX7D
lcdif_pixel_src clock
- Fix parent clocks for earc_phy and audpll on i.MX8MP
- Fix default parents for enet[12]_ref_sel on i.MX6UL
- Add ops in composite 8M and 93 that allow no-op on disable
- Add check for PCC present bit on composite 7ULP register
- Fix fractional part for fracn-gppll on prepare in i.MX
- Fix clock tree update for TF-A managed clocks on i.MX8M
- Drop CLK_SET_PARENT_GATE for DRAM mux on i.MX7D
- Add the SAI7 IPG clock for i.MX8MN
- Mark the 'nand_usdhc_bus' clock as non-critical on i.MX8MM
- Add LVDS bypass clocks on i.MX8QXP
- Add muxes for MIPI and PHY ref clocks on i.MX
- Reorder dc0_bypass0_clk, lcd_pxl and dc1_disp clocks on i.MX8QXP
- Add 1039.5MHz and 800MHz rates to fracn-gppll table on i.MX
- Add CLK_SET_RATE_PARENT for media_disp pixel clocks on i.MX8QXP
- Add some module descriptions to the i.MX generic and the i.MXRT1050
driver
- Fix return value for bypass for composite i.MX7ULP
- Move Mediatek clk bindings to clock/
- Convert some more clk bindings to dt schema"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (180 commits)
clk: Switch back to struct platform_driver::remove()
dt-bindings: clock, reset: fix top-comment indentation rk3576 headers
clk: rockchip: remove unused mclk_pdm0_p/pdm0_p definitions
clk: provide devm_clk_get_optional_enabled_with_rate()
clk: fixed-rate: add devm_clk_hw_register_fixed_rate_parent_data()
clk: imx6ul: fix clock parent for IMX6UL_CLK_ENETx_REF_SEL
clk: renesas: r9a09g057: Add clock and reset entries for GTM/RIIC/SDHI/WDT
clk: renesas: rzv2h: Add support for dynamic switching divider clocks
clk: renesas: r9a08g045: Add clocks, resets and power domains for USB
clk: rockchip: fix error for unknown clocks
clk: rockchip: rk3588: drop unused code
clk: rockchip: Add clock controller for the RK3576
clk: rockchip: Add new pll type pll_rk3588_ddr
dt-bindings: clock, reset: Add support for rk3576
dt-bindings: clock: rockchip,rk3588-cru: drop unneeded assigned-clocks
clk: rockchip: rk3588: Fix 32k clock name for pmu_24m_32k_100m_src_p
clk: imx95: enable the clock of NETCMIX block control
dt-bindings: clock: add RMII clock selection
dt-bindings: clock: add i.MX95 NETCMIX block control
clk: imx: imx8: Use clk_hw pointer for self registered clock in clk_parent_data
...
rcu_pending, btree key cache rework: this solves lock contention in the
key cache, eliminating the biggest source of the srcu lock hold time
warnings, and drastically improving performance on some metadata heavy
workloads - on multithreaded creates we're now 3-4x faster than xfs.
We're now using an rhashtable instead of the system inode hash table;
this is another significant performance improvement on multithreaded
metadata workloads, eliminating more lock contention.
for_each_btree_key_in_subvolume_upto(): new helper for iterating over
keys within a specific subvolume, eliminating a lot of open coded
"subvolume_get_snapshot()" and also fixing another source of srcu lock
time warnings, by running each loop iteration in its own transaction (as
the existing for_each_btree_key() does).
More work on btree_trans locking asserts; we now assert that we don't
hold btree node locks when trans->locked is false, which is important
because we don't use lockdep for tracking individual btree node locks.
Some cleanups and improvements in the bset.c btree node lookup code,
from Alan.
Rework of btree node pinning, which we use in backpointers fsck. The old
hacky implementation, where the shrinker just skipped over nodes in the
pinned range, was causing OOMs; instead we now use another shrinker with
a much higher seeks number for pinned nodes.
Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
where rebalance would sometimes fall back to allocating from the full
filesystem, which is not what we want when it's trying to move data to a
specific target.
Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
allocations.
Idmap mounts are now supported - Hongbo.
Rename whiteouts are now supported - Hongbo.
Erasure coding can now handle devices being marked as failed, or
forcibly removed. We still need the evacuate path for erasure coding,
but it's getting very close to ready for people to start using.
Status, and when will we be taking off experimental:
----------------------------------------------------
Going by critical, user facing bugs getting found and fixed, we're
nearly there. There are a couple key items that need to be finished
before we can take off the experimental label:
- The end-user experience is still pretty painful when the root
filesystem needs a fsck; we need some form of limited self healing so
that necessary repair gets run automatically. Errors (by type) are
recorded in the superblock, so what we need to do next is convert
remaining inconsistent() errors to fsck() errors (so that all runtime
inconsistencies are logged in the superblock), and we need to go
through the list of fsck errors and classify them by which fsck passes
are needed to repair them.
- We need comprehensive torture testing for all our repair paths, to
shake out remaining bugs there. Thomas has been working on the tooling
for this, so this is coming soonish.
Slightly less critical items:
- We need to improve the end-user experience for degraded mounts: right
now, a degraded root filesystem means dropping to an initramfs shell
or somehow inputting mount options manually (we don't want to allow
degraded mounts without some form of user input, except on unattended
servers) - we need the mount helper to prompt the user to allow
mounting degraded, and make sure this works with systemd.
- Scalability: we have users running 100TB+ filesystems, and that's
effectively the limit right now due to fsck times. We have some
reworks in the pipeline to address this, we're aiming to make petabyte
sized filesystems practical.
Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:
- rcu_pending, btree key cache rework: this solves lock contention in
the key cache, eliminating the biggest source of the srcu lock hold
time warnings, and drastically improving performance on some metadata
heavy workloads - on multithreaded creates we're now 3-4x faster than
xfs.
- We're now using an rhashtable instead of the system inode hash table;
this is another significant performance improvement on multithreaded
metadata workloads, eliminating more lock contention.
- for_each_btree_key_in_subvolume_upto(): new helper for iterating over
keys within a specific subvolume, eliminating a lot of open coded
"subvolume_get_snapshot()" and also fixing another source of srcu
lock time warnings, by running each loop iteration in its own
transaction (as the existing for_each_btree_key() does).
- More work on btree_trans locking asserts; we now assert that we don't
hold btree node locks when trans->locked is false, which is important
because we don't use lockdep for tracking individual btree node
locks.
- Some cleanups and improvements in the bset.c btree node lookup code,
from Alan.
- Rework of btree node pinning, which we use in backpointers fsck. The
old hacky implementation, where the shrinker just skipped over nodes
in the pinned range, was causing OOMs; instead we now use another
shrinker with a much higher seeks number for pinned nodes.
- Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
where rebalance would sometimes fall back to allocating from the full
filesystem, which is not what we want when it's trying to move data
to a specific target.
- Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
allocations.
- Idmap mounts are now supported (Hongbo Li)
- Rename whiteouts are now supported (Hongbo Li)
- Erasure coding can now handle devices being marked as failed, or
forcibly removed. We still need the evacuate path for erasure coding,
but it's getting very close to ready for people to start using.
* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
bcachefs: return err ptr instead of null in read sb clean
bcachefs: Remove duplicated include in backpointers.c
bcachefs: Don't drop devices with stripe pointers
bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
bcachefs: bch_fs.rw_devs_change_count
bcachefs: bch2_dev_remove_stripes()
bcachefs: bch2_trigger_ptr() calculates sectors even when no device
bcachefs: improve error messages in bch2_ec_read_extent()
bcachefs: improve error message on too few devices for ec
bcachefs: improve bch2_new_stripe_to_text()
bcachefs: ec_stripe_head.nr_created
bcachefs: bch_stripe.disk_label
bcachefs: stripe_to_mem()
bcachefs: EIO errcode cleanup
bcachefs: Rework btree node pinning
bcachefs: split up btree cache counters for live, freeable
bcachefs: btree cache counters should be size_t
bcachefs: Don't count "skipped access bit" as touched in btree cache scan
bcachefs: Failed devices no longer require mounting in degraded mode
bcachefs: bch2_dev_rcu_noerror()
...
Merge user access fast validation using address masking.
This allows architectures to optionally use a data dependent address
masking model instead of a conditional branch for validating user
accesses. That avoids the Spectre-v1 speculation barriers.
Right now only x86-64 takes advantage of this, and not all architectures
will be able to do it. It requires a guard region between the user and
kernel address spaces (so that you can't overflow from one to the
other), and an easy way to generate a guaranteed-to-fault address for
invalid user pointers.
Also note that this currently assumes that there is no difference
between user read and write accesses. If extended to architectures like
powerpc, we'll also need to separate out the user read-vs-write cases.
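The masking model described above can be sketched in plain C. This is a
user-space illustration only, not the kernel's macro: the helper names
are hypothetical, and it assumes a 64-bit layout where kernel addresses
have the top bit set and the all-ones address is guaranteed to fault.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the kernel's implementation. */
static inline uint64_t mask_user_address_sketch(uint64_t addr)
{
	/* All ones when the top bit is set (kernel half), zero otherwise. */
	uint64_t mask = 0 - (addr >> 63);

	/*
	 * User-half addresses pass through unchanged. Anything in the
	 * kernel half collapses to all ones, which is assumed here to be
	 * a guaranteed-to-fault address. No conditional branch is
	 * involved, so there is nothing to mispredict.
	 */
	return addr | mask;
}

/* Small self-check showing both cases. */
static inline void mask_user_address_selfcheck(void)
{
	assert(mask_user_address_sketch(0x00007fffdeadb000ULL) ==
	       0x00007fffdeadb000ULL);
	assert(mask_user_address_sketch(0xffff888000001000ULL) ==
	       UINT64_MAX);
}
```

Because only the top bit is examined, an access that starts just below
the user limit and runs past it is not caught by the mask itself; that
is why the scheme depends on a guard region between the two address
spaces.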
* address-masking:
x86: make the masked_user_access_begin() macro use its argument only once
x86: do the user address masking outside the user access area
x86: support user address masking instead of non-speculative conditional
This is the initial pull request of sched_ext. The v7 patchset
(https://lkml.kernel.org/r/20240618212056.2833381-1-tj@kernel.org) is
applied on top of tip/sched/core + bpf/master as of Jun 18th.
tip/sched/core 793a62823d1c ("sched/core: Drop spinlocks on contention iff kernel is preemptible")
bpf/master f6afdaf72a ("Merge branch 'bpf-support-resilient-split-btf'")
Since then, the following pulls were made:
- v6.11-rc1 is pulled to keep up with the mainline.
- tip/sched/core was pulled several times:
- 7b9f6c864a, 0df340ceae, 5ac998574f, 0b1777f0fa: To resolve
conflicts. See each commit for details on conflicts and their
resolutions.
- d7b01aef9d: To receive fd03c5b858 ("sched: Rework pick_next_task()")
and related commits. @prev is added to sched_class->put_prev_task() and
put_prev_task() is reordered after ->pick_task(), which makes
sched_class->switch_class() unnecessary. The follow-up commits update
sched_ext accordingly and drop sched_class->switch_class().
- bpf/master was pulled to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation
for the DSQ iterator patchset
To obtain the net sched_ext changes, diff against:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.12-base
which is the merge of:
tip/sched/core bc9057da1a ("sched/cpufreq: Use NSEC_PER_MSEC for deadline task")
bpf/master 2ad6d23f46 ("selftests/bpf: Do not update vmlinux.h unnecessarily")
Since the v7 patchset, the following changes were made:
- cpuperf support which was a part of the v6 patchset was posted separately
and then applied after reviews.
- cgroup support which was a part of the v6 patchset was posted separately,
iterated and then applied.
- Improve integration with sched core.
- Double locking usage in migration paths dropped. Depend on
TASK_ON_RQ_MIGRATING synchronization instead.
- The BPF scheduler couldn't directly dispatch to the local DSQ of another
CPU using a SCX_DSQ_LOCAL_ON verdict. This caused difficulties around
handling non-wakeup enqueues. Updated so that SCX_DSQ_LOCAL_ON can be used
in the enqueue path too.
- DSQ iterator which was a part of the v6 patchset was posted separately.
The iterator itself was applied after a couple revisions. The associated
selective consumption kfunc can use further improvements and is still
being worked on.
- scx_bpf_dispatch[_vtime]_from_dsq() added to increase flexibility. A task
can now be transferred between two DSQs from almost any context. This
involved significant refactoring of migration code.
- Various fixes and improvements.
As the branch is based on top of tip/sched/core + bpf/master, please merge
after both are applied.
Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext support from Tejun Heo:
"This implements a new scheduler class called ‘ext_sched_class’, or
sched_ext, which allows scheduling policies to be implemented as BPF
programs.
The goals of this are:
- Ease of experimentation and exploration: Enabling rapid iteration
of new scheduling policies.
- Customization: Building application-specific schedulers which
implement policies that are not applicable to general-purpose
schedulers.
- Rapid scheduler deployments: Non-disruptive swap outs of scheduling
policies in production environments"
See individual commits for more documentation, but also the cover letter
for the latest series:
Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/
* tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits)
sched: Move update_other_load_avgs() to kernel/sched/pelt.c
sched_ext: Don't trigger ops.quiescent/runnable() on migrations
sched_ext: Synchronize bypass state changes with rq lock
scx_qmap: Implement highpri boosting
sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
sched_ext: Compact struct bpf_iter_scx_dsq_kern
sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq()
sched_ext: Move consume_local_task() upward
sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq()
sched_ext: Reorder args for consume_local/remote_task()
sched_ext: Restructure dispatch_to_local_dsq()
sched_ext: Fix processs_ddsp_deferred_locals() by unifying DTL_INVALID handling
sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON
sched_ext: Refactor consume_remote_task()
sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate
sched_ext: Add missing static to scx_dump_data
sched_ext: Add missing static to scx_has_op[]
sched_ext: Temporarily work around pick_task_scx() being called without balance_scx()
sched_ext: Add a cgroup scheduler which uses flattened hierarchy
sched_ext: Add cgroup support
...
Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers to generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
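The signed division fix listed above concerns the one 64-bit signed division
whose mathematical result does not fit in the register: INT_MIN / -1. What
follows is only a minimal Python sketch of overflow-safe semantics, not the
kernel code. It assumes (this message does not say so) that the quotient is
pinned to INT_MIN and the remainder to 0 in that case. The names bpf_sdiv64()
and bpf_smod64() are hypothetical, and division by zero is left out of scope.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def _trunc_div(a, b):
    """C-style division that truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def bpf_sdiv64(dividend, divisor):
    """Hypothetical model of a 64-bit signed divide that cannot overflow."""
    if divisor == 0:
        raise ValueError("division by zero is out of scope for this sketch")
    # INT64_MIN / -1 would be INT64_MAX + 1, which does not fit in 64 bits.
    # Assumption: the result is pinned to INT64_MIN instead of trapping.
    if dividend == INT64_MIN and divisor == -1:
        return INT64_MIN
    return _trunc_div(dividend, divisor)


def bpf_smod64(dividend, divisor):
    """Hypothetical model of the matching signed modulo."""
    if divisor == 0:
        raise ValueError("division by zero is out of scope for this sketch")
    # Assumption: the remainder of the overflowing case is defined as 0.
    if dividend == INT64_MIN and divisor == -1:
        return 0
    return dividend - divisor * _trunc_div(dividend, divisor)
```

Every other input pair stays within [INT64_MIN, INT64_MAX], so the special
case is the only place where a native division instruction could trap.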
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu7dpAAKCRDdBJ7gKXxA
jsPqAPwMDEZyKlfSw7QioEHNHDkmkbP7VYCYR0CbUnppbztwpAD8D37aVbWQ+UzM
3nnOq3W2Pc2o/20zqi8Upf1mnvUrygQ=
=/NWE
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Many singleton patches - please see the various changelogs for
details.
Quite a lot of nilfs2 work this time around.
Notable patch series in this pull request are:
- "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with
assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64()
to provide (much) more accurate results. The current implementation
was causing Uwe some issues in the PWM drivers.
- "xz: Updates to license, filters, and compression options" from
Lasse Collin. Miscellaneous maintenance and minor feature work to
the xz decompressor.
- "Fix some GDB command error and add some GDB commands" from
Kuan-Ying Lee. Fixes and enhancements to the gdb scripts.
- "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff
Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of
warnings about this.
- "nilfs2: add support for some common ioctls" from Ryusuke Konishi.
Adds various commonly-available ioctls to nilfs2.
- "This series fixes a number of formatting issues in kernel doc
comments" from Ryusuke Konishi does that.
- "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke
Konishi. Fix issues where -ENOENT was being unintentionally and
inappropriately returned to userspace.
- "nilfs2: assorted cleanups" from Huang Xiaojia.
- "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke
Konishi fixes some issues which can occur on corrupted nilfs2
filesystems.
- "scripts/decode_stacktrace.sh: improve error reporting and
usability" from Luca Ceresoli does those things"
* tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits)
list: test: increase coverage of list_test_list_replace*()
list: test: fix tests for list_cut_position()
proc: use __auto_type more
treewide: correct the typo 'retun'
ocfs2: cleanup return value and mlog in ocfs2_global_read_info()
nilfs2: remove duplicate 'unlikely()' usage
nilfs2: fix potential oob read in nilfs_btree_check_delete()
nilfs2: determine empty node blocks as corrupted
nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation
tools/mm: rm thp_swap_allocator_test when make clean
squashfs: fix percpu address space issues in decompressor_multi_percpu.c
lib: glob.c: added null check for character class
nilfs2: refactor nilfs_segctor_thread()
nilfs2: use kthread_create and kthread_stop for the log writer thread
nilfs2: remove sc_timer_task
nilfs2: do not repair reserved inode bitmap in nilfs_new_inode()
nilfs2: eliminate the shared counter and spinlock for i_generation
nilfs2: separate inode type information from i_state field
nilfs2: use the BITS_PER_LONG macro
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
=s0T+
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
more code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new fields to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
particularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leakage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalization and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature.
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have noticed
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
single-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operation is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making it cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" from Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memory.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
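The "Kernel stack usage histogram" item above shows kstack_* counters in
/proc/vmstat. As a rough illustration of how a tuning script might consume
them, here is a small Python sketch that parses the sample output quoted in
that item. The helper name parse_kstack() is hypothetical, and the text is
embedded as a string so the sketch does not depend on a kernel that has these
fields.

```python
# Sample taken verbatim from the pull message above.
SAMPLE_VMSTAT = """\
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
"""


def parse_kstack(vmstat_text):
    """Return {field name: count} for every kstack_* line in vmstat text."""
    hist = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("kstack_"):
            hist[parts[0]] = int(parts[1])
    return hist


hist = parse_kstack(SAMPLE_VMSTAT)
total = sum(hist.values())
busiest = max(hist, key=hist.get)
for name, count in hist.items():
    print(f"{name:>10} {count:>6} {100.0 * count / total:5.1f}%")
```

On a running system the same function could be fed the contents of
/proc/vmstat, assuming the kernel exposes these fields.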
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
When called from sbitmap_queue_get(), sbitmap_deferred_clear() may be run
with preemption disabled. On RT kernels spin_lock() can sleep, so the warning
"BUG: sleeping function called from invalid context" can be triggered.
Fix it by using raw_spin_lock instead.
Cc: Yang Yang <yang.yang@vivo.com>
Fixes: 72d04bdcf3 ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yang Yang <yang.yang@vivo.com>
Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Create file module.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to find what functions are for various built-in modules.
The offset range data for builtin modules is generated using:
- modules.builtin: associates object files with module names
- vmlinux.map: provides load order of sections and offset of first member
per section
- vmlinux.o.map: provides offset of object file content per section
- .*.cmd: build cmd file with KBUILD_MODFILE
The generated data will look like:
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
For each ELF section, it lists the offset of the first symbol. This can
be used to determine the base address of the section at runtime.
Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules. Multiple ranges
can apply to a single module, and ranges can be shared between modules.
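To make the format concrete, the following Python sketch parses the sample
lines shown above and answers "which built-in module(s) does this offset
belong to?". It is only an illustration for a tracing tool, not part of the
kernel build. The function names are hypothetical, ranges are assumed to be
half-open (start inclusive, end exclusive, which matches adjacent ranges in
the sample sharing a boundary), and the runtime address of the anchor symbol
is a made-up value.

```python
SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""


def parse_ranges(text):
    """Split the file into per-section anchors and module offset ranges."""
    anchors = {}  # section -> (anchor symbol name, anchor offset)
    ranges = {}   # section -> list of (start, end, [module names])
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or elided ("...") lines
        section, span, rest = fields[0], fields[1], fields[2:]
        start, end = (int(part, 16) for part in span.split("-"))
        if rest[0] == "=":
            if len(rest) > 1:
                anchors[section] = (rest[1], start)
            continue
        ranges.setdefault(section, []).append((start, end, rest))
    return anchors, ranges


def modules_at(ranges, section, offset):
    """Return the module names whose range covers offset (assumed half-open)."""
    for start, end, mods in ranges.get(section, []):
        if start <= offset < end:
            return mods
    return []


def section_base(anchor_runtime_addr, anchor_offset):
    """Base address of a section: symbol address minus symbol offset."""
    return anchor_runtime_addr - anchor_offset


anchors, ranges = parse_ranges(SAMPLE)
text_anchor_addr = 0xFFFFFFFF81000000  # hypothetical runtime address of _text
base = section_base(text_anchor_addr, anchors[".text"][1])
addr = base + 0xBA0200                 # some runtime address to classify
print(modules_at(ranges, ".text", addr - base))
```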
The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.
How it works:
1. The modules.builtin file is parsed to obtain a list of built-in
module names and their associated object names (the .ko file that
the module would be in if it were a loadable module, hereafter
referred to as <kmodfile>). This object name can be used to
identify objects in the kernel compile because any C or assembler
code that ends up in a built-in module will have the option
-DKBUILD_MODFILE=<kmodfile> present in its build command, and those
can be found in the .<obj>.cmd file in the kernel build tree.
If an object is part of multiple modules, they will all be listed
in the KBUILD_MODFILE option argument.
This allows us to conclusively determine whether an object in the
kernel build belongs to any modules, and which.
2. The vmlinux.map is parsed next to determine the base address of each
top level section so that all addresses into the section can be
turned into offsets. This makes it possible to handle sections
getting loaded at different addresses at system boot.
We also determine an 'anchor' symbol at the beginning of each
section to make it possible to calculate the true base address of
a section at runtime (i.e. symbol address - symbol offset).
We collect start addresses of sections that are included in the top
level section. This is used when vmlinux is linked using vmlinux.o,
because in that case, we need to look at the vmlinux.o linker map to
know what object a symbol is found in.
And finally, we process each symbol that is listed in vmlinux.map
(or vmlinux.o.map) based on the following structure:
vmlinux linked from vmlinux.a:
  vmlinux.map:
    <top level section>
      <included section>  -- might be same as top level section
        <object>          -- built-in association known
          <symbol>        -- belongs to module(s) object belongs to
          ...
vmlinux linked from vmlinux.o:
  vmlinux.map:
    <top level section>
      <included section>  -- might be same as top level section
        vmlinux.o         -- need to use vmlinux.o.map
          <symbol>        -- ignored
          ...
  vmlinux.o.map:
    <section>
      <object>            -- built-in association known
        <symbol>          -- belongs to module(s) object belongs to
        ...
3. As sections, objects, and symbols are processed, offset ranges are
constructed in a straight-forward way:
- If the symbol belongs to one or more built-in modules:
  - If we were working on the same module(s), extend the range
    to include this object
  - If we were working on another module(s), close that range,
    and start the new one
- If the symbol does not belong to any built-in modules:
  - If we were working on a module(s) range, close that range
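The range construction in step 3 can be sketched in a few lines of Python.
This is an illustration of the rules as described, not the generator used by
the kernel build. The function name build_ranges() and the input shape (a
list of (start, end, modules) tuples in ascending order, with an empty tuple
meaning "not part of any built-in module") are assumptions made for the
example.

```python
def build_ranges(symbols):
    """Merge consecutive symbols that belong to the same module set.

    symbols: iterable of (start, end, modules) in ascending offset order,
             where modules is a tuple of module names (empty if none).
    Returns a list of (start, end, modules) offset ranges.
    """
    ranges = []
    current = None  # open range as [start, end, modules], or None
    for start, end, modules in symbols:
        modules = tuple(modules)
        if modules:
            if current is not None and current[2] == modules:
                # Same module(s): extend the open range over this symbol.
                current[1] = end
            else:
                # Different module(s): close the open range, start a new one.
                if current is not None:
                    ranges.append(tuple(current))
                current = [start, end, modules]
        elif current is not None:
            # Symbol outside any built-in module: close the open range.
            ranges.append(tuple(current))
            current = None
    if current is not None:
        ranges.append(tuple(current))  # flush the last open range
    return ranges


example = [
    (0x00, 0x10, ()),
    (0x10, 0x20, ("a",)),
    (0x20, 0x30, ("a",)),
    (0x30, 0x40, ("a", "b")),
    (0x40, 0x50, ()),
    (0x50, 0x60, ("b",)),
]
result = build_ranges(example)
```

A range shared between modules, such as the ("a", "b") entry, is kept as its
own range, which corresponds to output lines that list more than one module
name.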
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Sam James <sam@gentoo.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmboHyUACgkQSfxwEqXe
A66wGQ/8DRIjBllwf1YuTWi4T6OcfoYxK6C9bXO6QPP5gzdTyFE9pvDuuPyad6+F
FR086ydTHeodemz1dFiQCL9etcUaxo4+6FRKyXKF9/1ezGbTA5nJd0/fKJGlqbI2
EoA4LNYHOsvCZk1BTpxRNWKeKphU9zQgQdSigy6Rx8p269UkGmIZjD1PtUc+vqfR
Ox0dK/Cswyo236fRi5HzaoMntWI4vXgLfxty0e1R7tfbstkCxSKWAON1lo3uHgkA
0HpJXWgWXAPt9gp++Fs/jGNpOqbt6IaKeV5f7CjYfvWhlFjNMhQxF+PbxknaZn/k
K0gQsItOIoFTfbQdLDIdfnj9awMdLW8FB2A1WXHpNr9pVC4ickPb1bMTF/XRd0tm
wBNu4BL0gklx6017KZg5uINMIduzMLGkBLRFiBW0en/sZMLTJTMg58BJn0CL1Pmh
1ll/Q3ToSMHalvxU2OnJagTwh4fzzCEpK/hW9WiDO4jSCsMXyX0clinrCjNo1JfA
tqgTWEy3uGtg+dg0Du9VD5JASbNQSJ0ZRnas5+qz10IRWWfTolrsk61dliXLQ4Sv
tSryDtsE2znwJF1Krh4aHNSSVhD5/l/8QaXkf9aZc/kkaHxwsx83FuWnqw6nMz8c
l4B2MbH0jUgsEqEyx+0iwk+FXE9kZKWumTVLjFZ6bRnq3q+uq0U=
=mWCw
-----END PGP SIGNATURE-----
Merge tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Originally I'd planned on sending each of the vDSO getrandom()
architecture ports to their respective arch trees. But as we started
to work on this, we found lots of interesting issues in the shared
code and infrastructure, the fixes for which the various archs needed
to base their work.
So in the end, this turned into a nice collaborative effort fixing up
issues and porting to 5 new architectures -- arm64, powerpc64,
powerpc32, s390x, and loongarch64 -- with everybody pitching in and
commenting on each other's code. It was a fun development cycle.
This contains:
- Numerous fixups to the vDSO selftest infrastructure, getting it
running successfully on more platforms, and fixing bugs in it.
- Additions to the vDSO getrandom & chacha selftests. Basically every
time manual review unearthed a bug in a revision of an arch patch,
or an ambiguity, the tests were augmented.
By the time the last arch was submitted for review, s390x, v1 of
the series was essentially fine right out of the gate.
- Fixes to the generic C implementation of vDSO getrandom, to
build and run successfully on all archs, decoupling it from
assumptions we had (unintentionally) made on x86_64 that didn't
carry through to the other architectures.
- Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
Huacai Chen.
- Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
by Will Deacon.
- Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
varieties, from Christophe Leroy and acked by Michael Ellerman.
- Port of vDSO getrandom to S390X from Heiko Carstens, the arch
maintainer.
While it'd be natural for there to be things to fix up over the course
of the development cycle, these patches got a decent amount of review
from a fairly diverse crew of folks on the mailing lists, and, for the
most part, they've been cooking in linux-next, which has been helpful
for ironing out build issues.
In terms of architectures, I think that mostly takes care of the
important 64-bit archs with hardware still being produced and running
production loads in settings where vDSO getrandom is likely to help.
Arguably there's still RISC-V left, and we'll see for 6.13 whether
they find it useful and submit a port"
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
selftests: vDSO: check cpu caps before running chacha test
s390/vdso: Wire up getrandom() vdso implementation
s390/vdso: Move vdso symbol handling to separate header file
s390/vdso: Allow alternatives in vdso code
s390/module: Provide find_section() helper
s390/facility: Let test_facility() generate static branch if possible
s390/alternatives: Remove ALT_FACILITY_EARLY
s390/facility: Disable compile time optimization for decompressor code
selftests: vDSO: fix vdso_config for s390
selftests: vDSO: fix ELF hash table entry size for s390x
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
powerpc/vdso: Refactor CFLAGS for CVDSO build
powerpc/vdso32: Add crtsavres
mm: Define VM_DROPPABLE for powerpc/32
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
selftests: vDSO: don't include generated headers for chacha test
arm64: vDSO: Wire up getrandom() vDSO implementation
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
selftests: vDSO: also test counter in vdso_test_chacha
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmbn5g0ACgkQu+CwddJF
iJq+Uwf/aqnLNEpjUBzwUUhSojCpPnTtiyjv+AILTxoSTHmbu8OvN0W79+Rpbdmk
O4QapAK+BCs+VL2VATwCCufcJ75Z78txO+buQE0DgwluFTIYZ+IwpUMPsK04ln6A
FD1/uvP1QFx60heqcp2c4zWFBUpg4DE6ufx2A5kieO268lFcWLxyVlcdgRU79ZCt
uAcV2yDLk3GvPGfxZwPKEmZUo/FmuSoBv0XgT+eWxmTu/R7hcpFse49OyjBH8Tvb
8d/RCIFgXOr8dTIjtds7eenwB/is4TkRlctezEQ0jO9/JwL/BVOgXZjD1qCtNWqz
is4TWK7VV+vdq1RD+0xC2hV/+uGEwQ==
=+WAm
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"This time it's mostly refactoring and improving APIs for slab users in
the kernel, along with some debugging improvements.
- kmem_cache_create() refactoring (Christian Brauner)
Over the years, kmem_cache_create() has been growing new parameters,
most of which are needed only for a small number of caches - most
recently the rcu_freeptr_offset parameter.
To avoid adding new parameters to kmem_cache_create() and adjusting
all its callers, or creating new wrappers such as
kmem_cache_create_rcu(), we can now pass extra parameters using the
new struct kmem_cache_args. Not explicitly initialized fields
default to values interpreted as unused.
kmem_cache_create() is for now a wrapper that works both with the
new form: kmem_cache_create(name, object_size, args, flags) and the
legacy form: kmem_cache_create(name, object_size, align, flags,
ctor)
- kmem_cache_destroy() waits for kfree_rcu()'s in flight (Vlastimil
Babka, Uladislau Rezki)
Since SLOB removal, kfree() is allowed for freeing objects
allocated by kmem_cache_create(). By extension kfree_rcu() is
allowed as well, which can allow converting simple call_rcu()
callbacks that only do kmem_cache_free(), as there was never a
kmem_cache_free_rcu() variant. However, for caches that can be
destroyed e.g. on module removal, the cache owners knew to issue
rcu_barrier() first to wait for the pending call_rcu()'s, and this
is not sufficient for pending kfree_rcu()'s due to its internal
batching optimizations. Ulad has provided a new
kvfree_rcu_barrier() and to make the usage less error-prone,
kmem_cache_destroy() calls it. Additionally, destroying
SLAB_TYPESAFE_BY_RCU caches now again issues rcu_barrier()
synchronously instead of using an async work, because the past
motivation for async work no longer applies. Users of custom
call_rcu() callbacks should however keep calling rcu_barrier()
before cache destruction.
- Debugging use-after-free in SLAB_TYPESAFE_BY_RCU caches (Jann Horn)
Currently, KASAN cannot catch UAFs in such caches as it is legal to
access them within a grace period, and we only track the grace
period when trying to free the underlying slab page. The new
CONFIG_SLUB_RCU_DEBUG option changes the freeing of individual
objects to be RCU-delayed, after which KASAN can poison them.
- Delayed memcg charging (Shakeel Butt)
In some cases, the memcg is unknown at allocation time, such as
receiving network packets in softirq context. With
kmem_cache_charge() these may now be charged later, when the user
and its memcg are known.
- Misc fixes and improvements (Pedro Falcato, Axel Rasmussen,
Christoph Lameter, Yan Zhen, Peng Fan, Xavier)"
* tag 'slab-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits)
mm, slab: restore kerneldoc for kmem_cache_create()
io_uring: port to struct kmem_cache_args
slab: make __kmem_cache_create() static inline
slab: make kmem_cache_create_usercopy() static inline
slab: remove kmem_cache_create_rcu()
file: port to struct kmem_cache_args
slab: create kmem_cache_create() compatibility layer
slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args
slab: port KMEM_CACHE() to struct kmem_cache_args
slab: remove rcu_freeptr_offset from struct kmem_cache
slab: pass struct kmem_cache_args to do_kmem_cache_create()
slab: pull kmem_cache_open() into do_kmem_cache_create()
slab: pass struct kmem_cache_args to create_cache()
slab: port kmem_cache_create_usercopy() to struct kmem_cache_args
slab: port kmem_cache_create_rcu() to struct kmem_cache_args
slab: port kmem_cache_create() to struct kmem_cache_args
slab: add struct kmem_cache_args
slab: s/__kmem_cache_create/do_kmem_cache_create/g
memcg: add charging of already allocated slab objects
mm/slab: Optimize the code logic in find_mergeable()
...
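The two calling forms of kmem_cache_create() described in the slab pull
above can be illustrated with a small userspace sketch. It assumes the
wrapper selects an implementation from the type of the third argument
using C11 _Generic; the in-tree macro may differ in detail. All names
here (struct cache_args, struct cache, cache_create*) are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache_args: unset fields are
 * zero and are interpreted as "unused". */
struct cache_args {
	unsigned int align;
	int use_freeptr_offset;
	void (*ctor)(void *);
};

/* Hypothetical stand-in for the created cache. */
struct cache {
	const char *name;
	unsigned int object_size;
	unsigned int align;
	unsigned int flags;
	int use_freeptr_offset;
	void (*ctor)(void *);
};

/* New form: rarely needed parameters travel in an args struct. */
static struct cache cache_create_args(const char *name, unsigned int size,
				      const struct cache_args *args,
				      unsigned int flags)
{
	struct cache c = { .name = name, .object_size = size, .flags = flags };

	if (args) {
		c.align = args->align;
		c.use_freeptr_offset = args->use_freeptr_offset;
		c.ctor = args->ctor;
	}
	return c;
}

/* Legacy form: positional align and ctor, forwarded to the new form. */
static struct cache cache_create_legacy(const char *name, unsigned int size,
					unsigned int align, unsigned int flags,
					void (*ctor)(void *))
{
	struct cache_args args = { .align = align, .ctor = ctor };

	return cache_create_args(name, size, &args, flags);
}

/* Pick the implementation from the type of the third argument. */
#define cache_create(name, size, third, ...)                          \
	_Generic((third),                                             \
		 struct cache_args *: cache_create_args,              \
		 const struct cache_args *: cache_create_args,        \
		 default: cache_create_legacy)(name, size, (third), __VA_ARGS__)

/* Both call shapes go through the same macro name. */
void cache_create_demo(void)
{
	struct cache_args args = { .align = 16, .use_freeptr_offset = 1 };
	struct cache a = cache_create("new-form", 64, &args, 0);
	struct cache b = cache_create("legacy-form", 64, 8, 0, NULL);

	assert(a.align == 16 && a.use_freeptr_offset == 1);
	assert(b.align == 8 && b.use_freeptr_offset == 0);
}
```

Existing callers keep compiling unchanged, while a new caller that needs
an uncommon parameter only sets the relevant field in the args struct.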
This pull request contains the following branches:
context_tracking.15.08.24a: Rename context tracking state related
symbols and remove references to "dynticks" in various context
tracking state variables and related helpers; force
context_tracking_enabled_this_cpu() to be inlined to avoid
leaving a noinstr section.
csd.lock.15.08.24a: Enhance CSD-lock diagnostic reports; add an API
to provide an indication of ongoing CSD-lock stall.
nocb.09.09.24a: Update and simplify RCU nocb code to handle
(de-)offloading of callbacks only for offline CPUs; fix RT
throttling hrtimer being armed from offline CPU.
rcutorture.14.08.24a: Remove redundant rcu_torture_ops get_gp_completed
fields; add SRCU ->same_gp_state and ->get_comp_state
functions; add generic test for NUM_ACTIVE_*RCU_POLL* for
testing RCU and SRCU polled grace periods; add CFcommon.arch
for arch-specific Kconfig options; print number of update types
in rcu_torture_write_types();
add rcutree.nohz_full_patience_delay testing to the TREE07
scenario; add a stall_cpu_repeat module parameter to test
repeated CPU stalls; add argument to limit number of CPUs a
guest OS can use in torture.sh.
rcustall.09.09.24a: Abbreviate RCU CPU stall warnings during CSD-lock
stalls; allow dump_cpu_task() to be called without disabling
preemption; defer printing stall-warning backtrace when holding
rcu_node lock.
srcu.12.08.24a: Make SRCU gp seq wrap-around faster; add KCSAN checks
for concurrent updates to ->srcu_n_exp_nodelay and
->reschedule_count which are used in heuristics governing
auto-expediting of normal SRCU grace periods and
grace-period-state-machine delays; mark idle SRCU-barrier
callbacks to help identify stuck SRCU-barrier callback.
rcu.tasks.14.08.24a: Remove RCU Tasks Rude asynchronous APIs as they
are no longer used; stop testing RCU Tasks Rude asynchronous
APIs; fix access to non-existent percpu regions; check
processor-ID assumptions during chosen CPU calculation for
callback enqueuing; update description of rtp->tasks_gp_seq
grace-period sequence number; add rcu_barrier_cb_is_done()
to identify whether a given rcu_barrier callback is stuck;
mark idle Tasks-RCU-barrier callbacks; add
*torture_stats_print() functions to print detailed
diagnostics for Tasks-RCU variants; capture start time of
rcu_barrier_tasks*() operation to help distinguish a hung
barrier operation from a long series of barrier operations.
rcu_scaling_tests.15.08.24a:
refscale: Add a TINY scenario to support tests of Tiny RCU
and Tiny SRCU; Optimize process_durations() operation;
rcuscale: Dump stacks of stalled rcu_scale_writer() instances;
dump grace-period statistics when rcu_scale_writer() stalls;
mark idle RCU-barrier callbacks to identify stuck RCU-barrier
callbacks; print detailed grace-period and barrier diagnostics
on rcu_scale_writer() hangs for Tasks-RCU variants; warn if
async module parameter is specified for RCU implementations
that do not have async primitives such as RCU Tasks Rude;
make all writer tasks report upon hang; tolerate repeated
GFP_KERNEL failure in rcu_scale_writer(); use special allocator
for rcu_scale_writer(); NULL out top-level pointers to heap
memory to avoid double-free bugs on modprobe failures; maintain
per-task instead of per-CPU callbacks count to avoid any issues
with migration of either tasks or callbacks; constify struct
ref_scale_ops.
fixes.12.08.24a: Use system_unbound_wq for kfree_rcu work to avoid
disturbing isolated CPUs.
misc.11.08.24a: Warn on unexpected rcu_state.srs_done_tail state;
Better define "atomic" for list_replace_rcu() and
hlist_replace_rcu() routines; annotate struct
kvfree_rcu_bulk_data with __counted_by().
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCZt8+8wAKCRAAHS7/6Z0w
pTqoAPwPN//tlEoJx2PRs6t0q+nD1YNvnZawPaRmdzgdM8zJogD+PiSN+XhqRr80
jzyvMDU4Aa0wjUNP3XsCoaCxo7L/lQk=
=bZ9z
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Context tracking:
- rename context tracking state related symbols and remove references
to "dynticks" in various context tracking state variables and
related helpers
- force context_tracking_enabled_this_cpu() to be inlined to avoid
leaving a noinstr section
CSD lock:
- enhance CSD-lock diagnostic reports
- add an API to provide an indication of ongoing CSD-lock stall
nocb:
- update and simplify RCU nocb code to handle (de-)offloading of
callbacks only for offline CPUs
- fix RT throttling hrtimer being armed from offline CPU
rcutorture:
- remove redundant rcu_torture_ops get_gp_completed fields
- add SRCU ->same_gp_state and ->get_comp_state functions
- add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
polled grace periods
- add CFcommon.arch for arch-specific Kconfig options
- print number of update types in rcu_torture_write_types()
- add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
- add a stall_cpu_repeat module parameter to test repeated CPU stalls
- add argument to limit number of CPUs a guest OS can use in
torture.sh
rcustall:
- abbreviate RCU CPU stall warnings during CSD-lock stalls
- allow dump_cpu_task() to be called without disabling preemption
- defer printing stall-warning backtrace when holding rcu_node lock
srcu:
- make SRCU gp seq wrap-around faster
- add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
->reschedule_count which are used in heuristics governing
auto-expediting of normal SRCU grace periods and
grace-period-state-machine delays
- mark idle SRCU-barrier callbacks to help identify stuck
SRCU-barrier callback
rcu tasks:
- remove RCU Tasks Rude asynchronous APIs as they are no longer used
- stop testing RCU Tasks Rude asynchronous APIs
- fix access to non-existent percpu regions
- check processor-ID assumptions during chosen CPU calculation for
callback enqueuing
- update description of rtp->tasks_gp_seq grace-period sequence
number
- add rcu_barrier_cb_is_done() to identify whether a given
rcu_barrier callback is stuck
- mark idle Tasks-RCU-barrier callbacks
- add *torture_stats_print() functions to print detailed diagnostics
for Tasks-RCU variants
- capture start time of rcu_barrier_tasks*() operation to help
distinguish a hung barrier operation from a long series of barrier
operations
refscale:
- add a TINY scenario to support tests of Tiny RCU and Tiny
SRCU
- optimize process_durations() operation
rcuscale:
- dump stacks of stalled rcu_scale_writer() instances and
grace-period statistics when rcu_scale_writer() stalls
- mark idle RCU-barrier callbacks to identify stuck RCU-barrier
callbacks
- print detailed grace-period and barrier diagnostics on
rcu_scale_writer() hangs for Tasks-RCU variants
- warn if async module parameter is specified for RCU implementations
that do not have async primitives such as RCU Tasks Rude
- make all writer tasks report upon hang
- tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
- use special allocator for rcu_scale_writer()
- NULL out top-level pointers to heap memory to avoid double-free
bugs on modprobe failures
- maintain per-task instead of per-CPU callbacks count to avoid any
issues with migration of either tasks or callbacks
- constify struct ref_scale_ops
Fixes:
- use system_unbound_wq for kfree_rcu work to avoid disturbing
isolated CPUs
Misc:
- warn on unexpected rcu_state.srs_done_tail state
- better define "atomic" for list_replace_rcu() and
hlist_replace_rcu() routines
- annotate struct kvfree_rcu_bulk_data with __counted_by()"
* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
rcu: Defer printing stall-warning backtrace when holding rcu_node lock
rcu/nocb: Remove superfluous memory barrier after bypass enqueue
rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
rcu/nocb: Simplify (de-)offloading state machine
context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
refscale: Constify struct ref_scale_ops
...
- cpuset isolation improvements.
- cpuset cgroup1 support is split into its own file behind the new config
option CONFIG_CPUSETS_V1. This makes it the second controller which makes
cgroup1 support optional after memcg.
- Handling of unavailable v1 controllers improved during cgroup1
mount operations.
- union_find applied to cpuset. It makes code simpler and more efficient.
- Reduce spurious events in pids.events.
- Cleanups and other misc changes.
- Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes that
further changes build upon.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuNU3Q4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGdMsAP9yqPxu//LiJ3lPWhKcVVKtdwrA3AYDLE81VSJO
5VZJhAD+Ic+Ly/jZjDtjjQpZ1U3JsBpBRcVBqzeH0gD7eXaJgwk=
=h/+c
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset isolation improvements
- cpuset cgroup1 support is split into its own file behind the new
config option CONFIG_CPUSETS_V1. This makes it the second controller
which makes cgroup1 support optional after memcg
- Handling of unavailable v1 controllers improved during
cgroup1 mount operations
- union_find applied to cpuset. It makes code simpler and more
efficient
- Reduce spurious events in pids.events
- Cleanups and other misc changes
- Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes
that further changes build upon
* tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (34 commits)
cgroup: Do not report unavailable v1 controllers in /proc/cgroups
cgroup: Disallow mounting v1 hierarchies without controller implementation
cgroup/cpuset: Expose cpuset filesystem with cpuset v1 only
cgroup/cpuset: Move cpu.h include to cpuset-internal.h
cgroup/cpuset: add sefltest for cpuset v1
cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
cgroup/cpuset: rename functions shared between v1 and v2
cgroup/cpuset: move v1 interfaces to cpuset-v1.c
cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
cgroup/cpuset: add callback_lock helper
cgroup/cpuset: move memory_spread to cpuset-v1.c
cgroup/cpuset: move relax_domain_level to cpuset-v1.c
cgroup/cpuset: move memory_pressure to cpuset-v1.c
cgroup/cpuset: move common code to cpuset-internal.h
cgroup/cpuset: introduce cpuset-v1.c
selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
cgroup/cpuset: Account for boot time isolated CPUs
cgroup/cpuset: remove use_parent_ecpus of cpuset
cgroup/cpuset: remove fetch_xcpus
...
This kunit update for Linux 6.12-rc1 consists of:
-- a new int_pow test suite
-- documentation update to clarify filename best practices
-- kernel-doc fix for EXPORT_SYMBOL_IF_KUNIT
-- change to build compile_commands.json automatically instead
of requiring a manual build.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbo3WEACgkQCwJExA0N
Qxz1WxAAj+772NHxsJ4JnPqr/74doKnzKc1jM2V4g/F9Y+BT0tSKs1Cu5CyN9VsT
wvxVPWqYltyhumVm/H6SaUGb0yZ7CzJi/5FuT3p3QFUDidMSu1h9KnlLi79q3cDI
VuFKE8K4DDP0GfyFMpbSPZOGfYQp24FybhxRxreY+7q6uRVAnPh33Q1/Bonv6K6q
5329a0z9wWySgisa93ABmQNpF4UJSYunR2bsdUzZqHgyrTXSyK66fcmVKwbBUaIT
o16P1LBjDcIbfwswFb+xUmWD1IPGk7ulirEq8n69tErI6zKbkv1rojXHsoXuvOEN
a4i+sNyR+a7NVI1h/T8F25pSbegkL0XQs7cmehATqpInmEZNDeGR8PkaGZNXXrFy
kG/z7LlWh8zQUBrTsqOLU/iz4sRVrsPCuLIUzo8MiKpAskmj/7fqw5Cab9jmL5V3
6OLAfCQDrfcH7fM9V5U6Ury2dkcovFuw+ZhFcBuLnspB5z0Cj7Yqz6aDZdJ97qyR
PfZuyBU2ouykhpJ4P/sRJC3Gq1t0b+PoDq3qNdCqz4ETld1jaiDz0e75ypquJWyB
QdVMNJF6W7Nwnmpzp4GY9QZ6dtwOKGZyuvW5J0eleWKiD4gjHZaoupIzqT24fgYi
vdscbcOxMMU3/b9F4qDlgsLSPCLVF4HIXTAK2UdiznLdaxYVHQ0=
=rmqh
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- a new int_pow test suite
- documentation update to clarify filename best practices
- kernel-doc fix for EXPORT_SYMBOL_IF_KUNIT
- change to build compile_commands.json automatically instead of
requiring a manual build
* tag 'linux_kselftest-kunit-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
lib/math: Add int_pow test suite
kunit: tool: Build compile_commands.json
kunit: Fix kernel-doc for EXPORT_SYMBOL_IF_KUNIT
Documentation: KUnit: Update filename best practices
metadata encoded into UD1.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpM6ITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoU/kEACWS7Z9mQrWB3r22ufTTPoN+hNudth+
CP8wluXZGvLPh1Pq9dpB9ZniBUN8levYoGyj3NTdr6VtoMJ6NYcZVuH98lCCEMXO
1UmDpydSGZ3BqVgmf4h0eYAJgEiA5qTflXMsh6SfsaPQR7jniJTE451hgJdRIogG
DvgWeVTYn5vt0+oRHJp6ogRLR9oOUgdp94fIwaW34OpesbVJeWUW9zAvBcqdNrDT
KJIM7ta6eivEakFRxriQZTKRc+3ElvZ2fdWNdo9qrRd64MTIOTXAj3G0lXt3YtpZ
06pfJ1CfQ+nwHKfxmmy4gz4eJG7KcpMM+KFZTR3NoSAz4oMTzAvVTxAuEt+pahx6
bmLzaY/I/gRB/Rt+e5oEZSEIq+Sh/Lm3IZoQUhK0+HeJBjwPghBZw3BjkFJvEsMw
S0arvklH2x37gP9rnzOODf2QG7aIAqLTrvRJS610fctwadR4k+2UIE8ZGHOTt55J
UdiK/QhU4gMVaRTebTcPquu3IMmnJjla/bEWdIrBtOSiGtVd1BnAp/kvmkdQH3eI
ZUqJbnfofN4rzSufFqSVY88ORVIcQMnNDLM0qyJofIC79u7OiU40icoDxWS6mDHQ
wQSEszInhwNzyAxoHnNkXDunjDVKhATQPOde0F4TxLcrYD9KRpvJag/1j5fCQi+0
ftODZflfGS2UjQ==
=Z5Hg
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core update from Thomas Gleixner:
"Enable UBSAN traps for x86, which provides better reporting through
metadata encoded into UD1"
* tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/traps: Enable UBSAN traps on x86
- Prevent spurious KCOV coverage in common_interrupt()
- Fixup the KCOV Makefile directive which got stale due to a source file
rename
- Exclude stack unwinding from KCOV as it creates large amounts of
uninteresting coverage
- Provide a self test to validate that KCOV coverage of the interrupt
handling code does not start before the preempt count is updated.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpMeITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaOeD/4oO3g0soK0LIcDIwzaG0ap0hx0nucw
aVSAESuY+ZaSbRbV0fNoYdHORvLdErs67SeyeJRSxTzSNqGH2dGoFrfbkRSXq951
RdCSPP60T7xgqAme1YLDiChfXt/gkbWk/8V5Q7sG3oq3GaVcPUyZgPo4M4HQMdfg
Mla3VPikW5Np3fvs0IZYWQ5VdY0fFOHY5JGMhKJznJxf+Ud+VAtxsbJUcO4MEYWW
A9CVJNHGEXssGA6vm5kgtLu6n2QFuoSj6En/WqLEaJb8f/V332e04Xj2ZHUaOOjV
2abVeDovv+dwUYb4SgrGVg9gfEwwcLPDnmOuuQJmQBB5kU4mJsCqI5TTS6c1fgU4
x8tQsGSOKHFQAI14ZWtitrL4rS2uFcBkAFXo0dF8J5o4989RA8cpfeWVSVUb/UXd
u38BWpc9iHiihHKMmMQgsa1bUMwdSUTvN5XFHkeP4oqUdMiEiWn8iM5+zXd/lfTs
9mrTv+kcLA7mjFOmn4JyE2b+NuiPdgS2FCBGLycHvGwvJoJlO2UmSpF89AJ5vdKs
F8vWLkV+gno/HtwS5o949cAwjYiCodfc7u1W0xj2VDAbx0RbaBw1SDhXMQcLxLgn
BTt4yHKKIeLX++WH3fpeyL91+UJWubUzNzY4rAmLkz5DedWAkpES+45fatp1buIz
Lp/hGiIsG9p5xw==
=tiXT
-----END PGP SIGNATURE-----
Merge tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Thomas Gleixner:
"Updates for KCOV instrumentation on x86:
- Prevent spurious KCOV coverage in common_interrupt()
- Fixup the KCOV Makefile directive which got stale due to a source
file rename
- Exclude stack unwinding from KCOV as it creates large amounts of
uninteresting coverage
- Provide a self test to validate that KCOV coverage of the interrupt
handling code does not start before the preempt count is updated"
* tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Ignore stack unwinding in KCOV
module: Fix KCOV-ignored file name
kcov: Add interrupt handling self test
x86/entry: Remove unwanted instrumentation in common_interrupt()
Increase the test coverage of list_test_list_replace*() by adding the
checks to compare the pointer of "a_new.next" and "a_new.prev" to make
sure a perfect circular doubly linked list is formed after the
replacement.
Link: https://lkml.kernel.org/r/20240910040818.65723-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
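The extra pointer checks described for list_test_list_replace*() can be
sketched in userspace as follows. This is a minimal illustration, not the
KUnit test itself: struct list_node and the node_*() helpers are
hypothetical stand-ins that only mimic the shape of the kernel's
circular doubly linked list.

```c
#include <assert.h>

/* Hypothetical stand-in for struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty ring points at itself in both directions. */
static void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Link new_node just before head, i.e. at the tail of the ring. */
static void node_add_tail(struct list_node *new_node, struct list_node *head)
{
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

/* Put new_node where old_node was; old_node is left untouched. */
static void node_replace(struct list_node *old_node, struct list_node *new_node)
{
	new_node->next = old_node->next;
	new_node->next->prev = new_node;
	new_node->prev = old_node->prev;
	new_node->prev->next = new_node;
}

/* Build head -> a_old -> b, replace a_old with a_new, then check the
 * ring from both neighbours and from a_new itself. */
void check_replace_keeps_ring(void)
{
	struct list_node head, a_old, a_new, b;

	node_init(&head);
	node_add_tail(&a_old, &head);
	node_add_tail(&b, &head);
	node_replace(&a_old, &a_new);

	/* The neighbours must point at the replacement. */
	assert(head.next == &a_new);
	assert(b.prev == &a_new);

	/* The replacement must point back at its neighbours. */
	assert(a_new.next == &b);
	assert(a_new.prev == &head);
}
```

Checking only the neighbours would miss a replacement whose own next and
prev were left stale, which is why both directions are asserted.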
Fix the test for list_cut_position*(), which is missing a check of the
integer "i" after the second loop. The variable should be checked a second
time to make sure both lists after the cut operation are formed as expected.
Link: https://lkml.kernel.org/r/20240910043531.71343-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "resource: Fix region_intersects() vs
add_memory_driver_managed()", v3.
The patchset fixes a bug in region_intersects() for systems with CXL
memory. The details of the bug can be found in [1/3]. To avoid similar
bugs in the future, a kunit test case for region_intersects() is added in
[3/3]. [2/3] is a preparation patch for [3/3].
This patch (of 3):
region_intersects() is important because it's used for /dev/mem permission
checking. To avoid possible bugs in region_intersects() in the future, a
kunit test case for region_intersects() is added.
Link: https://lkml.kernel.org/r/20240906030713.204292-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20240906030713.204292-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Use the threshold to check for the pool refill condition and not the
runtime-recorded all-time-low fill value, which is lower than the
threshold and therefore causes refills to be delayed.
- KCSAN annotation updates and simplification of the fill_pool() code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn480THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVB1D/0UE1n86SLFrR7plXudttXJnbyJ/OjK
uOjLSHx66TyMkN1z6xF6K4bZTyQRpIUifPLz4evyd9CdDvITvnrvkboby/15rsGW
8sEBqAFVMkENkPzDA1Qmn3fxJs9XvHoER7WcMjaEl9yQbSi4gjO5Y+B0BNp4XKHZ
P1YSmRJqUBX5F0BvmeeDlHCCpyUxeRGiyzxZ/WSl70e6RSGis10R+B/aqsMxf3Zz
6WboQJqMxnDT3ICtDxTicH9VJ6Lh9iJxppeLVxAtZ+acfhcRmpwKFmsfJJOVy1eg
zkJuDh3ieb8hH7vr6bqzMEoP8qclUY7JgcJCK0dIwcASIvr7ZFVLCDLDx6Ta9UrG
D+L7sjGs+h/wz7NOoKTaGJS0XHwijVtLhc5/O64p1POUiQVTfjCVW6E3RAs3IGBI
uXTxuVzpK7XXvbg7+iEwYVcE5fp5vctnlLyepkbXvei9r/ccgIndj3rVGZz1qyOc
41LVhTx1Uu9MSqnsWTGbr+kzIze/g1rj8OlSH+692nbLL0mxWsOuojljvDgILC1Q
rcvZLJrf8e4FDFyGZiX8kG3eHbyYQPdf3fqUCI7B05n0o7utXLf4Mgw+/LdIvpKY
JTx4/lhwZ4TXFMvf+LiW/rhRlP72QsVkljjsyJTHI6a5LukdNL9dNXKTqSCypcjm
hAsMzee52FiZoQ==
=B0II
-----END PGP SIGNATURE-----
Merge tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects updates from Thomas Gleixner:
- Use the threshold to check for the pool refill condition and not the
runtime-recorded all-time-low fill value, which is lower than the
threshold and therefore causes refills to be delayed.
- KCSAN annotation updates and simplification of the fill_pool() code.
* tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Remove redundant checks in fill_pool()
debugobjects: Fix conditions in fill_pool()
debugobjects: Fix the compilation attributes of some global variables
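The refill condition described in the debugobjects pull above can be
modelled with two predicates. This is a sketch under stated assumptions:
struct pool_state and its field names are hypothetical, and the numbers
are made up for illustration. It only shows why comparing against the
recorded all-time low delays refills compared to the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the object pool's bookkeeping. */
struct pool_state {
	int free;         /* objects currently available in the pool */
	int min_level;    /* configured refill threshold */
	int all_time_low; /* lowest fill level observed at run time */
};

/* Condition described as wrong: statistic used as a threshold. */
static bool refill_by_all_time_low(const struct pool_state *p)
{
	return p->free < p->all_time_low;
}

/* Condition described as the fix: compare against the threshold. */
static bool refill_by_threshold(const struct pool_state *p)
{
	return p->free < p->min_level;
}

/* A pool below the threshold but above its historic low. */
void refill_demo(void)
{
	struct pool_state p = { .free = 200, .min_level = 256,
				.all_time_low = 100 };

	assert(refill_by_threshold(&p));     /* refill happens now */
	assert(!refill_by_all_time_low(&p)); /* refill would be delayed */
}
```

Because the all-time low can only be at or below any level the pool has
reached, using it as the trigger postpones refills until the pool is
emptier than it has ever been.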
- Core:
- Overhaul of posix-timers in preparation of removing the
workaround for periodic timers which have signal delivery
ignored.
- Remove the historical extra jiffy in msleep()
msleep() adds an extra jiffy to the timeout value to ensure
minimal sleep time. The timer wheel has ensured minimal sleep
time since the large rewrite to a non-cascading wheel, but the
extra jiffy in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and neither reflects
reality nor conforms to the man page. Show the correct 0 slack
for real time tasks and enforce it at the core level instead of
having inconsistent individual checks in various timer setup
functions.
- The usual set of updates and enhancements all over the place.
- Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
F6p9cgoDNP9KFg==
=jE0N
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Core:
- Overhaul of posix-timers in preparation of removing the workaround
for periodic timers which have signal delivery ignored.
- Remove the historical extra jiffy in msleep()
msleep() adds an extra jiffy to the timeout value to ensure
minimal sleep time. The timer wheel has ensured minimal sleep time
since the large rewrite to a non-cascading wheel, but the extra
jiffy in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and neither reflects
reality nor conforms to the man page. Show the correct 0 slack for
real time tasks and enforce it at the core level instead of having
inconsistent individual checks in various timer setup functions.
- The usual set of updates and enhancements all over the place.
Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers"
* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
ntp: Make sure RTC is synchronized when time goes backwards
treewide: Fix wrong singular form of jiffies in comments
cpu: Use already existing usleep_range()
timers: Rename next_expiry_recalc() to be unique
platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
clocksource/drivers/jcore: Use request_percpu_irq()
clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
clocksource: acpi_pm: Add external callback for suspend/resume
clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
dt-bindings: timer: rockchip: Add rk3576 compatible
timers: Annotate possible non critical data race of next_expiry
timers: Remove historical extra jiffie for timeout in msleep()
hrtimer: Use and report correct timerslack values for realtime tasks
hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
timers: Add sparse annotation for timer_sync_wait_running().
signal: Replace BUG_ON()s
...
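The msleep() change in the timer pull above comes down to how the
timeout in ticks is derived from milliseconds. The sketch below is a
userspace model, not kernel code: SKETCH_HZ and the *_ticks() helpers
are hypothetical, and the tick rate of 250 is an assumption chosen only
to make the arithmetic concrete.

```c
#include <assert.h>

/* Assumed tick rate for this sketch; real kernels are configurable. */
#define SKETCH_HZ 250UL

/* Convert milliseconds to ticks, rounding up to a whole tick. */
static unsigned long ms_to_ticks(unsigned int ms)
{
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* Historical behaviour as described: one extra tick is added. */
static unsigned long msleep_ticks_old(unsigned int ms)
{
	return ms_to_ticks(ms) + 1;
}

/* Behaviour after the change as described: no extra tick. */
static unsigned long msleep_ticks_new(unsigned int ms)
{
	return ms_to_ticks(ms);
}

/* At 250 ticks per second one tick is 4 ms, so 20 ms is 5 ticks. */
void msleep_ticks_demo(void)
{
	assert(msleep_ticks_new(20) == 5);
	assert(msleep_ticks_old(20) == 6);
	assert(msleep_ticks_old(1) - msleep_ticks_new(1) == 1);
}
```

In this model the old form always sleeps one tick (4 ms at the assumed
rate) longer than requested, which matters most for short sleeps.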
- Core:
- Remove a global lock in the affinity setting code
The lock protects a cpumask for intermediate results and the lock
causes a bottleneck on simultaneous start of multiple virtual
machines. Replace the lock and the static cpumask with a per CPU
cpumask which is nicely serialized by raw spinlock held when
executing this code.
- Provide support for giving a suffix to interrupt domain names.
That's required to support devices with subfunctions so that the
domain names are distinct even if they originate from the same
device node.
- The usual set of cleanups and enhancements all over the place
- Drivers:
- Support for LoongArch AVEC interrupt chip
- Refurbishment of the Armada driver so it can be extended for new
variants.
- The usual set of cleanups and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn5p8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRFtD/43eB3h5usY2OPW0JmDqrE6qnzsvjPZ
1H52BcmMcOuI6yCfTnbi/fBB52mwSEGq9Dmt1GXradyq9/CJDIqZ1ajI1rA2jzW2
YdbeTDpKm1rS2ddzfp2LT2BryrNt+7etrRO7qHn4EKSuOcNuV2f58WPbIIqasvaK
uPbUDVDPrvXxLNcjoab6SqaKrEoAaHSyKpd0MvDd80wHrtcSC/QouW7JDSUXv699
RwvLebN1OF6mQ2J8Z3DLeCQpcbAs+UT8UvID7kYUJi1g71J/ZY+xpMLoX/gHiDNr
isBtsuEAiZeNaFpksc7A6Jgu5ljZf2/aLCqbPLlHaduHFNmo94x9KUbIF2cpEMN+
rsf5Ff7AVh1otz3cUwLLsm+cFLWRRoZdLuncn7rrgB4Yg0gll7qzyLO6YGvQHr8U
Ocj1RXtvvWsMk4XzhgCt1AH/42cO6go+bhA4HspeYykNpsIldIUl1MeFbO8sWiDJ
kybuwiwHp3oaMLjEK4Lpq65u7Ll8Lju2zRde65YUJN2nbNmJFORrOLmeC1qsr6ri
dpend6n2qD9UD1oAt32ej/uXnG160nm7UKescyxiZNeTm1+ez8GW31hY128ifTY3
4R3urGS38p3gazXBsfw6eqkeKx0kEoDNoQqrO5gBvb8kowYTvoZtkwMGAN9OADwj
w6vvU0i+NIyVMA==
=JlJ2
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Core:
- Remove a global lock in the affinity setting code
The lock protects a cpumask for intermediate results and the lock
causes a bottleneck on simultaneous start of multiple virtual
machines. Replace the lock and the static cpumask with a per CPU
cpumask which is nicely serialized by raw spinlock held when
executing this code.
- Provide support for giving a suffix to interrupt domain names.
That's required to support devices with subfunctions so that the
domain names are distinct even if they originate from the same
device node.
- The usual set of cleanups and enhancements all over the place
Drivers:
- Support for LoongArch AVEC interrupt chip
- Refurbishment of the Armada driver so it can be extended for new
variants.
- The usual set of cleanups and enhancements all over the place"
* tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
genirq: Use cpumask_intersects()
genirq/cpuhotplug: Use cpumask_intersects()
irqchip/apple-aic: Only access system registers on SoCs which provide them
irqchip/apple-aic: Add a new "Global fast IPIs only" feature level
irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi
dt-bindings: apple,aic: Document A7-A11 compatibles
irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy()
genirq/msi: Use kmemdup_array() instead of kmemdup()
genirq/proc: Change the return value for set affinity permission error
genirq/proc: Use irq_move_pending() in show_irq_affinity()
genirq/proc: Correctly set file permissions for affinity control files
genirq: Get rid of global lock in irq_do_set_affinity()
genirq: Fix typo in struct comment
irqchip/loongarch-avec: Add AVEC irqchip support
irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC
irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
LoongArch: Architectural preparation for AVEC irqchip
LoongArch: Move irqchip function prototypes to irq-loongson.h
irqchip/loongson-pch-msi: Switch to MSI parent domains
softirq: Remove unused 'action' parameter from action callback
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc
onQWAQD6IxAKPU0zom2FoWNilvSzPs7WglTtvddX9pu/lT1RNAD/YC/wOLW8mvAv
9oTAmigQDQQhEWdJA9RgLZBiw7k+DAw=
=zWFb
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
"This contains the work to improve read/write performance for the new
netfs library.
The main performance enhancing changes are:
- Define a structure, struct folio_queue, and a new iterator type,
ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
that patch for questions about naming and form.
ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
problem with an xarray is that accessing it requires the use of a
lock (typically the RCU read lock) - and this means that we can't
supply iterate_and_advance() with a step function that might sleep
(crypto for example) without having to drop the lock between pages.
ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
where each folio_queue holds a small list of folios. A folio_queue
struct is a simpler structure than xarray and is not subject to
concurrent manipulation by the VM. folio_queue is used rather than
a bvec[] as it can form lists of indefinite size, adding to one end
and removing from the other on the fly.
- Provide a copy_folio_from_iter() wrapper.
- Make cifs RDMA support ITER_FOLIOQ.
- Use folio queues in the write-side helpers instead of xarrays.
- Add a function to reset the iterator in a subrequest.
- Simplify the write-side helpers to use sheaves to skip gaps rather
than trying to work out where gaps are.
- In afs, make the read subrequests asynchronous, putting them into
work items to allow the next patch to do progressive
unlocking/reading.
- Overhaul the read-side helpers to improve performance.
- Fix the caching of a partial block at the end of a file.
- Allow a store to be cancelled.
Then some changes for cifs to make it use folio queues instead of
xarrays for crypto bufferage:
- Use raw iteration functions rather than manually coding iteration
when hashing data.
- Switch to using folio_queue for crypto buffers.
- Remove the xarray bits.
Make some adjustments to the /proc/fs/netfs/stats file such that:
- All the netfs stats lines begin 'Netfs:' but change this to
something a bit more useful.
- Add a couple of stats counters to track the numbers of skips and
waits on the per-inode writeback serialisation lock to make it
easier to check for this as a source of performance loss.
Miscellaneous work:
- Ensure that the sb_writers lock is taken around
vfs_{set,remove}xattr() in the cachefiles code.
- Reduce the number of conditional branches in netfs_perform_write().
- Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
remove cifs_post_modify().
- Move the max_len/max_nr_segs members from netfs_io_subrequest to
netfs_io_request as they're only needed for one subreq at a time.
- Add an 'unknown' source value for tracing purposes.
- Remove NETFS_COPY_TO_CACHE as it's no longer used.
- Set the request work function up front at allocation time.
- Use bh-disabling spinlocks for rreq->lock as cachefiles completion
may be run from block-filesystem DIO completion in softirq context.
- Remove fs/netfs/io.c"
* tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
docs: filesystems: corrected grammar of netfs page
cifs: Don't support ITER_XARRAY
cifs: Switch crypto buffer to use a folio_queue rather than an xarray
cifs: Use iterate_and_advance*() routines directly for hashing
netfs: Cancel dirty folios that have no storage destination
cachefiles, netfs: Fix write to partial block at EOF
netfs: Remove fs/netfs/io.c
netfs: Speed up buffered reading
afs: Make read subreqs async
netfs: Simplify the writeback code
netfs: Provide an iterator-reset function
netfs: Use new folio_queue data type and iterator instead of xarray iter
cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
iov_iter: Provide copy_folio_from_iter()
mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
netfs: Use bh-disabling spinlocks for rreq->lock
netfs: Set the request work function upon allocation
netfs: Remove NETFS_COPY_TO_CACHE
netfs: Reserve netfs_sreq_source 0 as unset/unknown
netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
...
As described in commit 42d9b379e3 ("lib/Kconfig.debug: Allow BTF +
DWARF5 with pahole 1.21+"), the combination of CONFIG_DEBUG_INFO_BTF
and CONFIG_DEBUG_INFO_DWARF5 requires pahole 1.21+.
GCC 11+ and Clang 14+ default to DWARF 5 when the -g flag is passed.
For the same reason, the combination of CONFIG_DEBUG_INFO_BTF and
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is also likely to require
pahole 1.21+ these days. (At least, it is uncertain whether the actual
requirement is pahole 1.16+ or 1.21+.)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240913173759.1316390-3-masahiroy@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When DEBUG_INFO_DWARF5 is selected, pahole 1.21+ is required to enable
DEBUG_INFO_BTF.
When DEBUG_INFO_DWARF4 or DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is selected,
DEBUG_INFO_BTF can be enabled without pahole installed, but a build error
will occur in scripts/link-vmlinux.sh:
LD .tmp_vmlinux1
BTF: .tmp_vmlinux1: pahole (pahole) is not available
Failed to generate BTF for vmlinux
Try to disable CONFIG_DEBUG_INFO_BTF
We did not guard DEBUG_INFO_BTF by PAHOLE_VERSION when previously
discussed [1].
However, commit 613fe16923 ("kbuild: Add CONFIG_PAHOLE_VERSION")
added CONFIG_PAHOLE_VERSION after all. Now several CONFIG options, as
well as the combination of DEBUG_INFO_BTF and DEBUG_INFO_DWARF5, are
guarded by PAHOLE_VERSION.
The remaining compile-time check in scripts/link-vmlinux.sh now appears
to be an awkward inconsistency.
This commit adopts Nathan's original work.
[1]: https://lore.kernel.org/lkml/20210111180609.713998-1-natechancellor@gmail.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240913173759.1316390-2-masahiroy@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel
is problematic when some system headers are included.
Minimise the amount of headers by moving needed items, such as
__{get,put}_unaligned_t, into dedicated common headers and in general
use more specific headers, similar to what was done in commit
8165b57bca ("linux/const.h: Extract common header for vDSO") and
commit 8c59ab839f ("lib/vdso: Enable common headers").
On some architectures this results in missing PAGE_SIZE, as was
described by commit 8b3843ae36 ("vdso/datapage: Quick fix - use
asm/page-def.h for ARM64"), so define this if necessary, in the same way
as done prior by commit cffaefd15a ("vdso: Use CONFIG_PAGE_SHIFT in
vdso/datapage.h").
Removing linux/time64.h leads to missing 'struct timespec64' in
x86's asm/pvclock.h. Add a forward declaration of that struct in
that file.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
With the current implementation, __cvdso_getrandom_data() calls
memset() on certain architectures, which is unexpected in the VDSO.
Rather than providing a memset(), simply rewrite opaque data
initialization to avoid memset().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Same as for the gettimeofday CVDSO implementation, add c-getrandom-y to
ease the inclusion of lib/vdso/getrandom.c in architectures' VDSO
builds.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Performing SMP atomic operations on u64 fails on powerpc32:
CC drivers/char/random.o
In file included from <command-line>:
drivers/char/random.c: In function 'crng_reseed':
././include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_391' declared with attribute error: Need native word sized stores/loads for atomicity.
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
././include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^~~~~~
././include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:513:9: note: in expansion of macro 'compiletime_assert'
513 | compiletime_assert(__native_word(t), \
| ^~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/barrier.h:74:9: note: in expansion of macro 'compiletime_assert_atomic_type'
74 | compiletime_assert_atomic_type(*p); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/barrier.h:172:55: note: in expansion of macro '__smp_store_release'
172 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
| ^~~~~~~~~~~~~~~~~~~
drivers/char/random.c:286:9: note: in expansion of macro 'smp_store_release'
286 | smp_store_release(&__arch_get_k_vdso_rng_data()->generation, next_gen + 1);
| ^~~~~~~~~~~~~~~~~
The kernel-side generation counter in the random driver is handled as an
unsigned long, not as a u64, in base_crng and struct crng.
But on the vDSO side, it needs to be a u64, not just an unsigned long,
in order to support a 32-bit vDSO atop a 64-bit kernel.
On kernel side, however, it is an unsigned long, hence a 32-bit value on
32-bit architectures, so just cast it to unsigned long for the
smp_store_release(). A side effect is that on big endian architectures
the store will be performed in the upper 32 bits. It is not an issue on
its own because the vDSO side doesn't mind the value, as it only checks
differences. Just make sure that the vDSO side checks the full 64 bits.
For that, the local current_generation has to be u64 as well.
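The reasoning above can be illustrated with a small user-space sketch. It
is not the kernel code: the struct and function names are hypothetical,
memory barriers are omitted, and the writer simply models a 32-bit kernel
that stores through an unsigned long sized slot of the 64-bit field. The
point is only that a reader comparing all 64 bits notices the change no
matter which half was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the shared vDSO data, not the kernel layout. */
struct rng_data_sketch {
	uint64_t generation;
};

/*
 * Writer: model a 32-bit kernel storing through an unsigned long pointer.
 * Only the first four bytes of the field are written; on a big-endian
 * machine those are the upper 32 bits of the 64-bit value.
 */
static void writer_store_generation(struct rng_data_sketch *d, uint32_t gen)
{
	assert(d);
	memcpy(&d->generation, &gen, sizeof(gen));
}

/* Reader: compares the full 64 bits, so a change in either half is seen. */
static bool reader_needs_reseed(const struct rng_data_sketch *d,
				uint64_t *cached)
{
	uint64_t now = d->generation;

	if (now == *cached)
		return false;
	*cached = now;
	return true;
}
```

The reader never interprets the value numerically; it only tests for
inequality against its cached copy, which is why the position of the
32-bit store inside the field does not matter.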
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Add a test suite for the integer-based power function, int_pow(), which
performs integer exponentiation.
The test suite is designed to verify that the implementation of int_pow
correctly computes the power of a given base raised to a given exponent.
The tests check various scenarios and edge cases to ensure the accuracy
and reliability of the exponentiation function.
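As a rough user-space illustration of the kind of checks described, the
sketch below pairs a square-and-multiply power function with a small
table of cases. It is an assumption-laden stand-in: int_pow_sketch(),
struct pow_case and run_pow_cases() are hypothetical names, and the cases
are not taken from the actual KUnit suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Square-and-multiply; overflow wraps modulo 2^64 (unsigned arithmetic). */
static uint64_t int_pow_sketch(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	while (exp) {
		if (exp & 1)
			result *= base;
		exp >>= 1;
		base *= base;
	}
	return result;
}

/* Hypothetical table-driven cases covering a few edge conditions. */
struct pow_case {
	uint64_t base;
	unsigned int exp;
	uint64_t expected;
};

static const struct pow_case pow_cases[] = {
	{ 64, 0, 1 },		/* exponent zero */
	{ 0, 5, 0 },		/* base zero */
	{ 1, 100, 1 },		/* base one */
	{ 2, 10, 1024 },
	{ 10, 5, 100000 },
	{ 2, 63, 9223372036854775808ULL },	/* largest power of two in 64 bits */
};

/* Returns the number of failing cases. */
static int run_pow_cases(void)
{
	int failures = 0;

	for (size_t i = 0; i < sizeof(pow_cases) / sizeof(pow_cases[0]); i++) {
		const struct pow_case *c = &pow_cases[i];

		if (int_pow_sketch(c->base, c->exp) != c->expected)
			failures++;
	}
	return failures;
}
```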
Updated commit with test information at commit time: Shuah Khan
Signed-off-by: Luis Felipe Hernandez <luis.hernandez093@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Define a data structure, struct folio_queue, to represent a sequence of
folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a
list of folio_queue structures to be used to provide a buffer to
iov_iter-taking functions, such as sendmsg and recvmsg.
The folio_queue structure looks like:
struct folio_queue {
struct folio_batch vec;
u8 orders[PAGEVEC_SIZE];
struct folio_queue *next;
struct folio_queue *prev;
unsigned long marks;
unsigned long marks2;
};
It does not use a list_head so that next and/or prev can be set to NULL at
the ends of the list, allowing iov_iter-handling routines to determine that
they *are* the ends without needing to store a head pointer in the iov_iter
struct.
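A simplified user-space model of that layout might look as follows. It
only demonstrates the NULL-terminated next/prev pointers and one mark bit
per slot; struct seg_sketch, SLOTS_PER_SEG and the helpers are
hypothetical names, and none of the barriering discussed later is
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS_PER_SEG 4	/* arbitrary for this sketch */

struct seg_sketch {
	void *slots[SLOTS_PER_SEG];
	unsigned int nr;		/* number of occupied slots */
	struct seg_sketch *next;	/* NULL at the tail */
	struct seg_sketch *prev;	/* NULL at the head */
	unsigned long marks;		/* one mark bit per slot */
};

/* Link seg after tail (tail may be NULL for the first segment). */
static struct seg_sketch *seg_append(struct seg_sketch *tail,
				     struct seg_sketch *seg)
{
	seg->next = NULL;
	seg->prev = tail;
	if (tail)
		tail->next = seg;
	return seg;
}

/* The end of the list is detected without any separate head pointer. */
static bool seg_is_last(const struct seg_sketch *seg)
{
	return seg->next == NULL;
}

static void seg_mark(struct seg_sketch *seg, unsigned int slot)
{
	assert(slot < SLOTS_PER_SEG);
	seg->marks |= 1UL << slot;
}

static bool seg_is_marked(const struct seg_sketch *seg, unsigned int slot)
{
	assert(slot < SLOTS_PER_SEG);
	return seg->marks & (1UL << slot);
}

/* Walk forward until the NULL next pointer, summing occupied slots. */
static size_t seg_count_slots(const struct seg_sketch *seg)
{
	size_t total = 0;

	for (; seg; seg = seg->next)
		total += seg->nr;
	return total;
}
```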
A folio_batch struct is used to hold the folio pointers which allows the
batch to be passed to batch handling functions. Two mark bits are
available per slot. The intention is to use at least one of them to mark
folios that need putting, but that might not be ultimately necessary.
Accessor functions are used to access the slots to do the masking and an
additional accessor function is used to indicate the size of the array.
The order of each folio is also stored in the structure to avoid the need
for iov_iter_advance() and iov_iter_revert() to have to query each folio to
find its size.
With careful barriering, this can be used as an extending buffer with new
folios inserted and new folio_queue structs added without the need for a
lock. Further, provided we always keep at least one struct in the buffer,
we can also remove consumed folios and consumed structs from the head end
as we go, without the need for locks.
[Questions/thoughts]
(1) To manage this, I need a head pointer, a tail pointer, a tail slot
number (assuming insertion happens at the tail end and the next
pointers point from head to tail). Should I put these into a struct
of their own, say "folio_queue_head" or "rolling_buffer"?
I will end up with two of these in netfs_io_request eventually, one
keeping track of the pagecache I'm dealing with for buffered I/O and
the other to hold a bounce buffer when we need one.
(2) Should I make the slots {folio,off,len} or bio_vec?
(3) This is intended to replace ITER_XARRAY eventually. Using an xarray
in I/O iteration requires the taking of the RCU read lock, doing
copying under the RCU read lock, walking the xarray (which may change
under us), handling retries and dealing with special values.
The advantage of ITER_XARRAY is that when we're dealing with the
pagecache directly, we don't need any allocation - but if we're doing
encrypted comms, there's a good chance we'd be using a bounce buffer
anyway.
This will require afs, erofs, cifs, orangefs and fscache to be
converted to not use this. afs still uses it for dirs and symlinks;
some of erofs usages should be easy to change, but there's one which
won't be so easy; ceph's use via fscache can be fixed by porting ceph
to netfslib; cifs is using xarray as a bounce buffer - that can be
moved to use sheaves instead; and orangefs has a similar problem to
erofs - maybe orangefs could use netfslib?
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Gao Xiang <xiang@kernel.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-erofs@lists.ozlabs.org
cc: devel@lists.orangefs.org
Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
With freader we don't need to restrict ourselves to a single page, so
let's allow ELF notes to be at any valid position within the file.
We also merge parse_build_id() and parse_build_id_buf() as now the only
difference between them is note offset overflow, which makes sense to
check in all situations.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend freader with a flag specifying whether it's OK to cause page
fault to fetch file data that is not already physically present in
memory. With this, it's now easy to wait for data if the caller is
running in sleepable (faultable) context.
We utilize read_cache_folio() to bring the desired folio into page
cache, after which the rest of the logic works just the same at folio level.
Suggested-by: Omar Sandoval <osandov@fb.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make it clear that build_id_parse() assumes that it can take no page
fault by renaming it and current few users to build_id_parse_nofault().
Also add build_id_parse() stub which for now falls back to non-sleepable
implementation, but will be changed in subsequent patches to take
advantage of sleepable context. PROCMAP_QUERY ioctl() on
/proc/<pid>/maps file is using build_id_parse() and will automatically
take advantage of more reliable sleepable context implementation.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that freader allows accessing multiple pages transparently, there is
no need to limit program headers to the very first ELF file page. Remove
this limitation, but still put some sane limit on the number of program
headers that we are willing to iterate over (set arbitrarily to 256).
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current code assumes that program (segment) headers immediately
follow the ELF header. This is the common case, but it is not guaranteed.
So take into account the e_phoff field of the ELF header when accessing
program headers.
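In user-space terms, the idea can be sketched as below. This is not the
lib/buildid.c code: read_phdr_sketch() is a hypothetical helper that works
on an in-memory 64-bit ELF image and uses the GCC/Clang overflow builtins.
It locates program header idx at e_phoff + idx * sizeof(Elf64_Phdr)
instead of assuming the table starts right after the ELF header.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy program header number idx out of an in-memory ELF image.
 * Returns false if idx is out of range or the header would fall
 * outside the image.
 */
static bool read_phdr_sketch(const unsigned char *img, size_t img_sz,
			     unsigned int idx, Elf64_Phdr *out)
{
	Elf64_Ehdr ehdr;
	uint64_t off;

	assert(img && out);
	if (img_sz < sizeof(ehdr))
		return false;
	memcpy(&ehdr, img, sizeof(ehdr));
	if (idx >= ehdr.e_phnum)
		return false;

	/* off = e_phoff + idx * sizeof(phdr), rejecting any wraparound. */
	if (__builtin_mul_overflow((uint64_t)idx, (uint64_t)sizeof(*out), &off) ||
	    __builtin_add_overflow(off, (uint64_t)ehdr.e_phoff, &off))
		return false;
	if (off > img_sz || img_sz - off < sizeof(*out))
		return false;

	memcpy(out, img + off, sizeof(*out));
	return true;
}
```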
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add freader abstraction that transparently manages fetching and local
mapping of the underlying file page(s) and provides a simple direct data
access interface.
freader_fetch() is the only and single interface necessary. It accepts
file offset and desired number of bytes that should be accessed, and
will return a kernel mapped pointer that caller can use to dereference
data up to requested size. Requested size can't be bigger than the size
of the extra buffer provided during initialization (because, worst case,
all requested data has to be copied into it, so it's better to flag
wrongly sized buffer unconditionally, regardless if requested data range
is crossing page boundaries or not).
If folio is not paged in, or some of the conditions are not satisfied,
NULL is returned and more detailed error code can be accessed through
freader->err field. This approach makes the usage of freader_fetch()
cleaner.
To accommodate accessing file data that crosses folio boundaries, user
has to provide an extra buffer that will be used to make a local copy,
if necessary. This is done to maintain a simple linear pointer data
access interface.
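The contract described so far can be modelled in user space roughly like
this. The model is hypothetical throughout: struct reader_sketch and
reader_fetch() are invented names, fixed-size chunks of one contiguous
array stand in for folios, and the specific errno values are assumptions
rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct reader_sketch {
	const unsigned char *data;	/* whole "file" contents */
	size_t size;			/* file size in bytes */
	size_t chunk;			/* chunk size, standing in for a folio */
	unsigned char *buf;		/* caller-provided extra buffer */
	size_t buf_sz;
	int err;			/* valid when a fetch returns NULL */
};

static const void *reader_fetch(struct reader_sketch *r, size_t off, size_t sz)
{
	size_t done = 0;

	assert(r && r->chunk > 0);

	/* Flag an undersized buffer even when no copy would be needed. */
	if (sz > r->buf_sz) {
		r->err = -E2BIG;
		return NULL;
	}
	if (off > r->size || r->size - off < sz) {
		r->err = -EFAULT;
		return NULL;
	}
	r->err = 0;

	/* Range stays inside one chunk: hand out a direct pointer. */
	if (sz == 0 || off / r->chunk == (off + sz - 1) / r->chunk)
		return r->data + off;

	/* Range crosses a boundary: assemble a linear copy chunk by chunk. */
	while (done < sz) {
		size_t pos = off + done;
		size_t in_chunk = r->chunk - pos % r->chunk;
		size_t n = sz - done < in_chunk ? sz - done : in_chunk;

		memcpy(r->buf + done, r->data + pos, n);
		done += n;
	}
	return r->buf;
}
```

Either way the caller gets a single pointer it can read sz bytes from,
which is the property the extra buffer exists to preserve.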
We switch existing build ID parsing logic to it, without changing or
lifting any of the existing constraints, yet. This will be done
separately.
Given existing code was written with the assumption that it's always
working with a single (first) page of the underlying ELF file, logic
passes direct pointers around, which doesn't really work well with
freader approach and would be limiting when removing the single page (folio)
limitation. So we adjust all the logic to work in terms of file offsets.
There is also a memory buffer-based version (freader_init_from_mem())
for cases when desired data is already available in kernel memory. This
is used for parsing vmlinux's own build ID note. In this mode assumption
is that provided data starts at "file offset" zero, which works great
when parsing ELF notes sections, as all the parsing logic is relative to
note section's start.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Harden build ID parsing logic, adding explicit READ_ONCE() where it's
important to have a consistent value read and validated just once.
Also, as pointed out by Andi Kleen, we need to make sure that entire ELF
note is within page bounds, so move the overflow check up and add an
extra note_size boundaries validation.
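A minimal sketch of such a bounds check, written for user space, is shown
below. It is not the kernel's code: note_fits_sketch() and align4() are
hypothetical helpers, the GCC/Clang overflow builtins are assumed, and the
READ_ONCE() aspect is not modelled. It checks that the note header plus
the 4-byte-padded name and descriptor end within the notes area.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(Elf32_Nhdr) == 12, "note header is three 32-bit words");

/* Round up to the 4-byte padding used by ELF notes; 64-bit so it cannot wrap. */
static uint64_t align4(uint64_t x)
{
	return (x + 3) & ~(uint64_t)3;
}

/*
 * Return true if a note whose header starts at note_off lies entirely
 * within a notes area of note_size bytes.
 */
static bool note_fits_sketch(uint64_t note_off, uint32_t namesz,
			     uint32_t descsz, uint64_t note_size)
{
	uint64_t end;

	if (__builtin_add_overflow(note_off, (uint64_t)sizeof(Elf32_Nhdr), &end) ||
	    __builtin_add_overflow(end, align4(namesz), &end) ||
	    __builtin_add_overflow(end, align4(descsz), &end))
		return false;

	return end <= note_size;
}
```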
Fixes tag below points to the code that moved this code into
lib/buildid.c, and then subsequently was used in perf subsystem, making
this code exposed to perf_event_open() users in v5.12+.
Cc: stable@vger.kernel.org
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Fixes: bd7525dacd ("bpf: Move stack_map_get_build_id into lib")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add null check for character class. Previously, an inverted character
class could result in a nul byte being matched and lead to the function
reading past the end of the inputted string.
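The failure mode can be shown with a deliberately simplified matcher. This
is a hypothetical sketch, not lib/glob.c: class_matches_sketch() matches
one character against a plain set of characters, with 'inverted'
modelling a negated class. Without the early return, an inverted class
would accept the terminating nul byte and the caller would keep scanning
past the end of the string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Match one input character against a set such as "abc". */
static bool class_matches_sketch(const char *set, bool inverted, char c)
{
	assert(set);

	/* End of input never matches, whether the class is inverted or not. */
	if (c == '\0')
		return false;

	return (strchr(set, c) != NULL) != inverted;
}

/* Count leading characters of str that match; stops at the terminator. */
static size_t span_class_sketch(const char *set, bool inverted,
				const char *str)
{
	size_t n = 0;

	while (class_matches_sketch(set, inverted, str[n]))
		n++;
	return n;
}
```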
Link: https://lkml.kernel.org/r/20240826155709.12383-1-swaminathanalok@gmail.com
Signed-off-by: Alok Swaminathan <swaminathanalok@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
People keep trying to remove three functions that are going to be used in
a feature that is being developed. Dropping the functions entirely may
end up with people trying to use the bit for other uses, as people have
tried in the past.
Adding __maybe_unused stops compilers complaining about the unused
functions so they can be silently optimised out of the compiled code and
people won't try to claim the bit for another use.
Link: https://lore.kernel.org/all/20230726080916.17454-2-zhangpeng.00@bytedance.com/
Link: https://lore.kernel.org/all/202408310728.S7EE59BN-lkp@intel.com/
Link: https://lkml.kernel.org/r/20240907021506.4018676-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ZSTD_createCDict_advanced2() must ensure that
ZSTD_createCDict_advanced_internal() has successfully allocated cdict.
customMalloc() may be called under low memory condition and may be unable
to allocate workspace for cdict.
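The pattern being enforced is the ordinary check-before-use on an
allocation result. The sketch below uses entirely hypothetical names
(struct dict_sketch, dict_alloc_sketch(), dict_create_sketch()) and is not
the zstd code; the 'fail' flag merely simulates a low-memory condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct dict_sketch {
	size_t level;
};

/* Hypothetical internal allocator; may return NULL under memory pressure. */
static struct dict_sketch *dict_alloc_sketch(bool fail)
{
	return fail ? NULL : calloc(1, sizeof(struct dict_sketch));
}

/* Outer constructor: must not touch the object before checking it. */
static struct dict_sketch *dict_create_sketch(size_t level, bool fail)
{
	struct dict_sketch *d = dict_alloc_sketch(fail);

	if (!d)
		return NULL;

	d->level = level;
	return d;
}
```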
Link: https://lkml.kernel.org/r/20240902105656.1383858-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This symbol is needed to enable lz4hc dictionary support.
Link: https://lkml.kernel.org/r/20240902105656.1383858-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch tries to clean up some function descriptions:
* function name mismatch
* parameter name mismatch
* parameter all end up with ':'
* not prefix '*' if parameter is a pointer
Some parameter descriptions are still missing. I didn't add them since I
am not sure of their exact meaning.
Link: https://lkml.kernel.org/r/20240830220400.2007-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Just do what mt_dump_range64() does.
Dump the error message based on format.
Link: https://lkml.kernel.org/r/20240826012422.29935-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mt_dump_arange64() only applies to an entry whose type is maple_arange_64,
in which mte_is_leaf() must return false.
Since mte_is_leaf() here is always false, we can remove this condition
check.
Link: https://lkml.kernel.org/r/20240826012422.29935-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fill_pool() checks locklessly at the beginning whether the pool has to be
refilled. After that it checks locklessly in a loop whether the free list
contains objects and repeats the refill check.
If both conditions are true, it acquires the pool lock and tries to move
objects from the free list to the pool repeating the same checks again.
There are two redundant issues with that:
1) The repeated check for the fill condition
2) The loop processing
The repeated check is pointless as it was just established that fill is
required. The condition has to be re-evaluated under the lock anyway.
The loop processing is not required either because there is practically
zero chance that a repeated attempt will succeed if the checks under the
lock terminate the moving of objects.
Remove the redundant check and replace the loop with a simple if condition.
[ tglx: Massaged change log ]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240904133944.2124-4-thunder.leizhen@huawei.com
fill_pool() uses 'obj_pool_min_free' to decide whether objects should be
handed back to the kmem cache. But 'obj_pool_min_free' records the lowest
historical value of the number of objects in the object pool and not the
minimum number of objects which should be kept in the pool.
Use 'debug_objects_pool_min_level' instead, which holds the minimum number
which was scaled to the number of CPUs at boot time.
[ tglx: Massage change log ]
Fixes: d26bf5056f ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()")
Fixes: 36c4ead6f6 ("debugobjects: Add global free list and the counter")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com
1. Both debug_objects_pool_min_level and debug_objects_pool_size are
read-only after initialization, change attribute '__read_mostly' to
'__ro_after_init', and remove '__data_racy'.
2. Many global variables are read in the debug_stats_show() function, but
didn't mask KCSAN's detection. Add '__data_racy' for them.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240904133944.2124-2-thunder.leizhen@huawei.com
There are several comments all over the place that use a wrong singular
form of jiffies.
Replace 'jiffie' by 'jiffy'. No functional change.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20240904-devel-anna-maria-b4-timers-flseep-v1-3-e98760256370@linutronix.de
This kunit update for Linux 6.11-rc7 consists of a single fix to
a use-after-free bug resulting from kunit_driver_create() failing
to copy the driver name, leaving it on the stack or freeing it.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbY0WMACgkQCwJExA0N
QxzCgBAA7Cb6tyvGcXsQTXC50S90CR+3bGmHzTL8jl/ElHvTz521UzPTn01QB51t
JcGNhKz3RByvRBuukhg7abpCnCYWZoa9pmxojVD5D1TO2AXvypWEv0ao/UwSAYyi
2b7BTkcc7ciRske51/yFfipjwI/NLLIlu4HVcZ0OisOt+tvHzoz50KiyYV+Qan8r
e8NkqVI587KLfDAZRC+cLXyJCIRwlCK+jNMrjoiOanv1Ybe65eAGNQmAIyuGX1Fo
Ku8ZgoCgpc+Vjc1bMWgwgHWCdFOvINdd7ibfCp59JBBAkqYFpHYS5Lk9kHWH6lYF
X9THLaCSh5cq+u0qksW8p4ml1fYnWZbm92qkdPj0wG36v9la769HSXijtVhL2lxD
b1ca/NpfNfbbr5mxoVRq4ulO1JvyC6jmRKSJWt1p1SFfHf+Oaowh2Sr2ZjFfOozj
+/Joh3n2dxlnH/in8BvXGwQIo7xbyTatm/4IVCccJAolR+hPv7izBeWfYn3xgtu5
5WZVcxPMxNwgNHWnxm2nbxTtBTvTsOSC8/nbxm8g3jM9cHCP7Mz3/zSV6p2vcRxm
HPx/Qj2LmNcPKGXs4jh7WLErgkunxlvsqCJChwGjZoYR0fgRmzCgrwbkDE6/26UW
Teo51bWwD/CxTy7OtXi8D2pPzVqt8u5cFPaNgHaRzxLDuVTouhU=
=JRC5
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix from Shuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
Pull bpf/master to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.
Signed-off-by: Tejun Heo <tj@kernel.org>
Patch series "Increase the number of bits available in page_type".
Kent wants more than 16 bits in page_type, so I resurrected this old patch
and expanded it a bit. It's a bit more efficient than our current scheme
(1 4-byte insn vs 3 insns of 13 bytes total) to test a single page type.
This patch (of 4):
An upcoming patch will convert page type from being a bitfield to a
single byte, so we will not be able to use %pG to print the page type
any more. The printing of the symbolic name will be restored in that
patch.
Link: https://lkml.kernel.org/r/20240821173914.2270383-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20240821173914.2270383-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
debugfs_create_dir() returns error pointers. It never returns NULL. So
use IS_ERR() to check it.
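A minimal userspace model of why a NULL check misses such failures. The demo_err_ptr() and demo_is_err() helpers imitate the idea behind the kernel's ERR_PTR() and IS_ERR(); demo_create_dir() is a hypothetical creator that, like debugfs_create_dir(), never returns NULL. None of these names are kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical creator: returns an error pointer on failure, never NULL. */
static void *demo_create_dir(bool fail)
{
	static int dir;	/* placeholder for a successfully created object */

	return fail ? demo_err_ptr(-ENOMEM) : &dir;
}
```

A failed call yields a non-NULL pointer, so only the demo_is_err() style check detects it.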
Link: https://lkml.kernel.org/r/20240821073441.9701-1-11162571@vivo.com
Signed-off-by: Yang Ruibin <11162571@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The *-objs suffix is reserved for (user-space) host programs, while the
*-y suffix is normally used for kernel drivers (although *-objs still
works for that purpose for now).
Let's correct the old usages of *-objs in Makefiles.
Link: https://lkml.kernel.org/r/20240821155140.611514-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add missing __percpu qualifier to a (void *) cast to fix
percpu_counter.c:212:36: warning: cast removes address space '__percpu' of expression
percpu_counter.c:212:33: warning: incorrect type in assignment (different address spaces)
percpu_counter.c:212:33: expected signed int [noderef] [usertype] __percpu *counters
percpu_counter.c:212:33: got void *
sparse warnings.
Found by GCC's named address space checks.
There were no changes in the resulting object file.
Link: https://lkml.kernel.org/r/20240814064437.940162-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The original _bin2bcd() function used / 10 and % 10 operations for
conversion. Although GCC optimizes these operations and does not generate
division or modulus instructions, the new implementation reduces the
number of mov instructions in the generated code for both x86-64 and ARM
architectures.
This optimization calculates the tens digit using (val * 103) >> 10, which
is accurate for values of 'val' in the range [0, 178]. Given that the
valid input range is [0, 99], this method ensures correctness while
simplifying the generated code.
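The arithmetic above can be checked with a short sketch. demo_bin2bcd() is an illustrative reimplementation based only on the formula quoted in this message, not a copy of the kernel function; demo_bin2bcd_ref() uses / 10 and % 10 for comparison.

```c
#include <stdint.h>

/*
 * Division-free binary-to-BCD conversion for val in [0, 99].
 * 103 / 1024 approximates 1 / 10 closely enough that the shift yields
 * the exact tens digit for every val up to 178.
 */
static uint8_t demo_bin2bcd(unsigned int val)
{
	unsigned int tens = (val * 103) >> 10;

	return (uint8_t)((tens << 4) | (val - tens * 10));
}

/* Reference conversion using / 10 and % 10. */
static uint8_t demo_bin2bcd_ref(unsigned int val)
{
	return (uint8_t)(((val / 10) << 4) + val % 10);
}
```

The first input where the approximation breaks is 179: 179 * 103 = 18437, and 18437 >> 10 is 18 rather than 17.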
Link: https://lkml.kernel.org/r/20240812170229.229380-1-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The fault-inject.h users across the kernel need to add a lot of #ifdef
CONFIG_FAULT_INJECTION to cater for shortcomings in the header. Make
fault-inject.h self-contained for CONFIG_FAULT_INJECTION=n, and add stubs
for DECLARE_FAULT_ATTR(), setup_fault_attr(), should_fail_ex(), and
should_fail() to allow removal of conditional compilation.
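The stub pattern can be sketched as follows. DEMO_FAULT_INJECTION, struct demo_fault_attr, demo_should_fail() and demo_alloc() are hypothetical names standing in for the real configuration symbol and helpers; the point is only that callers compile unchanged whether or not the feature is enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical feature switch standing in for CONFIG_FAULT_INJECTION.
 * Left undefined here, so the stub branch below is the one compiled.
 */
/* #define DEMO_FAULT_INJECTION 1 */

#ifdef DEMO_FAULT_INJECTION

struct demo_fault_attr {
	unsigned int probability;	/* percent chance of an injected failure */
};

bool demo_should_fail(struct demo_fault_attr *attr, size_t size);

#else /* !DEMO_FAULT_INJECTION */

/* Placeholder type so callers can still declare and pass attributes. */
struct demo_fault_attr {
	char unused;
};

/* Stub: with injection disabled, no failure is ever injected. */
static inline bool demo_should_fail(struct demo_fault_attr *attr, size_t size)
{
	(void)attr;
	(void)size;
	return false;
}

#endif /* DEMO_FAULT_INJECTION */

/* Caller code needs no #ifdef: it builds the same way in both configurations. */
static void *demo_alloc(struct demo_fault_attr *attr, void *backing, size_t size)
{
	if (demo_should_fail(attr, size))
		return NULL;
	return backing;
}
```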
[akpm@linux-foundation.org: repair fallout from no longer including debugfs.h into fault-inject.h]
[akpm@linux-foundation.org: fix drivers/misc/xilinx_tmr_inject.c]
[akpm@linux-foundation.org: Add debugfs.h inclusion to more files, per Stephen]
Link: https://lkml.kernel.org/r/20240813121237.2382534-1-jani.nikula@intel.com
Fixes: 6ff1cb355e ("[PATCH] fault-injection capabilities infrastructure")
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Upon allocation failure, the current check with the nofail bits is
unnecessary, and further stands in the way of discouraging direct use of
__GFP_NOFAIL. Remove this and replace with the proper way of determining
if doing a non-blocking allocation for the nested table case.
Link: https://lkml.kernel.org/r/20240806153927.184515-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
CONFIG_LOCKDEP_CHAINS_BITS value decides the size of chain_hlocks[] in
kernel/locking/lockdep.c, and it is checked by add_chain_cache() with
BUILD_BUG_ON((1UL << 24) <= ARRAY_SIZE(chain_hlocks));
This patch is just to silence BUILD_BUG_ON().
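A rough sketch of how the quoted condition constrains the configuration value. It assumes that chain_hlocks[] holds five entries per lock chain; that factor is an assumption made for illustration and is not stated in this message. All demo_ names are hypothetical.

```c
#include <stdbool.h>

/* Assumption for illustration: five chain_hlocks[] entries per lock chain. */
#define DEMO_HLOCKS_PER_CHAIN 5UL

/* Number of chain_hlocks[] entries implied by a given chains-bits value. */
static unsigned long demo_chain_hlocks_size(unsigned int chains_bits)
{
	return (1UL << chains_bits) * DEMO_HLOCKS_PER_CHAIN;
}

/* Mirrors the quoted condition: the array must stay below 1UL << 24 entries. */
static bool demo_chains_bits_ok(unsigned int chains_bits)
{
	return (1UL << 24) > demo_chain_hlocks_size(chains_bits);
}

/* The same constraint as a compile-time check for one fixed value. */
_Static_assert((1UL << 24) > (1UL << 21) * DEMO_HLOCKS_PER_CHAIN,
	       "chain_hlocks[] would be too large");
```

Under that assumption, a value of 21 passes the check while 22 would trip it.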
See also https://lore.kernel.org/all/30795.1620913191@jrobl/
[cmllamas@google.com: fix minor checkpatch issues in commit log]
Link: https://lkml.kernel.org/r/20240723164018.2489615-1-cmllamas@google.com
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There are spelling mistakes in a literal string and in variable names.
Fix these.
Link: https://lkml.kernel.org/r/20240725093044.1742842-1-deshan@nfschina.com
Signed-off-by: Deshan Zhang <deshan@nfschina.com>
Cc: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A single line break should be put into a sequence. Thus use the
corresponding function "seq_putc".
This transformation was generated using the Coccinelle software.
Link: https://lkml.kernel.org/r/e7faa2c4-9590-44b4-8669-69ef810277b1@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Single characters should be put into a sequence. Thus use the
corresponding function "seq_putc".
This transformation was generated using the Coccinelle software.
Link: https://lkml.kernel.org/r/375b5b4b-6295-419e-bae9-da724a7a682d@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
XZ_EXTERN was used to make internal functions static in the preboot code.
However, in other decompressors this hasn't been done. On x86-64, this
makes no difference to the kernel image size.
Omit XZ_EXTERN and let some of the internal functions be extern in the
preboot code. Omitting XZ_EXTERN from include/linux/xz.h fixes warnings
in "make htmldocs" and makes the intradocument links to xz_dec functions
work in Documentation/staging/xz.rst. The alternative would have been to
add "XZ_EXTERN" to c_id_attributes in Documentation/conf.py but omitting
XZ_EXTERN seemed cleaner.
Link: https://lore.kernel.org/lkml/20240723205437.3c0664b0@kaneli/
Link: https://lkml.kernel.org/r/20240724110544.16430-1-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sam James <sam@gentoo.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use LZMA2 options that match the arch-specific alignment of instructions.
This change reduces compressed kernel size 0-2 % depending on the arch.
On 1-byte-aligned x86 it makes no difference and on 4-byte-aligned archs
it helps the most.
Use the ARM-Thumb filter for ARM-Thumb2 kernels. This reduces compressed
kernel size about 5 %.[1] Previously such kernels were compressed using
the ARM filter which didn't do anything useful with ARM-Thumb2 code.
Add BCJ filter support for ARM64 and RISC-V. Compared to unfiltered XZ or
plain LZMA, the compressed kernel size is reduced about 5 % on ARM64 and 7
% on RISC-V. A new enough version of the xz tool is required: 5.4.0 for
ARM64 and 5.6.0 for RISC-V. With an old xz version, a message is printed
to standard error and the kernel is compressed without the filter.
Update lib/decompress_unxz.c to match the changes to xz_wrap.sh.
Update the CONFIG_KERNEL_XZ help text in init/Kconfig:
- Add the RISC-V and ARM64 filters.
- Clarify that the PowerPC filter is for big endian only.
- Omit IA-64.
Link: https://lore.kernel.org/lkml/1637379771-39449-1-git-send-email-zhongjubin@huawei.com/ [1]
Link: https://lkml.kernel.org/r/20240721133633.47721-15-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A later commit updates lib/decompress_unxz.c to enable this filter for
kernel decompression. lib/decompress_unxz.c is already used if
CONFIG_EFI_ZBOOT=y && CONFIG_KERNEL_XZ=y.
This filter can be used by Squashfs without modifications to the Squashfs
kernel code (only needs support in userspace Squashfs-tools).
Link: https://lkml.kernel.org/r/20240721133633.47721-13-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Also omit a duplicated check for XZ_DEC_ARM in xz_private.h.
A later commit updates lib/decompress_unxz.c to enable this filter for
kernel decompression. lib/decompress_unxz.c is already used if
CONFIG_EFI_ZBOOT=y && CONFIG_KERNEL_XZ=y.
This filter can be used by Squashfs without modifications to the Squashfs
kernel code (only needs support in userspace Squashfs-tools).
Link: https://lkml.kernel.org/r/20240721133633.47721-12-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Compilers cannot optimize the addition "i + 4" away since theoretically it
could overflow.
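One way to remove the addition from the loop condition is to round the size down to a multiple of four once, before the loop. The sketch below contrasts the two forms; it is an illustrative rewrite with hypothetical function names and is not necessarily the exact change made by this patch.

```c
#include <stddef.h>

/* Loop condition with an addition: "i + 4" is evaluated on every pass. */
static size_t demo_count_words_add(size_t size)
{
	size_t i, n = 0;

	for (i = 0; i + 4 <= size; i += 4)
		n++;
	return n;
}

/*
 * Alternative: clear the two low bits of size once, then compare i
 * against it directly, so the condition contains no addition.
 */
static size_t demo_count_words_masked(size_t size)
{
	size_t i, n = 0;

	size &= ~(size_t)3;
	for (i = 0; i < size; i += 4)
		n++;
	return n;
}
```

Both loops visit the same number of complete 4-byte words for ordinary buffer sizes.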
Link: https://lkml.kernel.org/r/20240721133633.47721-11-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In 2018, a dependency on <linux/crc32poly.h> was added to avoid
duplicating the same constant in multiple files. Two months later it was
found to be a bad idea and the definition of CRC32_POLY_LE macro was moved
into xz_private.h to avoid including <linux/crc32poly.h>.
xz_private.h is the wrong place for it too. Revert to the upstream
version, which has the poly in xz_crc32_init() in xz_crc32.c.
Link: https://lkml.kernel.org/r/20240721133633.47721-10-lasse.collin@tukaani.org
Fixes: faa16bc404 ("lib: Use existing define with polynomial")
Fixes: 242cdad873 ("lib/xz: Put CRC32_POLY_LE in xz_private.h")
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Fix comments that were no longer in sync with the code below them.
- Fix language errors.
- Fix coding style.
Link: https://lkml.kernel.org/r/20240721133633.47721-5-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the public domain notices and add SPDX license identifiers.
Change MODULE_LICENSE from "GPL" to "Dual BSD/GPL" because 0BSD should
count as a BSD license variant here.
The switch to 0BSD was done in the upstream XZ Embedded project because
public domain has (real or perceived) legal issues in some jurisdictions.
Link: https://lkml.kernel.org/r/20240721133633.47721-4-lasse.collin@tukaani.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This file produces large amounts of flaky coverage that is not useful for
KCOV's intended use case (guiding the fuzzing process).
Link: https://lkml.kernel.org/r/20240722223726.194658-1-andrey.konovalov@linux.dev
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_objpool.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240715-md-lib-test_objpool-v2-1-5a2b9369c37e@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mul_u64_u64_div_u64: new implementation", v3.
This provides an implementation for mul_u64_u64_div_u64() that always
produces exact results.
This patch (of 2):
Library facilities must always return exact results. If the caller can be
content with approximations, then it should do the approximation on its
own.
In this particular case the comment in the code says "the algorithm
... below might lose some precision". Well, if you try it with e.g.:
a = 18446462598732840960
b = 18446462598732840960
c = 18446462598732840961
then the produced answer is 0 whereas the exact answer should be
18446462598732840959. This is _some_ precision lost indeed!
Let's reimplement this function so it always produces the exact result
regardless of its inputs while preserving existing fast paths when
possible.
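The exact answer quoted above can be reproduced with a reference computation that keeps the full 128-bit product. This sketch relies on the unsigned __int128 extension of GCC and Clang on 64-bit targets and is only a cross-check for the example values; it is not the implementation introduced by this patch, and demo_mul_u64_u64_div_u64() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Reference computation of a * b / c with a full 128-bit intermediate
 * product. The caller must ensure c != 0 and that the quotient fits in
 * 64 bits.
 */
static uint64_t demo_mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	unsigned __int128 product = (unsigned __int128)a * b;

	return (uint64_t)(product / c);
}
```

With a = b and c = a + 1, the identity a * a = (a + 1) * (a - 1) + 1 shows that the exact quotient is a - 1, which matches the value given in the message.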
Uwe said:
: My personal interest is to get the calculations in pwm drivers right.
: This function is used in several drivers below drivers/pwm/ . With the
: errors in mul_u64_u64_div_u64(), pwm consumers might not get the
: settings they request. Although I have to admit that I'm not aware it
: breaks real use cases (because typically the periods used are too short
: to make the involved multiplications overflow), but I pretty sure am
: not aware of all usages and it breaks testing.
:
: Another justification is commits like
: https://git.kernel.org/tip/77baa5bafcbe1b2a15ef9c37232c21279c95481c,
: where people start to work around the precision shortcomings of
: mul_u64_u64_div_u64().
Link: https://lkml.kernel.org/r/20240707190648.1982714-1-nico@fluxnic.net
Link: https://lkml.kernel.org/r/20240707190648.1982714-2-nico@fluxnic.net
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The return values of various write helper functions are not checked. We
can safely change the return type of these functions to void.
Link: https://lkml.kernel.org/r/20240814161944.55347-18-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Users of mas_store_prealloc() enter this function with nodes already
preallocated. This means the store type must be already set. We can then
remove the call to mas_wr_store_type() and initialize the write state to
continue the partial walk that was done when determining the store type.
Link: https://lkml.kernel.org/r/20240814161944.55347-17-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
These sanity checks are now redundant as they are already checked in
mas_wr_store_type(). We can remove them from mas_wr_append() and
mas_wr_node_store().
Link: https://lkml.kernel.org/r/20240814161944.55347-16-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
These write helper functions are all called from store paths which
preallocate enough nodes that will be needed for the write. There is no
more need to allocate within the functions themselves.
Link: https://lkml.kernel.org/r/20240814161944.55347-15-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Not all users of mas_store() enter with nodes already preallocated.
Check for the MA_STATE_PREALLOC flag to decide whether to preallocate nodes
within mas_store() rather than relying on future write helper functions
to perform the allocations. This allows the write helper functions to be
simplified as they do not have to do checks to make sure there are
enough allocated nodes to perform the write.
Link: https://lkml.kernel.org/r/20240814161944.55347-14-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There are no more users of the function, so it can be safely removed.
Link: https://lkml.kernel.org/r/20240814161944.55347-13-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The only callers of mas_commit_b_node() are those with store type of
wr_rebalance and wr_split_store. Use mas->store_type to dispatch to the
correct helper function. This allows the removal of mas_reuse_node() as
it is no longer used.
Link: https://lkml.kernel.org/r/20240814161944.55347-12-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
By setting the store type in mas_insert(), we no longer need to use
mas_wr_modify() to determine the correct store function to use. Instead,
set the store type and call mas_wr_store_entry(). Also, pass in the
requested gfp flags to mas_insert() so they can be passed to the call to
mas_wr_preallocate().
Link: https://lkml.kernel.org/r/20240814161944.55347-11-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When storing an entry, we can read the store type that was set from a
previous partial walk of the tree. Now that the type of store is known,
select the correct write helper function to use to complete the store.
Also noinline mas_wr_spanning_store() to limit stack frame usage in
mas_wr_store_entry() as it allocates a maple_big_node on the stack.
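The dispatch idea can be sketched with a toy state machine. The enum values, struct demo_state and the helper functions below are hypothetical, loosely named after store types mentioned in this series; the real maple tree types and helpers are not shown in this message.

```c
/* Hypothetical subset of store types. */
enum demo_store_type {
	DEMO_WR_INVALID,
	DEMO_WR_APPEND,
	DEMO_WR_SPLIT_STORE,
	DEMO_WR_REBALANCE,
};

/* Toy write state: the store type is assumed to be set by an earlier walk. */
struct demo_state {
	enum demo_store_type store_type;
	const char *helper;	/* name of the helper that completed the store */
};

static void demo_append(struct demo_state *s)      { s->helper = "append"; }
static void demo_split_store(struct demo_state *s) { s->helper = "split_store"; }
static void demo_rebalance(struct demo_state *s)   { s->helper = "rebalance"; }

/* Dispatch on the cached store type; returns 0 on success, -1 if unset. */
static int demo_store_entry(struct demo_state *s)
{
	switch (s->store_type) {
	case DEMO_WR_APPEND:
		demo_append(s);
		return 0;
	case DEMO_WR_SPLIT_STORE:
		demo_split_store(s);
		return 0;
	case DEMO_WR_REBALANCE:
		demo_rebalance(s);
		return 0;
	case DEMO_WR_INVALID:
	default:
		return -1;
	}
}
```

Because the type was decided during the earlier walk, the store path needs no second inspection of the tree to pick a helper.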
Link: https://lkml.kernel.org/r/20240814161944.55347-10-sidhartha.kumar@oracle.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Knowing the store type of the maple state could be helpful for debugging.
Have mas_dump() print mas->store_type.
Link: https://lkml.kernel.org/r/20240814161944.55347-9-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Refactor mtree_store_range() to use mas_store_gfp() which will abstract
the store, memory allocation, and error handling.
Link: https://lkml.kernel.org/r/20240814161944.55347-8-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use mas_wr_preallocate() in mas_erase() to preallocate enough nodes to
complete the erase. Add error handling by skipping the store if the
preallocation led to some error other than no memory.
Link: https://lkml.kernel.org/r/20240814161944.55347-7-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Separate the call to mas_destroy() from mas_nomem() so we can check for no
memory errors without destroying the current maple state in
mas_store_gfp(). We then add calls to mas_destroy() to callers of
mas_nomem().
Link: https://lkml.kernel.org/r/20240814161944.55347-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce mas_wr_store_type() which will set the correct store type based
on a walk of the tree. In mas_wr_node_store() the <= min_slots condition
is changed to <, because if new_end is equal to mt_min_slots there is not
enough room.
mas_prealloc_calc() is also introduced to abstract the calculation used to
determine the number of nodes needed for a store operation.
In this change a call to mas_reset() is removed in the error case of
mas_prealloc(). This is only needed in the MA_STATE_REBALANCE case of
mas_destroy(). We can move the call to mas_reset() directly to
mas_destroy().
Also, add a test case to validate that the order in which we check the
store type is correct. This test models a vma expanding and then shrinking, which
is part of the boot process.
Link: https://lkml.kernel.org/r/20240814161944.55347-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce a helper function, mas_wr_prealloc_setup(), that will set up a
maple write state in order to start a walk of a maple tree.
Link: https://lkml.kernel.org/r/20240814161944.55347-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The comment of mas_start() lists the return value for each case. For
consistency with the other cases, the comment should state the
maple_status here as well.
Let's correct it to ma_active for the case where it is a tree.
Link: https://lkml.kernel.org/r/20240812150925.31551-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the comment of mas_start(), we list the return values for different cases.
In case of a single entry, we set mas->status to ma_root, while the
comment uses mas_root, which is not a maple_status.
Fix the typo according to the code.
Link: https://lkml.kernel.org/r/20240812150925.31551-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add new callback fields to the userspace implementation of struct
kmem_cache. This allows for executing callback functions in order to
further test low memory scenarios where node allocation is retried.
This callback can help test race conditions by calling a function when a
low memory event is tested.
Link: https://lkml.kernel.org/r/20240812190543.71967-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The following scenario can result in a race condition:
Consider a node with the following indices and values
    a<------->b<----------->c<--------->d
        0xA        NULL          0xB
    CPU 1                       CPU 2
    ---------                   ---------
    mas_set_range(a,b)
    mas_erase()
    -> range is expanded (a,c) because of null expansion
    mas_nomem()
    mas_unlock()
                                mas_store_range(b,c,0xC)
The node now looks like:
    a<------->b<----------->c<--------->d
        0xA         0xC          0xB
    mas_lock()
    mas_erase() <------ range of erase is still (a,c)
The node is now NULL from (a,c) but the write from CPU 2 should have been
retained and range (b,c) should still have 0xC as its value. We can fix
this by re-initializing to the original index and last. This does not need
a cc: Stable as there are no users of the maple tree which use internal
locking and this condition is only possible with internal locking.
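The effect of resetting the range before the retry can be shown with a toy model. This is only a sketch under stated assumptions: the three-slot array, struct toy_state, toy_expand() and toy_erase_retry() are hypothetical names invented for illustration and are not the maple tree API.

```c
#define TOY_NSLOTS 3

/* Hypothetical write state: an inclusive range of slot numbers. */
struct toy_state {
	int first;
	int last;
};

/* Model of null expansion: widen the range over adjacent empty (0) slots. */
static void toy_expand(const unsigned long *slots, struct toy_state *s)
{
	while (s->first > 0 && slots[s->first - 1] == 0)
		s->first--;
	while (s->last < TOY_NSLOTS - 1 && slots[s->last + 1] == 0)
		s->last++;
}

/*
 * Retry after a failed allocation. The range is first restored to what the
 * caller originally asked for, so an expansion computed before the lock was
 * dropped cannot wipe out a value stored by another CPU in the meantime.
 */
static void toy_erase_retry(unsigned long *slots, struct toy_state *s,
			    const struct toy_state *orig)
{
	*s = *orig;
	toy_expand(slots, s);
	for (int i = s->first; i <= s->last; i++)
		slots[i] = 0;
}
```

With slots holding 0xA, NULL, 0xB, a first expansion of the range (a,b) covers the middle slot. If 0xC is stored there before the retry, the reset keeps the second erase confined to the first slot.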
Link: https://lkml.kernel.org/r/20240812190543.71967-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use min() to simplify the dmirror_exclusive() function and improve its
readability.
Link: https://lkml.kernel.org/r/20240726131245.161695-1-thorsten.blum@toblux.com
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Besides the obvious (and desired) difference between krealloc() and
kvrealloc(), there is some inconsistency in their function signatures and
behavior:
- krealloc() frees the memory when the requested size is zero, whereas
kvrealloc() simply returns a pointer to the existing allocation.
- krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
kvrealloc() does not accept a NULL pointer at all and, if passed,
would fault instead.
- krealloc() is self-contained, whereas kvrealloc() relies on the caller
to provide the size of the previous allocation.
Inconsistent behavior throughout allocation APIs is error prone, hence
make kvrealloc() behave like krealloc(), which seems superior in all
mentioned aspects.
Besides that, implementing kvrealloc() by making use of krealloc() and
vrealloc() provides opportunities to grow (and shrink) allocations more
efficiently. For instance, vrealloc() can be optimized to allocate and
map additional pages to grow the allocation or unmap and free unused pages
to shrink the allocation.
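The krealloc()-style contract described above (NULL behaves like an allocation, size zero frees, no old size passed in) can be sketched in userspace C. This is an illustration only: sketch_realloc() is a hypothetical name, it is built on the C library rather than on the kernel allocators, and it returns NULL for a zero size purely for simplicity.

```c
#include <stdlib.h>

/*
 * Userspace sketch of a self-contained realloc contract:
 *  - size == 0 frees the buffer,
 *  - a NULL pointer behaves like a fresh allocation,
 *  - the caller never passes the previous size.
 */
static void *sketch_realloc(void *p, size_t size)
{
	if (size == 0) {
		free(p);	/* free(NULL) is a no-op */
		return NULL;
	}

	if (p == NULL)
		return malloc(size);

	return realloc(p, size);
}
```

A caller can then grow a buffer with p = sketch_realloc(p, new_size) without tracking the old size, and release it with sketch_realloc(p, 0).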
[dakr@kernel.org: document concurrency restrictions]
Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org
[dakr@kernel.org: disable KASAN when switching to vmalloc]
Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org
[dakr@kernel.org: properly document __GFP_ZERO behavior]
Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org
Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
codetag_module_init() is used to initialize sections containing allocation
tags. This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL. This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections. When CONFIG_MODULES=n, this function is a
NOP, which is wrong. This leads to /proc/allocinfo reported as empty.
Fix this by making it independent of CONFIG_MODULES.
Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com
Fixes: 916cc5167c ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fix from Jason Donenfeld:
"Reject invalid flags passed to vgetrandom() in the same way that
getrandom() does, so that the behavior is the same, from Yann.
The flags argument to getrandom() only has a behavioral effect on the
function if the RNG isn't initialized yet, so vgetrandom() falls back
to the syscall in that case. But if the RNG is initialized, all of the
flags behave the same way, so vgetrandom() didn't bother checking
them, and just ignored them entirely.
But that doesn't account for invalid flags passed in, which need to be
rejected so we can use them later"
* tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: vDSO: reject unknown getrandom() flags
This adds GENMASK_U128() tests although currently only 64 bit wide masks
are being tested.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Add a test that will create a cache, allocate one object, kfree_rcu() it
and attempt to destroy it. As long as the usage of kvfree_rcu_barrier()
in kmem_cache_destroy() works correctly, there should be no warnings in
dmesg and the test should pass.
Additionally add a test_leak_destroy() test that leaks an object on
purpose and verifies that kmem_cache_destroy() catches it.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is either on the stack, or otherwise freed, we end
up with a use-after-free when the driver is cleaned up.
Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one. Further, add a
kunit_ variant of strdup_const() and kfree_const(), so we don't need to
allocate and manage the string in the majority of cases where it's a
constant.
However, these are inline functions, and is_kernel_rodata() only works
for built-in code. This causes problems in two cases:
- If kunit is built as a module, __{start,end}_rodata is not defined.
- If a kunit test using these functions is built as a module, it will
suffer the same fate.
This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.
Restrict the is_kernel_rodata() case to when KUnit is built-in, which
fixes the first case, at the cost of losing the optimisation when KUnit is
a module.
Also, make kunit_{kstrdup,kfree}_const non-inline, so that other modules
using them will not accidentally depend on is_kernel_rodata(). If KUnit
is built-in, they'll benefit from the optimisation, if KUnit is not,
they won't, but the string will be properly duplicated.
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Like the getrandom() syscall, vDSO getrandom() must also reject unknown
flags. [1]
It would be possible to return -EINVAL from vDSO itself, but in the
possible case that a new flag is added to getrandom() syscall in the
future, it would be easier to get the behavior from the syscall, instead
of erroring until the vDSO is extended to support the new flag or
explicitly falling back.
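The decision described above can be sketched as a small helper. This is a simplified illustration, not the vDSO code: sketch_choose_path() and the SKETCH_* names are hypothetical, and the three flag values mirror GRND_NONBLOCK, GRND_RANDOM and GRND_INSECURE from the UAPI header.

```c
/* Values mirror GRND_NONBLOCK, GRND_RANDOM and GRND_INSECURE. */
#define SKETCH_GRND_NONBLOCK 0x0001u
#define SKETCH_GRND_RANDOM   0x0002u
#define SKETCH_GRND_INSECURE 0x0004u
#define SKETCH_KNOWN_FLAGS \
	(SKETCH_GRND_NONBLOCK | SKETCH_GRND_RANDOM | SKETCH_GRND_INSECURE)

enum sketch_path {
	SKETCH_PATH_FAST,	/* serve the request without a syscall */
	SKETCH_PATH_SYSCALL,	/* defer to the getrandom() syscall */
};

static enum sketch_path sketch_choose_path(unsigned int flags, int rng_ready)
{
	/*
	 * Unknown flags are not rejected here. The syscall decides, so a
	 * flag added in the future gets the syscall's behavior right away.
	 */
	if (flags & ~SKETCH_KNOWN_FLAGS)
		return SKETCH_PATH_SYSCALL;

	/* Flags only change behavior while the RNG is uninitialized. */
	if (!rng_ready)
		return SKETCH_PATH_SYSCALL;

	return SKETCH_PATH_FAST;
}
```

Falling back rather than returning -EINVAL locally keeps a single place, the syscall, responsible for deciding which flags are valid.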
[1] Designing the API: Planning for Extension
https://docs.kernel.org/process/adding-syscalls.html#designing-the-api-planning-for-extension
Signed-off-by: Yann Droneaud <yann@droneaud.fr>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When soft interrupt actions are called, they are passed a pointer to the
struct softirq_action, which contains the action's function pointer.
This pointer isn't useful, as the action callback already knows what
function it is. And since each callback handles a specific soft interrupt,
the callback also knows which soft interrupt number is running.
No soft interrupt action callback actually uses this parameter, so remove
it from the function pointer signature. This clarifies that soft interrupt
actions are global routines and makes it slightly cheaper to call them.
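A minimal sketch of what a parameterless action table looks like, assuming nothing about the real kernel structures beyond what is stated above. All sketch_* names and the two counters are hypothetical.

```c
enum { SKETCH_TIMER, SKETCH_NET_RX, SKETCH_NR };

/* After the change the callback takes no argument at all. */
struct sketch_softirq_action {
	void (*action)(void);
};

static struct sketch_softirq_action sketch_vec[SKETCH_NR];
static unsigned int sketch_timer_runs, sketch_net_rx_runs;

/* Each handler already knows which soft interrupt it serves. */
static void sketch_timer_action(void)  { sketch_timer_runs++; }
static void sketch_net_rx_action(void) { sketch_net_rx_runs++; }

/* Register a handler for soft interrupt number nr. */
static void sketch_open(unsigned int nr, void (*action)(void))
{
	sketch_vec[nr].action = action;
}

/* Run every handler whose bit is set in the pending mask. */
static void sketch_run_pending(unsigned int pending)
{
	for (unsigned int nr = 0; nr < SKETCH_NR; nr++)
		if ((pending & (1u << nr)) && sketch_vec[nr].action)
			sketch_vec[nr].action();
}
```

Because the dispatcher no longer has to pass a pointer to the table entry, each call needs one argument less.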
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/20240815171549.3260003-1-csander@purestorage.com
The Spectre-v1 mitigations made "access_ok()" much more expensive, since
it has to serialize execution with the test for a valid user address.
All the normal user copy routines avoid this by just masking the user
address with a data-dependent mask instead, but the fast
"unsafe_user_read()" kind of patterns that were supposed to be a fast
case got slowed down.
This introduces a notion of using
src = masked_user_access_begin(src);
to do the user address sanity using a data-dependent mask instead of the
more traditional conditional
if (user_read_access_begin(src, len)) {
model.
This model only works for dense accesses that start at 'src' and on
architectures that have a guard region that is guaranteed to fault in
between the user space and the kernel space area.
With this, the user access doesn't need to be manually checked, because
a bad address is guaranteed to fault (by some architecture masking
trick: on x86-64 this involves just turning an invalid user address into
all ones, since we don't map the top of address space).
This only converts a couple of examples for now. Example x86-64 code
generation for loading two words from user space:
stac
mov %rax,%rcx
sar $0x3f,%rcx
or %rax,%rcx
mov (%rcx),%r13
mov 0x8(%rcx),%r14
clac
where all the error handling and -EFAULT is now purely handled out of
line by the exception path.
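The sar/or pair in the listing above can be mirrored in portable-looking C to see what the mask does. This is an illustration only: sketch_mask_user_address() is a hypothetical name, and the code assumes that right-shifting a negative int64_t is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in ISO C.

```c
#include <stdint.h>

/*
 * Mirror of "sar $0x3f" followed by "or": an address with the top bit set
 * (not a valid user address on x86-64) is turned into all ones, which is
 * guaranteed to fault. Any other address passes through unchanged.
 */
static inline uint64_t sketch_mask_user_address(uint64_t addr)
{
	/* All ones if the top bit is set, zero otherwise. */
	uint64_t mask = (uint64_t)((int64_t)addr >> 63);

	return addr | mask;
}
```

No branch is involved, so there is no conditional for the CPU to speculate past. The bad case is caught by the fault on the access itself.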
Of course, if the micro-architecture does badly at 'clac' and 'stac',
the above is still pitifully slow. But at least we did as well as we
could.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- New on disk format version, bcachefs_metadata_version_disk_accounting_inum
This adds one more disk accounting counter, which counts disk usage and
number of extents per inode number. This lets us track fragmentation,
for implementing defragmentation later, and it also counts disk usage
per inode in all snapshots, which will be a useful thing to expose to
users.
- One performance issue we've observed is threads spinning when they
should be waiting for dirty keys in the key cache to be flushed by
journal reclaim, so we now have hysteresis for the waiting thread, as
well as improving the tracepoint and a new time_stat, for tracking time
blocked waiting on key cache flushing.
And, various assorted smaller fixes.
Merge tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- New on disk format version, bcachefs_metadata_version_disk_accounting_inum
This adds one more disk accounting counter, which counts disk usage
and number of extents per inode number. This lets us track
fragmentation, for implementing defragmentation later, and it also
counts disk usage per inode in all snapshots, which will be a useful
thing to expose to users.
- One performance issue we've observed is threads spinning when they
should be waiting for dirty keys in the key cache to be flushed by
journal reclaim, so we now have hysteresis for the waiting thread, as
well as improving the tracepoint and a new time_stat, for tracking
time blocked waiting on key cache flushing.
... and various assorted smaller fixes.
* tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix locking in __bch2_trans_mark_dev_sb()
bcachefs: fix incorrect i_state usage
bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru
bcachefs: Fix forgetting to pass trans to fsck_err()
bcachefs: Increase size of cuckoo hash table on too many rehashes
bcachefs: bcachefs_metadata_version_disk_accounting_inum
bcachefs: Kill __bch2_accounting_mem_mod()
bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in
bcachefs: Add a time_stat for blocked on key cache flush
bcachefs: Improve trans_blocked_journal_reclaim tracepoint
bcachefs: Add hysteresis to waiting on btree key cache flush
lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
bcachefs: Convert for_each_btree_node() to lockrestart_do()
bcachefs: Add missing downgrade table entry
bcachefs: disk accounting: ignore unknown types
bcachefs: bch2_accounting_invalid() fixup
bcachefs: Fix bch2_trigger_alloc when upgrading from old versions
bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
The remaining functions added by commit
a8ea8bdd9d did not check for memory
allocation errors. Add the checks and change the API to allow errors
to be returned.
Fixes: a8ea8bdd9d ("lib/mpi: Extend the MPI library")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This partially reverts commit a8ea8bdd9d.
Most of it is no longer needed since sm2 has been removed. However,
the following functions have been kept as they have developed other
uses:
mpi_copy
mpi_mod
mpi_test_bit
mpi_set_bit
mpi_rshift
mpi_add
mpi_sub
mpi_addm
mpi_subm
mpi_mul
mpi_mulm
mpi_tdiv_r
mpi_fdiv_r
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When @size is 0, the desired behavior is to allow unlimited bytes to be
parsed. Currently, this relies on some intentional arithmetic overflow
where --size gives us SIZE_MAX when size is 0.
Explicitly spell out the desired behavior without relying on intentional
overflow/underflow.
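A minimal sketch of the explicit form, assuming a simple bounded copy loop rather than the actual string helper. sketch_copy() is a hypothetical function, and the caller is assumed to provide a large enough destination buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy at most size - 1 characters from src to dst and NUL-terminate.
 * A size of 0 means "no limit". That case is spelled out explicitly
 * instead of letting a pre-decrement wrap 0 around to SIZE_MAX.
 */
static size_t sketch_copy(const char *src, char *dst, size_t size)
{
	size_t n = 0;

	if (size == 0)
		size = SIZE_MAX;

	while (src[n] != '\0' && n + 1 < size) {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';

	return n;
}
```

The behavior is the same as with the wrapping decrement, but the intent is visible in the code and no longer depends on unsigned wraparound.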
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240808-b4-string_helpers_caa133-v1-1-686a455167c4@google.com
Signed-off-by: Kees Cook <kees@kernel.org>
After building with CONFIG_FORTIFY_SOURCE=y, many .*.d files are left
in lib/test_fortify/ because the compiler outputs header dependencies
into *.d without fixdep being invoked.
When compiling C files, if_changed_dep should be used so that the
auto-generated header dependencies are recorded in .*.cmd files.
Currently, if_changed is incorrectly used, and only two headers are
hard-coded in lib/Makefile.
In the previous patch version, the kbuild test robot detected new errors
on GCC 7.
GCC 7 or older does not produce test.d with the following test code:
$ echo 'void b(void) __attribute__((__error__(""))); void a(void) { b(); }' |
gcc -Wp,-MMD,test.d -c -o /dev/null -x c -
Perhaps, this was a bug that existed in older GCC versions.
Skip the tests for GCC<=7 for now, as this will be eventually solved
when we bump the minimal supported GCC version.
Link: https://lore.kernel.org/oe-kbuild-all/CAK7LNARmJcyyzL-jVJfBPi3W684LTDmuhMf1koF0TXoCpKTmcw@mail.gmail.com/T/#m13771bf78ae21adff22efc4d310c973fb4bcaf67
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240727150302.1823750-4-masahiroy@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
There are some issues in the test_fortify Makefile code.
Problem 1: cc-disable-warning invokes compiler dozens of times
To see how many times the cc-disable-warning is evaluated, change
this code:
$(call cc-disable-warning,fortify-source)
to:
$(call cc-disable-warning,$(shell touch /tmp/fortify-$$$$)fortify-source)
Then, build the kernel with CONFIG_FORTIFY_SOURCE=y. You will see a
large number of '/tmp/fortify-<PID>' files created:
$ ls -1 /tmp/fortify-* | wc
80 80 1600
This means the compiler was invoked 80 times just for checking the
-Wno-fortify-source flag support.
$(call cc-disable-warning,fortify-source) should be added to a simple
variable instead of a recursive variable.
Problem 2: do not recompile string.o when the test code is updated
The test cases are independent of the kernel. However, when the test
code is updated, $(obj)/string.o is rebuilt and vmlinux is relinked
due to this dependency:
$(obj)/string.o: $(obj)/$(TEST_FORTIFY_LOG)
always-y is suitable for building the log files.
Problem 3: redundant code
clean-files += $(addsuffix .o, $(TEST_FORTIFY_LOGS))
... is unneeded because the top Makefile globally cleans *.o files.
This commit fixes these issues and makes the code readable.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240727150302.1823750-2-masahiroy@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
The 'device_name' array does not exist outside the scope of the
'overflow_allocation_test' function. However, it is used as the driver
name when 'kunit_device_register' calls 'kunit_driver_create', which
produces a kernel panic with KASAN enabled.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
If a CSD-lock stall goes on long enough, it will cause an RCU CPU
stall warning. This additional warning provides much additional
console-log traffic and little additional information. Therefore,
provide a new csd_lock_is_stuck() function that returns true if there
is an ongoing CSD-lock stall. This function will be used by the RCU
CPU stall warnings to provide a one-line indication of the stall when
this function returns true.
[ neeraj.upadhyay: Apply Rik van Riel feedback. ]
[ neeraj.upadhyay: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.
If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.
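The failure path can be sketched with C11 atomics. This is a toy model, not the genradix code: struct toy_node and toy_try_grow() are hypothetical, and the caller passes in the root it observed earlier so that a stale view can be simulated without threads.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *children[4];
};

/*
 * Try to install new_node as the root above seen_root. Returns NULL when
 * the node was consumed, or the node itself when another thread changed
 * the root first and the node is kept for later use.
 */
static struct toy_node *toy_try_grow(_Atomic(struct toy_node *) *root,
				     struct toy_node *seen_root,
				     struct toy_node *new_node)
{
	struct toy_node *expected = seen_root;

	new_node->children[0] = seen_root;
	if (atomic_compare_exchange_strong(root, &expected, new_node))
		return NULL;

	/*
	 * Lost the race: clear the stale pointer so the preallocated node
	 * is fully zeroed again before it is reused as a non-root node.
	 */
	new_node->children[0] = NULL;
	return new_node;
}
```

Without the reset in the failure branch, a later use of the leftover node would start with a child pointer to the old root.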
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a boot self test that can catch spurious coverage from interrupts.
The coverage callback filters out interrupt code, but only after the
handler updates preempt count. Some code periodically leaks out
of that section and leads to spurious coverage.
Add a best-effort (but simple) test that is likely to catch such bugs.
If the test is enabled on CI systems that use KCOV, they should catch
any issues fast.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/all/7662127c97e29da1a748ad1c1539dd7b65b737b2.1718092070.git.dvyukov@google.com
Currently ARM64 extracts which specific sanitizer has caused a trap via
encoded data in the trap instruction. Clang on x86 currently encodes the
same data in the UD1 instruction but x86 handle_bug() and
is_valid_bugaddr() currently only look at UD2.
Bring x86 to parity with ARM64, similar to commit 25b84002af ("arm64:
Support Clang UBSAN trap codes for better reporting"). See the llvm
links for information about the code generation.
Enable the reporting of UBSAN sanitizer details on x86 compiled with clang
when CONFIG_UBSAN_TRAP=y by analysing UD1 and retrieving the type immediate
which is encoded by the compiler after the UD1.
[ tglx: Simplified it by moving the printk() into handle_bug() ]
Signed-off-by: Gatlin Newhouse <gatlin.newhouse@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20240724000206.451425-1-gatlin.newhouse@gmail.com
Link: https://github.com/llvm/llvm-project/commit/c5978f42ec8e9#diff-bb68d7cd885f41cfc35843998b0f9f534adb60b415f647109e597ce448e92d9f
Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InstrSystem.td#L27
This patch implements a union-find data structure in the kernel library,
which includes operations for allocating nodes, freeing nodes,
finding the root of a node, and merging two nodes.
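For readers unfamiliar with the structure, a minimal union-find can be sketched as follows. This is a generic illustration and not the interface added by the patch: the sketch_uf_* names are hypothetical, nodes are owned by the caller rather than allocated and freed by the library, and the path halving and union by rank used here are standard techniques that the patch may or may not use.

```c
struct sketch_uf_node {
	struct sketch_uf_node *parent;
	unsigned int rank;
};

/* A fresh node is the root of its own one-element set. */
static void sketch_uf_init(struct sketch_uf_node *node)
{
	node->parent = node;
	node->rank = 0;
}

/* Find the root of the set, halving the path along the way. */
static struct sketch_uf_node *sketch_uf_find(struct sketch_uf_node *node)
{
	while (node->parent != node) {
		node->parent = node->parent->parent;
		node = node->parent;
	}
	return node;
}

/* Merge the sets containing a and b, attaching the shallower tree. */
static void sketch_uf_union(struct sketch_uf_node *a, struct sketch_uf_node *b)
{
	struct sketch_uf_node *ra = sketch_uf_find(a);
	struct sketch_uf_node *rb = sketch_uf_find(b);

	if (ra == rb)
		return;

	if (ra->rank < rb->rank) {
		ra->parent = rb;
	} else if (ra->rank > rb->rank) {
		rb->parent = ra;
	} else {
		rb->parent = ra;
		ra->rank++;
	}
}
```

Two nodes belong to the same set exactly when sketch_uf_find() returns the same root for both.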
Signed-off-by: Xavier <xavier_qy@163.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Introduce KUnit resource wrappers around platform_driver_register(),
platform_device_alloc(), and platform_device_add() so that test authors
can register platform drivers/devices from their tests and have the
drivers/devices automatically be unregistered when the test is done.
This makes test setup code simpler when a platform driver or platform
device is needed. Add a few test cases at the same time to make sure the
APIs work as intended.
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Reviewed-by: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20240718210513.3801024-6-sboyd@kernel.org
We only had a couple of array[] declarations, and changing them to just
use 'MAX()' instead of 'max()' fixes the issue.
This will allow us to simplify our min/max macros enormously, since they
can now unconditionally use temporary variables to avoid using the
argument values multiple times.
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This just standardizes the use of MIN() and MAX() macros, with the very
traditional semantics. The goal is to use these for C constant
expressions and for top-level / static initializers, and so be able to
simplify the min()/max() macros.
These macro names were used by various kernel code - they are very
traditional, after all - and all such users have been fixed up, with a
few different approaches:
- trivial duplicated macro definitions have been removed
Note that 'trivial' here means that it's obviously kernel code that
already included all the major kernel headers, and thus gets the new
generic MIN/MAX macros automatically.
- non-trivial duplicated macro definitions are guarded with #ifndef
This is the "yes, they define their own versions, but no, the include
situation is not entirely obvious, and maybe they don't get the
generic version automatically" case.
- strange use case #1
A couple of drivers decided that the way they want to describe their
versioning is with
#define MAJ 1
#define MIN 2
#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)
which adds zero value and I just did my Alexander the Great
impersonation, and rewrote that pointless Gordian knot as
#define DRV_VERSION "1.2"
instead.
- strange use case #2
A couple of drivers thought that it's a good idea to have a random
'MIN' or 'MAX' define for a value or index into a table, rather than
the traditional macro that takes arguments.
These values were re-written as C enum's instead. The new
function-like macros only expand when followed by an open
parenthesis, and thus don't clash with enum use.
Happily, there weren't really all that many of these cases, and a lot of
users already had the pattern of using '#ifndef' guarding (or in one
case just using '#undef MIN') before defining their own private version
that does the same thing. I left such cases alone.
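Two of the points above can be shown in a few lines: the traditional macros are constant expressions, and a function-like macro does not clash with an enum constant of the same name. This is a sketch with the traditional definitions written out locally. The slot counts, the array and the enum are hypothetical, and reusing MIN and MAX as enum names is done here only to demonstrate the preprocessor rule.

```c
/* Traditional semantics: arguments may be evaluated more than once. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

#define SKETCH_RX_SLOTS 4
#define SKETCH_TX_SLOTS 6

/* A constant expression, so it is valid as a file-scope array bound. */
static int sketch_slots[MAX(SKETCH_RX_SLOTS, SKETCH_TX_SLOTS)];

/*
 * MIN and MAX are not followed by an open parenthesis here, so the
 * preprocessor leaves them alone and they are plain enum constants.
 */
enum sketch_level { MIN, MID, MAX };
```

A statement-expression based max() with temporary variables could not be used for the array bound, which is why the constant-expression users move to MAX().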
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The highlight is the establishment of a minimum version for the Rust
toolchain, including 'rustc' (and bundled tools) and 'bindgen'.
The initial minimum will be the pinned version we currently have, i.e.
we are just widening the allowed versions. That covers 3 stable Rust
releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow), plus beta,
plus nightly.
This should already be enough for kernel developers in distributions
that provide recent Rust compiler versions routinely, such as Arch
Linux, Debian Unstable (outside the freeze period), Fedora Linux,
Gentoo Linux (especially the testing channel), Nix (unstable) and
openSUSE Slowroll and Tumbleweed.
In addition, the kernel is now being built-tested by Rust's pre-merge
CI. That is, every change that is attempting to land into the Rust
compiler is tested against the kernel, and it is merged only if it
passes. Similarly, the bindgen tool has agreed to build the kernel in
their CI too.
Thus, with the pre-merge CI in place, both projects hope to avoid
unintentional changes to Rust that break the kernel. This means that,
in general, apart from intentional changes on their side (that we
will need to work around conditionally on our side), the upcoming Rust
compiler versions should generally work.
In addition, the Rust project has proposed getting the kernel into
stable Rust (at least solving the main blockers) as one of its three
flagship goals for 2024H2 [1].
I would like to thank Niko, Sid, Emilio et al. for their help promoting
the collaboration between Rust and the kernel.
[1] https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals
Toolchain and infrastructure:
- Support several Rust toolchain versions.
- Support several bindgen versions.
- Remove 'cargo' requirement and simplify 'rusttest', thanks to 'alloc'
having been dropped last cycle.
- Provide proper error reporting for the 'rust-analyzer' target.
'kernel' crate:
- Add 'uaccess' module with a safe userspace pointers abstraction.
- Add 'page' module with a 'struct page' abstraction.
- Support more complex generics in workqueue's 'impl_has_work!' macro.
'macros' crate:
- Add 'firmware' field support to the 'module!' macro.
- Improve 'module!' macro documentation.
Documentation:
- Provide instructions on what packages should be installed to build
the kernel in some popular Linux distributions.
- Introduce the new kernel.org LLVM+Rust toolchains.
- Explain '#[no_std]'.
And a few other small bits.
Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux
Pull Rust updates from Miguel Ojeda:
"The highlight is the establishment of a minimum version for the Rust
toolchain, including 'rustc' (and bundled tools) and 'bindgen'.
The initial minimum will be the pinned version we currently have, i.e.
we are just widening the allowed versions. That covers three stable
Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow),
plus beta, plus nightly.
This should already be enough for kernel developers in distributions
that provide recent Rust compiler versions routinely, such as Arch
Linux, Debian Unstable (outside the freeze period), Fedora Linux,
Gentoo Linux (especially the testing channel), Nix (unstable) and
openSUSE Slowroll and Tumbleweed.
In addition, the kernel is now being built-tested by Rust's pre-merge
CI. That is, every change that is attempting to land into the Rust
compiler is tested against the kernel, and it is merged only if it
passes. Similarly, the bindgen tool has agreed to build the kernel in
their CI too.
Thus, with the pre-merge CI in place, both projects hope to avoid
unintentional changes to Rust that break the kernel. This means that,
in general, apart from intentional changes on their side (that we will
need to work around conditionally on our side), the upcoming Rust
compiler versions should generally work.
In addition, the Rust project has proposed getting the kernel into
stable Rust (at least solving the main blockers) as one of its three
flagship goals for 2024H2 [1].
I would like to thank Niko, Sid, Emilio et al. for their help
promoting the collaboration between Rust and the kernel.
Toolchain and infrastructure:
- Support several Rust toolchain versions.
- Support several bindgen versions.
- Remove 'cargo' requirement and simplify 'rusttest', thanks to
'alloc' having been dropped last cycle.
- Provide proper error reporting for the 'rust-analyzer' target.
'kernel' crate:
- Add 'uaccess' module with a safe userspace pointers abstraction.
- Add 'page' module with a 'struct page' abstraction.
- Support more complex generics in workqueue's 'impl_has_work!'
macro.
'macros' crate:
- Add 'firmware' field support to the 'module!' macro.
- Improve 'module!' macro documentation.
Documentation:
- Provide instructions on what packages should be installed to build
the kernel in some popular Linux distributions.
- Introduce the new kernel.org LLVM+Rust toolchains.
- Explain '#[no_std]'.
And a few other small bits"
Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals [1]
* tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits)
docs: rust: quick-start: add section on Linux distributions
rust: warn about `bindgen` versions 0.66.0 and 0.66.1
rust: start supporting several `bindgen` versions
rust: work around `bindgen` 0.69.0 issue
rust: avoid assuming a particular `bindgen` build
rust: start supporting several compiler versions
rust: simplify Clippy warning flags set
rust: relax most deny-level lints to warnings
rust: allow `dead_code` for never constructed bindings
rust: init: simplify from `map_err` to `inspect_err`
rust: macros: indent list item in `paste!`'s docs
rust: add abstraction for `struct page`
rust: uaccess: add typed accessors for userspace pointers
uaccess: always export _copy_[from|to]_user with CONFIG_RUST
rust: uaccess: add userspace pointers
kbuild: rust-analyzer: improve comment documentation
kbuild: rust-analyzer: better error handling
docs: rust: no_std is used
rust: alloc: add __GFP_HIGHMEM flag
rust: alloc: fix typo in docs for GFP_NOWAIT
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZqQWWQAKCRDdBJ7gKXxA
jqJVAP9vU9HNzIyKDOOqoNHKMI+VzGn39w1FihWjG6AU5a+9NQD+MZJwr7bBwkpH
ii43HLUGvNRQtsldBZSRypsaitCSwAI=
=HGce
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"11 hotfixes, 7 of which are cc:stable. 7 are MM, 4 are other"
* tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: handle inconsistent state in nilfs_btnode_create_block()
selftests/mm: skip test for non-LPA2 and non-LVA systems
mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node
alloc_tag: outline and export free_reserved_page()
decompress_bunzip2: fix rare decompression failure
mm/huge_memory: avoid PMD-size page cache if needed
mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
mm: fix old/young bit handling in the faulting path
dt-bindings: arm: update James Clark's email address
MAINTAINERS: mailmap: update James Clark's email address
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
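The overflow described above can be illustrated with a small standalone
sketch. This is not the decompress_bunzip2 code itself: the helper names
(count_with_uchar, count_with_ushort) and the test data are hypothetical,
and only the limit of 258 symbols comes from the commit message.

```c
#include <stddef.h>

#define MAX_SYMBOLS 258	/* maximum symbol count stated in the commit message */

/*
 * Hypothetical helper: count the symbols that use code length 'len',
 * keeping the tally in an unsigned char. The tally wraps modulo 256,
 * so 256 or more matching symbols produce a wrong count.
 */
static unsigned int count_with_uchar(const unsigned char *lengths,
				     size_t n, unsigned char len)
{
	unsigned char count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (lengths[i] == len)
			count++;	/* wraps back to 0 after 255 */
	return count;
}

/* Same tally kept in an unsigned short, which can hold all 258 symbols. */
static unsigned int count_with_ushort(const unsigned char *lengths,
				      size_t n, unsigned char len)
{
	unsigned short count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (lengths[i] == len)
			count++;
	return count;
}
```

With all 258 symbols sharing one bit length, the unsigned char tally
ends at 2 (258 modulo 256) while the unsigned short tally ends at 258.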
Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com
Fixes: bc22c17e12 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Random fixes for v6.11.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmahKbIACgkQsUSA/Tof
vsh8zQwAvguyeNubDFqdMe3E/Vp1J3WqXsBFzbE1rGLCyI2S0cgJFL5BlW51zY47
70wLt9EmroEobwj1qHSQlzejNp31kSBQ1Sqq25oivfJqEF1elDT5PQxYqBbU1C9Y
kVWnxtb+oKaoFd5jiBK8+iTl8dXjT6H2RoV0zpPab/JPcqsjwFfkUvtENt/Kpo5c
aRrGTFwshdp5eT4sEZQv57VKroBcwZOvv2//qrklFHrJHl4pjMT8eaX3twcQysoy
umTVt+TK6NErLnht+VRQJ2/L02FKi7b+bHePVgNzaT+1FSDMT4FltmZd96Xwbzah
hSkwWtqy0N2gaTcqie9nwdEiCJGjF39M7k2wangUS91CeDsbIUSsJgDCESUCm+zK
hRqleGOnoeg4+jZBci7M53lKa/pADlmLhnU8iAc3BSKozsaioltkT+hHn8vAkstk
h/kHlbfkzasufUWAhduBpIn384gWWEY6RACffgCsOuvbT+ZyDKUJpKYaEwVx+Pri
l72j0hs9
=RbET
-----END PGP SIGNATURE-----
Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux
Pull bitmap updates from Yury Norov:
"Random fixes"
* tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux:
riscv: Remove unnecessary int cast in variable_fls()
radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
bitops: Add a comment explaining the double underscore macros
lib: bitmap: add missing MODULE_DESCRIPTION() macros
cpumask: introduce assign_cpu() macro
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmaiUQcACgkQUqAMR0iA
lPITNQ/+KDdmQljKwpuXlqe01F1mG/LFn5i1Y/a8fZVep/OSmihsgnEqnYzBomTq
CF22tlrH7r6EZ2D5by1fjno/AG6/BAXvwH8jGDr9jhVNNBsneeVYrtMB1UUslR/e
OEFoFKyzpq6VJNmHl5aAM95CEFEkE5uBba4DkJ/hCh3oErc2zP5DRdD9COCkdlIp
+LzQa6XsMjwzrWAMAm2vWdBgePCHVKsAVVFUdfmN28FQw3BcFZEgIvN7vPT7Ee3I
ESKx/Asb3myb1J7bFvDKnpT9O+7EkU/cpQn+HxjiIFVPqFLX3mfSXzgvfNocuPB0
hkIUzA9Sbu2wa+SE7qU1IwHVZj2N612OPso8lG8cbcic/KaYLpd6Kt7bnJe8kBe7
gFGBVmDjvapilQwWteWJcMs2hBxXuq0Xd+CMcXTMKYcLS8Z+4TVAYA8onssrUq+0
Jye8hW/CvST/P7wazVtuQu1fsKKz3SG+dAaXw9/7fYTGQ2LdRoLUkDhLkwqIURjD
j6+pMMuYpxrpeA7yaOD1xLLOKC33OINClgfodjGVHvDlwxsQBhZFchCKsbufYEa1
CxFi8lDcfNVSGuw5x3a6iMqwnqxoeVKpi6eKKgpU81fXnGd4G2NA5jGRMycWmqzX
+uUq6Ot1NO4QVK+pT70GhXMt2rQ6UsC1SyWggeFYBhaaqfIGKRk=
=89Nl
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- trivial printk changes
The bigger "real" printk work is still being discussed.
* tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
vsprintf: add missing MODULE_DESCRIPTION() macro
printk: Rename console_replay_all() and update context
Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases to
get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions. It's not all that useful for a "write a whole driver
in rust" type of thing, but the firmware bindings do help out the
phy rust drivers, and the driver core bindings give a solid base on
which others can start their work. There is still a long way to go
here before we have a multitude of rust drivers being added, but
it's a great first step.
- driver core const api changes. This reached across all bus types,
and there are some fix-ups for some not-common bus types that
linux-next and 0-day testing shook out. This work is being done to
help make the rust bindings more safe, as well as the C code, moving
toward the end-goal of allowing us to put driver structures into
read-only memory. We aren't there yet, but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm
cJEYtJpGtWX6aAtugm9E
=ZyJV
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
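As a rough illustration of the const api direction described above, the
following standalone sketch shows a match callback that takes a const
driver pointer. The types and names (demo_device, demo_driver,
demo_bus_match) are hypothetical stand-ins, not the kernel's struct
device, struct device_driver or struct bus_type.

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel's device and driver structures. */
struct demo_device {
	const char *name;
};

struct demo_driver {
	const char *name;
};

/*
 * Match callback taking a const driver pointer: the compiler rejects
 * any attempt to modify *drv from inside the callback.
 */
static int demo_bus_match(struct demo_device *dev,
			  const struct demo_driver *drv)
{
	return strcmp(dev->name, drv->name) == 0;
}

/*
 * Since the callback never writes to the driver, the driver object can
 * itself be declared const, which is what would allow it to live in
 * read-only memory.
 */
static const struct demo_driver demo_drv = {
	.name = "demo",
};
```

A callback that took a non-const pointer could not be handed &demo_drv
without a cast, which is why the signatures have to change first.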
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmaarzgACgkQSfxwEqXe
A66ZWBAAlhXx8bve0uKlDRK8fffWHgruho/fOY4lZJ137AKwA9JCtmOyqdfL4Dmk
VxFe7pEQJlQhcA/6kH54uO7SBXwfKlKZJth6SYnaCRMUIbFifHjjIQ0QqldjEKi0
rP90Hu4FVsbwQC7u9i9lQj9n2P36zb6pn83BzpZQ/2PtoVCSCrdSJUe0Rxa3H3GN
0+nNkDSXQt5otCByLaeE3x7KJgXLWL9+G2eFSFLTZ8rSVfMx1CdOIAG37WlLGdWm
BaFYPDKMyBTVvVJBNgAe9YSqtrsZ5nlmLz+Z9wAe/hTL7RlL03kWUu34/Udcpull
zzMDH0WMntiGK3eFQ2gOYSWqypvAjwHgn3BzqNmjUb69+89mZsdU1slcvnxWsUwU
D3vphrscaqarF629tfsXti3jc5PoXwUTjROZVcCyeFPBhyAZgzK8xUvPpJO+RT+K
EuUABob9cpA6FCpW/QeolDmMDhXlNT8QgsZu1juokZac2xP3Ly3REyEvT7HLbU2W
ZJjbEqm1ppp3RmGELUOJbyhwsLrnbt+OMDO7iEWoG8aSFK4diBK/ZM6WvLMkr8Oi
7ioXGIsYkCy3c47wpZKTrAapOPJp5keqNAiHSEbXw8mozp6429QAEZxNOcczgHKC
Ea2JzRkctqutcIT+Slw/uUe//i1iSsIHXbE81fp5udcQTJcUByo=
=P8aI
-----END PGP SIGNATURE-----
Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"This adds getrandom() support to the vDSO.
First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which
lets the kernel zero out pages anytime under memory pressure, which
enables allocating memory that never gets swapped to disk but also
doesn't count as being mlocked.
Then, the vDSO implementation of getrandom() is introduced in a
generic manner and hooked into random.c.
Next, this is implemented on x86. (Also, though it's not ready for
this pull, somebody has begun an arm64 implementation already)
Finally, two vDSO selftests are added.
There are also two housekeeping cleanup commits"
* tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
MAINTAINERS: add random.h headers to RNG subsection
random: note that RNDGETPOOL was removed in 2.6.9-rc2
selftests/vDSO: add tests for vgetrandom
x86: vdso: Wire up getrandom() vDSO implementation
random: introduce generic vDSO getrandom() implementation
mm: add MAP_DROPPABLE for designating always lazily freeable mappings
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmaeZBIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqI7D/9XPinZuuwiZ/670P8yjk1SHFzqzdwtuFuP
+Dq2lcoYRkuwm5PvCvhs3QH2mnjS1vo1SIoAijGEy3V1bs41mw87T2knKMIn4g5v
I5A4gC6i0IqxIkFm17Zx9yG+MivoOmPtqM4RMxze2xS/uJwWcvg4tjBHZfylY3d9
oaIXyZj+0dTRf955K2x/5dpfE6qjtDG0bqrrJXnzaIKHBJk2HKezYFbTstAA4OY+
MvMqRL7uJmJBd7384/WColIO0b8/UEchPl7qG+zy9pg+wzQGLFyF/Z/KdjrWdDMD
IFs92uNDFQmiGoyujJmXdDV9xpKi94nqDAtUR+Qct0Mui5zz0w2RNcGvyTDjBMpv
CAzTkTW48moYkwLPhPmy8Ge69elT82AC/9ZQAHbA7g3TYgJML5IT/7TtiaVe6Rc1
podnTR3/e9XmZnc25aUZeAr6CG7b+0NBvB+XPO9lNyMEE38sfwShoPdAGdKX25oA
mjnLHBc9grVOQzRGEx22E11k+1ChXf/o9H546PB2Pr9yvf/DQ3868a+QhHssxufL
Xul1K5a+pUmOnaTLD3ESftYlFmcDOHQ6gDK697so7mU7lrD3ctN4HYZ2vwNk35YY
2b4xrABrOEbAXlUo3Ht8F/ecg6qw4xTr9vAW5q4+L2H5+28RaZKYclHhLmR23yfP
xJ/d5FfVFQ==
=fqoV
-----END PGP SIGNATURE-----
Merge tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- MD fixes via Song:
- md-cluster fixes (Heming Zhao)
- raid1 fix (Mateusz Jończyk)
- s390/dasd module description (Jeff)
- Series cleaning up and hardening the blk-mq debugfs flag handling
(John, Christoph)
- blk-cgroup cleanup (Xiu)
- Error polled IO attempts if backend doesn't support it (hexue)
- Fix for an sbitmap hang (Yang)
* tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits)
blk-cgroup: move congestion_count to struct blkcg
sbitmap: fix io hung due to race on sbitmap_word::cleared
block: avoid polling configuration errors
block: Catch possible entries missing from rqf_name[]
block: Simplify definition of RQF_NAME()
block: Use enum to define RQF_x bit indexes
block: Catch possible entries missing from cmd_flag_name[]
block: Catch possible entries missing from alloc_policy_name[]
block: Catch possible entries missing from hctx_flag_name[]
block: Catch possible entries missing from hctx_state_name[]
block: Catch possible entries missing from blk_queue_flag_name[]
block: Make QUEUE_FLAG_x as an enum
block: Relocate BLK_MQ_MAX_DEPTH
block: Relocate BLK_MQ_CPU_WORK_BATCH
block: remove QUEUE_FLAG_STOPPED
block: Add missing entry to hctx_flag_name[]
block: Add zone write plugging entry to rqf_name[]
block: Add missing entries from cmd_flag_name[]
s390/dasd: fix error checks in dasd_copy_pair_store()
s390/dasd: add missing MODULE_DESCRIPTION() macros
...
Kuan-Wei Chiu has significantly reworked the min_heap library code and
has taught bcachefs to use the new more generic implementation.
- Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
reworks the cpumask and nodemask headers to make things generally more
rational.
- Kuan-Wei Chiu has sent along some maintenance work against our sorting
library code in the series "lib/sort: Optimizations and cleanups".
- More library maintenance work from Christophe Jaillet in the series
"Remove usage of the deprecated ida_simple_xx() API".
- Ryusuke Konishi continues with the nilfs2 fixes and cleanups in the
series "nilfs2: eliminate the call to inode_attach_wb()".
- Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB
command error".
- Plus the usual shower of singleton patches all over the place. Please
see the relevant changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA
jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j
fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg=
=Stxz
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code
and has taught bcachefs to use the new more generic implementation.
- Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
reworks the cpumask and nodemask headers to make things generally
more rational.
- Kuan-Wei Chiu has sent along some maintenance work against our
sorting library code in the series "lib/sort: Optimizations and
cleanups".
- More library maintenance work from Christophe Jaillet in the series
"Remove usage of the deprecated ida_simple_xx() API".
- Ryusuke Konishi continues with the nilfs2 fixes and cleanups in the
series "nilfs2: eliminate the call to inode_attach_wb()".
- Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
GDB command error".
- Plus the usual shower of singleton patches all over the place. Please
see the relevant changelogs for details.
* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
ia64: scrub ia64 from poison.h
watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
tsacct: replace strncpy() with strscpy()
lib/bch.c: use swap() to improve code
test_bpf: convert comma to semicolon
init/modpost: conditionally check section mismatch to __meminit*
init: remove unused __MEMINIT* macros
nilfs2: Constify struct kobj_type
nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
math: rational: add missing MODULE_DESCRIPTION() macro
lib/zlib: add missing MODULE_DESCRIPTION() macro
fs: ufs: add MODULE_DESCRIPTION()
lib/rbtree.c: fix the example typo
ocfs2: add bounds checking to ocfs2_check_dir_entry()
fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
coredump: simplify zap_process()
selftests/fpu: add missing MODULE_DESCRIPTION() macro
compiler.h: simplify data_race() macro
build-id: require program headers to be right after ELF header
resource: add missing MODULE_DESCRIPTION()
...
walkers") is known to cause a performance regression
(https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
Yu has a fix which I'll send along later via the hotfixes branch.
- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targeted at -stable kernels.
- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad.
- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"
- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability of
cgroup writeback"
- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache index".
- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of the
zeropage in MAP_SHARED mappings. I don't see any runtime effects here -
more a cleanup/understandability/maintainability thing.
- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".
- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".
- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
series "Enhance soft hwpoison handling and injection".
- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.
- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".
- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.
- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.
- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in via sysfs controls. Dramatic
improvements in pagefault latency are realized.
- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".
- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".
- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".
- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".
- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.
It does this by punting to another CPU but I guess that's a win unless
all CPUs are pegged.
- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".
- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.
- Is anyone reading this stuff? If so, email me!
- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.
- DAMON user API centralization and simplification work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".
- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.
- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".
- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.
- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.
- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large folio
userspace copying.
- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.
- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.
- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".
- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".
- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's self
testing code.
- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.
- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.
- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are
"mm: memcg: separate legacy cgroup v1 code and put under config
option" and
"mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.
- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of excessive
correctable memory errors, in order to permit userspace to monitor and
handle this situation.
- Kefeng Wang's series "mm: migrate: support poison recover from migrate
folio" teaches the kernel to appropriately handle migration from
poisoned source folios rather than simply panicking.
- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.
- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory utilization.
- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
refcount increments. So these pages can first be moved aside if they
reside in the movable zone or a CMA block.
- Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
for much faster reading of vma information. The series is "query VMAs
from /proc/<pid>/maps".
- In the series "mm: introduce per-order mTHP split counters" Lance Yang
improves the kernel's presentation of developer information related to
multisize THP splitting.
- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.
- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and not
very useful feature from slab fault injection.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
=z0Lf
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targeted at -stable kernels.
- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My
bad.
- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"
- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability
of cgroup writeback"
- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache
index".
- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of
the zeropage in MAP_SHARED mappings. I don't see any runtime effects
here - more a cleanup/understandability/maintainability thing.
- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
of higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".
- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".
- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
the series "Enhance soft hwpoison handling and injection".
- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.
- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
has simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".
- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.
- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.
- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in via sysfs controls. Dramatic
improvements in pagefault latency are realized.
- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".
- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".
- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".
- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".
- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.
It does this by punting to another CPU but I guess that's a win unless
all CPUs are pegged.
- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".
- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.
- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.
- DAMON user API centralization and simplification work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".
- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.
- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".
- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.
- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.
- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large
folio userspace copying.
- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.
- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.
- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".
- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".
- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's
self testing code.
- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.
- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.
- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
under config option" and "mm: memcg: put cgroup v1-specific memcg
data under CONFIG_MEMCG_V1"
- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.
- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of
excessive correctable memory errors, so that userspace can monitor
and handle this situation itself.
- Kefeng Wang's series "mm: migrate: support poison recover from
migrate folio" teaches the kernel to appropriately handle migration
from poisoned source folios rather than simply panicking.
- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.
- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory
utilization.
- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than
bare refcount increments, so that these pages can first be moved aside
if they reside in the movable zone or a CMA block.
- Andrii Nakryiko has added a binary ioctl()-based API to
/proc/pid/maps for much faster reading of vma information. The series
is "query VMAs from /proc/<pid>/maps".
- In the series "mm: introduce per-order mTHP split counters" Lance
Yang improves the kernel's presentation of developer information
related to multisize THP splitting.
- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.
- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and
not very useful feature from slab fault injection.
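As a rough illustration of the swappiness argument that the "Add
swappiness argument to memory.reclaim" series adds, the sketch below only
builds the text that would be written to the control file. The request
syntax "<amount> swappiness=<value>" and the 0..200 range are assumptions
here, and format_reclaim_request() is a hypothetical helper, not a kernel
or libc function.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the text for a memory.reclaim write,
 * e.g. "1G swappiness=60". Returns 0 on success, -1 if the value is
 * outside the assumed 0..200 range or the buffer is too small.
 */
static int format_reclaim_request(char *buf, size_t len,
				  const char *amount, int swappiness)
{
	int n;

	if (swappiness < 0 || swappiness > 200)
		return -1;

	n = snprintf(buf, len, "%s swappiness=%d", amount, swappiness);
	/* snprintf() reports the untruncated length; reject truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would then write the resulting string to the memory.reclaim file
of the cgroup it wants to reclaim from.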
* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
mm/mglru: fix ineffective protection calculation
mm/zswap: fix a white space issue
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix possible recursive locking detected warning
mm/gup: clear the LRU flag of a page before adding to LRU batch
mm/numa_balancing: teach mpol_to_str about the balancing mode
mm: memcg1: convert charge move flags to unsigned long long
alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
lib: reuse page_ext_data() to obtain codetag_ref
lib: add missing newline character in the warning message
mm/mglru: fix overshooting shrinker memory
mm/mglru: fix div-by-zero in vmpressure_calc_level()
mm/kmemleak: replace strncpy() with strscpy()
mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
mm: ignore data-race in __swap_writepage
hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
mm: shmem: rename mTHP shmem counters
mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
mm/migrate: putback split folios when numa hint migration fails
...
Here is the "big" set of char/misc and other driver subsystem changes
for 6.11-rc1. Nothing major in here, just loads of new drivers and
updates. Included in here are:
- IIO api updates and new drivers added
- wait_interruptable_timeout() api cleanups for some drivers
- MODULE_DESCRIPTION() additions for loads of drivers
- parport out-of-bounds fix
- interconnect driver updates and additions
- mhi driver updates and additions
- w1 driver fixes
- binder speedups and fixes
- eeprom driver updates
- coresight driver updates
- counter driver update
- new misc driver additions
- other minor api updates
All of these, EXCEPT for the final Kconfig build fix for 32bit systems,
have been in linux-next for a while with no reported issues. The
Kconfig fixup went in 29 hours ago, so might have missed the latest
linux-next, but was acked by everyone involved.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc and other driver updates from Greg KH:
* tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits)
misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems
misc: delete Makefile.rej
binder: fix hang of unregistered readers
misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI
virtio: add missing MODULE_DESCRIPTION() macro
agp: uninorth: add missing MODULE_DESCRIPTION() macro
spmi: add missing MODULE_DESCRIPTION() macros
dev/parport: fix the array out-of-bounds risk
samples: configfs: add missing MODULE_DESCRIPTION() macro
misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver
misc: keba: Fix missing AUXILIARY_BUS dependency
slimbus: Fix struct and documentation alignment in stream.c
MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches
misc: fastrpc: use coherent pool for untranslated Compute Banks
misc: fastrpc: support complete DMA pool access to the DSP
misc: fastrpc: add missing MODULE_DESCRIPTION() macro
misc: fastrpc: Add missing dev_err newlines
misc: fastrpc: Use memdup_user()
nvmem: core: Implement force_ro sysfs attribute
nvmem: Use sysfs_emit() for type attribute
...
Provide a generic C vDSO getrandom() implementation, which operates on
an opaque state returned by vgetrandom_alloc() and produces random bytes
the same way as getrandom(). This has the following API signature:
ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags,
void *opaque_state, size_t opaque_len);
The return value and the first three arguments are the same as ordinary
getrandom(), while the last two arguments are a pointer to the opaque
allocated state and its size. Were all five arguments passed to the
getrandom() syscall, nothing different would happen, and the functions
would have the exact same behavior.
The actual vDSO RNG algorithm implemented is the same one implemented by
drivers/char/random.c, using the same fast-erasure techniques as that.
Should the in-kernel implementation change, so too will the vDSO one.
It requires an implementation of ChaCha20 that does not use any stack,
in order to maintain forward secrecy if a multi-threaded program forks
(though this does not account for a similar issue with SA_SIGINFO
copying registers to the stack), so this is left as an
architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm
to implement on a variety of architectures, so this shouldn't be too
onerous.
Initially, the state is keyless, and so the first call makes a
getrandom() syscall to generate that key, and then uses it for
subsequent calls. By keeping track of a generation counter, it knows
when its key is invalidated and it should fetch a new one using the
syscall. Later, more than just a generation counter might be used.
Since MADV_WIPEONFORK is set on the opaque state, the key and related
state are wiped during a fork(), so secrets don't roll over into new
processes, and the same state doesn't accidentally generate the same
random stream. The generation counter, as well, is always >0, so that
the 0 counter is a useful indication of a fork() or otherwise
uninitialized state.
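The generation check described above can be sketched as follows. This is
only an illustration under stated assumptions: struct rng_state_sketch and
both helpers are hypothetical, the real state layout is opaque, and the
current generation is assumed to be supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-thread state; the real layout is opaque to userspace. */
struct rng_state_sketch {
	uint64_t generation;	/* 0 means never keyed, or wiped by fork() */
	uint8_t key[32];
};

/*
 * A key is stale when the stored generation differs from the current one.
 * current_generation is assumed to be > 0, so a zeroed state never matches.
 */
static bool state_needs_rekey(const struct rng_state_sketch *st,
			      uint64_t current_generation)
{
	return st->generation != current_generation;
}

/* Record a freshly fetched key together with the generation it belongs to. */
static void state_rekey(struct rng_state_sketch *st, const uint8_t key[32],
			uint64_t current_generation)
{
	memcpy(st->key, key, sizeof(st->key));
	st->generation = current_generation;
}
```

Because a wiped state reads back as all zeroes, the same comparison covers
the first call, a fork(), and a reseed signalled by a newer generation.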
If the kernel RNG is not yet initialized, then the vDSO always calls the
syscall, because that behavior cannot be emulated in userspace, but
fortunately that state is short lived and only during early boot. If it
has been initialized, then there is no need to inspect the `flags`
argument, because the behavior does not change post-initialization
regardless of the `flags` value.
Since the opaque state passed to it is mutated, vDSO getrandom() is not
reentrant when used with the same opaque state, which libc should be
mindful of.
The function works over an opaque per-thread state of a particular size,
which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and
VM_DROPPABLE for proper operation. Over time, the nuances of these
allocations may change or grow or even differ based on architectural
features.
The opaque state passed to vDSO getrandom() must be allocated using the
mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params
struct, which also contains the size of each state. That struct can be
obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then,
libc can call mmap(2) and slice up the returned array into a state per
each thread, while ensuring that no single state straddles a page
boundary. Libc is expected to allocate a chunk of these on first use,
and then dole them out to threads as they're created, allocating more
when needed.
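A minimal sketch of the slicing step, assuming the state size and mmap
arguments have already been obtained: struct params_sketch stands in for
the values reported by the kernel, and struct state_pool, state_pool_init()
and state_pool_get() are hypothetical names used for illustration only.
States are laid out page by page so that none straddles a page boundary.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the size and mmap arguments the kernel reports. */
struct params_sketch {
	size_t state_size;
	int mmap_prot;
	int mmap_flags;
};

struct state_pool {
	char *base;
	size_t page_size;
	size_t state_size;
	size_t per_page;	/* whole states that fit in one page */
	size_t count;		/* total states available in the pool */
};

/* Map enough pages to hold at least 'wanted' states. */
static int state_pool_init(struct state_pool *pool,
			   const struct params_sketch *p, size_t wanted)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pages;
	void *base;

	if (page <= 0 || wanted == 0 || p->state_size == 0 ||
	    p->state_size > (size_t)page)
		return -1;

	pool->page_size = (size_t)page;
	pool->state_size = p->state_size;
	pool->per_page = pool->page_size / p->state_size;
	pages = (wanted + pool->per_page - 1) / pool->per_page;

	base = mmap(NULL, pages * pool->page_size, p->mmap_prot,
		    p->mmap_flags, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	pool->base = base;
	pool->count = pages * pool->per_page;
	return 0;
}

/* Return the i-th state, skipping the unused tail of each page. */
static void *state_pool_get(const struct state_pool *pool, size_t i)
{
	size_t off;

	if (i >= pool->count)
		return NULL;

	off = (i % pool->per_page) * pool->state_size;
	/* By construction a state ends at or before the end of its page. */
	assert(off + pool->state_size <= pool->page_size);
	return pool->base + (i / pool->per_page) * pool->page_size + off;
}
```

A libc would hand one such slot to each thread and map another chunk once
the pool is exhausted.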
vDSO getrandom() provides the ability for userspace to generate random
bytes quickly and safely, and is intended to be integrated into libc's
thread management. As an illustrative example, the introduced code in
the vdso_test_getrandom self test later in this series might be used to
do the same outside of libc. In a libc the various pthread-isms are
expected to be elided into libc internals.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
API:
- Test setkey in no-SIMD context.
- Add skcipher speed test for user-specified algorithm.
Algorithms:
- Add x25519 support on ppc64le.
- Add VAES and AVX512 / AVX10 optimized AES-GCM on x86.
- Remove sm2 algorithm.
Drivers:
- Add Allwinner H616 support to sun8i-ce.
- Use DMA in stm32.
- Add Exynos850 hwrng support to exynos.
Merge tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
* tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits)
hwrng: core - remove (un)register_miscdev()
crypto: lib/mpi - delete unnecessary condition
crypto: testmgr - generate power-of-2 lengths more often
crypto: mxs-dcp - Ensure payload is zero when using key slot
hwrng: Kconfig - Do not enable by default CN10K driver
crypto: starfive - Fix nent assignment in rsa dec
crypto: starfive - Align rsa input data to 32-bit
crypto: qat - fix unintentional re-enabling of error interrupts
crypto: qat - extend scope of lock in adf_cfg_add_key_value_param()
Documentation: qat: fix auto_reset attribute details
crypto: sun8i-ce - add Allwinner H616 support
crypto: sun8i-ce - wrap accesses to descriptor address fields
dt-bindings: crypto: sun8i-ce: Add compatible for H616
hwrng: core - Fix wrong quality calculation at hw rng registration
hwrng: exynos - Enable Exynos850 support
hwrng: exynos - Add SMC based TRNG operation
hwrng: exynos - Implement bus clock control
hwrng: exynos - Use devm_clk_get_enabled() to get the clock
hwrng: exynos - Improve coding style
dt-bindings: rng: Add Exynos850 support to exynos-trng
...
Configuration for sbq:
depth=64, wake_batch=6, shift=6, map_nr=1
1. There are 64 requests in progress:
map->word = 0xFFFFFFFFFFFFFFFF
2. After all the 64 requests complete, and no more requests come:
map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
3. Now two tasks try to allocate requests:
T1:                                       T2:
__blk_mq_get_tag                          .
__sbitmap_queue_get                       .
sbitmap_get                               .
sbitmap_find_bit                          .
sbitmap_find_bit_in_word                  .
__sbitmap_get_word -> nr=-1               __blk_mq_get_tag
sbitmap_deferred_clear                    __sbitmap_queue_get
/* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
if (!READ_ONCE(map->cleared))             sbitmap_find_bit_in_word
  return false;                           __sbitmap_get_word -> nr=-1
mask = xchg(&map->cleared, 0)             sbitmap_deferred_clear
atomic_long_andnot()                      /* map->cleared=0 */
                                          if (!(map->cleared))
                                            return false;
                                          /*
                                           * map->cleared is cleared by T1
                                           * T2 fails to acquire the tag
                                           */
4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
up because wake_batch is set to 6. If no more requests come, T2
will wait here indefinitely.
This patch achieves two purposes:
1. The check on ->cleared and the update of both ->cleared and ->word
need to be done atomically, and a spinlock is likely the simplest solution.
2. Add an extra check in sbitmap_deferred_clear() to identify whether
->word has free bits.
Fixes: ea86ea2cdc ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
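The two purposes listed in this patch can be sketched in userspace as
follows. This is not the kernel code: struct word_sketch, its atomic_flag
spinlock and deferred_clear_sketch() are hypothetical simplifications, and
depth is assumed to be between 1 and 64.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one sbitmap word; not the kernel structure. */
struct word_sketch {
	atomic_flag lock;	/* tiny spinlock protecting word and cleared */
	uint64_t word;		/* bits currently marked as allocated */
	uint64_t cleared;	/* bits freed but not yet folded into word */
};

static void word_sketch_init(struct word_sketch *w, uint64_t word,
			     uint64_t cleared)
{
	atomic_flag_clear(&w->lock);
	w->word = word;
	w->cleared = cleared;
}

/*
 * Returns true when a retry may find a free bit. depth is the number of
 * usable bits in the word and is assumed to be in the range 1..64.
 */
static bool deferred_clear_sketch(struct word_sketch *w, unsigned int depth)
{
	uint64_t mask = ~UINT64_C(0) >> (64 - depth);
	bool ret;

	while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
		;	/* spin until the lock is free */

	if (!w->cleared) {
		/* Nothing deferred: report whether word itself has free bits. */
		ret = (w->word & mask) != mask;
	} else {
		/* Fold the deferred frees into word in one critical section. */
		w->word &= ~w->cleared;
		w->cleared = 0;
		ret = true;
	}

	atomic_flag_clear_explicit(&w->lock, memory_order_release);
	return ret;
}
```

With this shape, a second caller that arrives after ->cleared has been
consumed still sees that ->word has free bits and retries instead of
giving up.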
Merge tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"The most prominent change this time is the kmem_buckets based
hardening of kmalloc() allocations from Kees Cook.
We have also extended the kmalloc() alignment guarantees for
non-power-of-two sizes in a way that benefits rust.
The rest are various cleanups and non-critical fixups.
- Dedicated bucket allocator (Kees Cook)
This series [1] enhances the probabilistic defense against heap
spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.
kmalloc() users that are known to be useful for exploits can get a
completely separate set of kmalloc caches that can't be shared with
other users. The first converted users are alloc_msg() and
memdup_user().
The hardening is enabled by CONFIG_SLAB_BUCKETS.
- Extended kmalloc() alignment guarantees (Vlastimil Babka)
For years now we have guaranteed natural alignment for power-of-two
allocations, but nothing was defined for other sizes (in practice,
we have two such buckets, kmalloc-96 and kmalloc-192).
To avoid unnecessary padding in the rust layer due to its alignment
rules, extend the guarantee so that the alignment is at least the
largest power-of-two divisor of the requested size.
This fits what rust needs, is a superset of the existing
power-of-two guarantee, and does not in practice change the layout
(and thus does not add overhead due to padding) of the kmalloc-96
and kmalloc-192 caches, unless slab debugging is enabled for them.
- Cleanups and non-critical fixups (Chengming Zhou, Suren
Baghdasaryan, Matthew Wilcox, Alex Shi, and Vlastimil Babka)
Various tweaks related to the new alloc profiling code, folio
conversion, debugging and more leftovers after SLAB"
Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1]
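The "largest power-of-two divisor" in the extended alignment guarantee can
be computed with a single bit trick. The helper below is a hypothetical
illustration, not a kernel function, and assumes a non-zero size.

```c
#include <stddef.h>

/* Largest power of two that divides size; size is assumed non-zero. */
static size_t largest_pow2_divisor(size_t size)
{
	/* Isolates the lowest set bit, e.g. 96 = 0b1100000 gives 0b0100000. */
	return size & (~size + 1);
}
```

For the two non-power-of-two buckets named above this gives 32 for a
96-byte request and 64 for a 192-byte request, while a power-of-two size
maps to itself, which is why the new rule is a superset of the old one.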
* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/memcg: alignment memcg_data define condition
mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
mm, slab: move allocation tagging code in the alloc path into a hook
mm/util: Use dedicated slab buckets for memdup_user()
ipc, msg: Use dedicated slab buckets for alloc_msg()
mm/slab: Introduce kmem_buckets_create() and family
mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
mm/slab: Introduce kmem_buckets typedef
slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
slab: delete useless RED_INACTIVE and RED_ACTIVE
slab: don't put freepointer outside of object if only orig_size
slab: make check_object() more consistent
mm: Reduce the number of slab->folio casts
mm, slab: don't wrap internal functions with alloc_hooks()
- Remove duplicate included header file linux/bootconfig.h from
lib/bootconfig.c. This is a cleanup, no behavior change.
Merge tag 'bootconfig-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig update from Masami Hiramatsu:
* tag 'bootconfig-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Remove duplicate included header file linux/bootconfig.h
core:
- deprecate DRM date and return 0 date
- connector: Create a set of helpers to help with HDMI support
- Remove driver owner assignments
- Allow more drivers to compile with COMPILE_TEST
- Conversions to drm_edid
- Sprinkle MODULE_DESCRIPTIONS everywhere they are missing
- Remove drm_mm_replace_node
- print: Add a drm prefix to warn level messages too, remove
___drm_dbg, consolidate prefix handling
- New monochrome TV mode variant
ttm:
- improve number of page faults on some platforms
- fix test builds under PREEMPT_RT
- more test coverage
ci:
- Require a more recent version of mesa
- improve farm setup and test generation
dma-buf:
- warn if reserving 0 fence slots
- internal API heap enhancements
fbdev:
- Create memory manager optimized fbdev emulation
panic:
- Allow to select fonts
- improve drm_fb_dma_get_scanout_buffer
- Allow to dump kmsg to the screen
bridge:
- Remove redundant checks on bridge->encoder
- Remove drm_bridge_chain_mode_fixup
- bridge-connector: Plumb in the new HDMI helper
- analogix_dp: Various improvements, handle AUX transfers timeout
- samsung-dsim: Fix timings calculation
- tc358767: Plenty of small fixes, fix no connector attach, fix clocks
- sii902x: state validation improvements
panels:
- Switch panels from register table initialization to proper code
- Now that the panel code tracks the panel state, remove every
ad-hoc implementation in the panel drivers
- More cleanup of prepare / enable state tracking in drivers
- edp: Drop legacy panel compatibles
- simple-bridge: Switch to devm_drm_bridge_add
- New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
nv110wum-l60, IVO t109nw41, WL-355608-A8, PrimeView PM070WL4,
Lincoln Technologies LCD197, Ortustech COM35H3P70ULC,
AUO G104STN01, K&d kd101ne3-40ti
amdgpu:
- DCN 4.0.x support
- GC 12.0 support
- GMC 12.0 support
- SDMA 7.0 support
- MES12 support
- MMHUB 4.1 support
- GFX12 modifier and DCC support
- lots of IP fixes/updates
amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes
- KFD GFX ALU exceptions
i915:
- Battlemage Xe2 HPD display enablement
- Panel Replay enabling
- DP AUX-less ALPM/LOBF
- Enable link training failure fallback for DP MST links
- CMRR (Content Match Refresh Rate) enabling
- Increase ADL-S/ADL-P/DG2+ max TMDS bitrate to 6 Gbps
- Enable eDP AUX based HDR backlight
- Support replaying GPU hangs with captured context image
- Automate CCS Mode setting during engine resets
- lots of refactoring
- Increase FLR timeout from 3s to 9s
- Enable w/a 16021333562 for DG2, MTL and ARL [guc]
xe:
- update MAINTAINERS
- New uapi adding OA functionality to Xe
- expose l3 bank mask
- fix display detect on ADL-N
- runtime PM Fixes
- Fix silent backmerge issues
- More prep for SR-IOV
- HWmon additions
- per client usage info
- Rework GPU page fault handling
- Drop EXEC_QUEUE_FLAG_BANNED
- Add BMG PCI IDs
- Scheduler fixes and improvements
- Rename xe_exec_queue::compute to xe_exec_queue::lr
- Use ttm_uncached for BO with NEEDS_UC flag
- Rename xe perf layer as xe observation layer
- lots of refactoring
radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings
msm:
- Validate registers XML description against schema in CI
- core/dpu: SM7150 support
- mdp5: Add support for MSM8937
- gpu: Add param for userspace to know if raytracing is supported
- gpu: X185 support (aka gpu in X1 laptop chips)
- gpu: a505 support
ivpu:
- hardware scheduler support
- profiling support
- improvements to the platform support layer
- firmware handling improvements
- clocks/power mgmt improvements
- scheduler/logging improvements
habanalabs:
- Gradual sleep in polling memory macro.
- Reduce Gaudi2 MSI-X interrupt count to 128.
- Add Gaudi2-D revision support.
- Add timestamp to CPLD info.
- Gaudi2: Assume hard-reset by firmware upon MC SEI severe error.
- Align Gaudi2 interrupt names.
- Check for errors after preboot is ready.
- Change habanalabs maintainer and git repo path.
mgag200:
- refactoring and improvements
- Add BMC output
- enable polling
nouveau:
- add registry command line
v3d:
- perf counters improvements
zynqmp:
- irq and debugfs improvements
atmel-hlcdc:
- Support XLCDC in sam9x7
mipi-dbi:
- Remove mipi_dbi_machine_little_endian
- make SPI bits per word configurable
- support RGB888
- allow pixel formats to be specified in the DT
sun4i:
- Rework the blender setup for DE2
panfrost:
- Enable MT8188 support
vc4:
- Monochrome TV support
exynos:
- fix fallback mode regression
- fix memory leak
- Use drm_edid_duplicate() instead of kmemdup()
etnaviv:
- fix i.MX8MP NPU clock gating
- workaround FE register cdc issues on some cores
- fix DMA sync handling for cached buffers
- fix job timeout handling
- keep TS enabled on MMUv2 cores for improved performance
mediatek:
- Convert to platform remove callback returning void
- Drop chain_mode_fixup call in mode_valid()
- Fix errors in the MediaTek display driver found by IGT
- Add display support for the MT8365-EVK board
- Fix bit depth overwritten for mtk_ovl_set bit_depth()
- Fix possible_crtcs calculation
- Fix spurious kfree()
ast:
- refactor mode setting code
stm:
- Add LVDS support
- DSI PHY updates
Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"There's a lot of stuff in here, amd, i915 and xe have new platform
work, lots of core rework around EDID handling, some new COMPILE_TEST
options, maintainer changes and lots of other stuff"
* tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel: (2501 commits)
drm/amdgpu/mes12: add missing opcode string
drm/amdgpu/mes11: update opcode strings
Revert "drm/amd/display: Reset freesync config before update new state"
drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB
drm/xe: Drop trace_xe_hw_fence_free
drm/xe/uapi: Rename xe perf layer as xe observation layer
drm/amdgpu: remove exp hw support check for gfx12
drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
drm/amdgpu: flush all cached ras bad pages to eeprom
drm/amdgpu: select compute ME engines dynamically
drm/amd/display: Allow display DCC for DCN401
drm/amdgpu: select compute ME engines dynamically
drm/amdgpu/job: Replace DRM_INFO/ERROR logging
drm/amdgpu: select compute ME engines dynamically
drm/amd/pm: Ignore initial value in smu response register
drm/amdgpu: Initialize VF partition mode
drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping
MAINTAINERS: fix Xinhui's name
MAINTAINERS: update powerplay and swsmu
drm/qxl: Pin buffer objects for internal mappings
...
patchsets (devmem among them) did not make it in time.
Core & protocols
----------------
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT.
- Use flex array for netdevice priv area, ensure its cache alignment.
- Add a sysctl knob to allow the user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such a patch downstream so clearly it's useful.
- Support scheduling transmission of packets based on CLOCK_TAI.
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off
using cpusets.
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address.
- Allow configuration of multipath hash seed, to both allow synchronizing
hashing of two routers, and preventing partial accidental sync.
- Improve TCP compliance with RFC 9293 for simultaneous connect().
- Support sending NAT keepalives in IPsec ESP in UDP states. Userspace
IKE daemon had to do this before, but the kernel can better keep
track of it.
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled.
- Introduce IPPROTO_SMC for selecting SMC when socket is created.
- Allow UDP GSO transmit from devices with no checksum offload.
- openvswitch: add packet sampling via psample, separating the sampled
traffic from "upcall" packets sent to user space for forwarding.
- nf_tables: shrink memory consumption for transaction objects.
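As an illustration of the "flex array for netdevice priv area" item above, here is a minimal userspace sketch of a structure whose driver-private data is a cache-line-aligned flexible array member. The struct, the function names and the 64-byte cache line are all assumptions made for the example; this is not the kernel's struct net_device or its allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHELINE 64 /* assumed cache line size, not probed */

/* Hypothetical stand-in, not the kernel's struct net_device. */
struct sketch_netdev {
	unsigned int mtu;
	char name[16];
	/* Private area as a flexible array member, aligned to a cache line. */
	unsigned char priv[] __attribute__((aligned(SKETCH_CACHELINE)));
};

/* The private area must start on a cache line boundary inside the struct. */
static_assert(offsetof(struct sketch_netdev, priv) % SKETCH_CACHELINE == 0,
	      "priv area is not cache aligned");

/* Allocate the core fields and priv_len bytes of private data in one block. */
struct sketch_netdev *sketch_alloc_netdev(size_t priv_len)
{
	size_t size = sizeof(struct sketch_netdev) + priv_len;
	struct sketch_netdev *dev;

	/* aligned_alloc() expects a size that is a multiple of the alignment. */
	size = (size + SKETCH_CACHELINE - 1) / SKETCH_CACHELINE * SKETCH_CACHELINE;
	dev = aligned_alloc(SKETCH_CACHELINE, size);
	if (dev)
		memset(dev, 0, size);
	return dev;
}

/* With a flex array, reaching the private area is a plain member access. */
void *sketch_netdev_priv(struct sketch_netdev *dev)
{
	return dev->priv;
}
```

Because the private area is a named member, its address follows from the structure layout, and no separate pointer arithmetic past the end of the structure is needed.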
Things we sprinkled into general kernel code
--------------------------------------------
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver
for QCA6390).
- Add IRQ information in sysfs for auxiliary bus.
- Introduce guard definition for local_lock.
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures.
BPF
---
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered.
- Add new kfuncs for a generic, open-coded bits iterator.
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head.
- Support resilient split BTF which cuts down on duplication and makes
BTF as compact as possible WRT BTF from modules.
- Add support for dumping kfunc prototypes from BTF which enables both
detecting as well as dumping compilable prototypes for kfuncs.
- riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter.
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs.
Driver API
----------
- Allow users to configure IRQ thresholds between which automatic IRQ
moderation can choose.
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits.
- Track additional RSS contexts in the core, make sure configuration
changes don't break them.
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP
data paths.
- Support updating firmware on SFP modules.
Tests and tooling
-----------------
- mptcp: use net/lib.sh to manage netns.
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints.
- openvswitch: make test self-contained (don't depend on OvS CLI tools).
Drivers
-------
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead
and skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
threaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps
to catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead of
in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- support multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaWjBwACgkQMUZtbf5S
IrvuSRAAkJuEzTRqgURBCe4eNEQde6mJJig7l2CKHwCbFiHZpRkFHf8qKbcGWbL6
uLW33SWnKtJVDhxVKWHLq635XW7BAa80YhqGw21GDi+mIEhWXZglHj3xbXNxsMfE
4eg/kG4BkfYWFmHaXOwVWV/mr7nXf6j7WmXNeXEi32ufE1j0OL+YlQenKnMj8yP2
j9JmYa2Chwppng1SblHmcjmGkdNVwFhStKeCG+2K7v06wdDH/QYBlbgUv9gw/cxp
NlW//wgiaeX40U4O3kDwt9C+LDoh+0VrDDeVdQ+IsScLtY3PhAzEoKolFYTq2HSr
I1JpoaHNnyNsJq3DZrACQ5WlH4yDn6C2EUB6dxNnFaI9F1ZPsi+7MTl6Sei1AklD
TuQTj/lxOACBwW2Q77NU72uoxiIUauesGPHcnrAFuoCIEhZF0mso7k59BvrXhsOP
QwcLbQdc1YHNkqv/Vc7NBY+ruMsYB+5Ubbhhj2p27dp/CWFIwxI29fze4dn2uhO6
ejHN3mbqwPdSzg12YJtM6Iq61Cnwo2eVSvhTxl+ZVSZtI4nu2arzR+y7QTYmNrXP
6tkgVN9UsWeLl2xJ8wyyqL5mcvNHP2rPXWZ2X56iTaa26m+UlleeQ7YRaYtQAAr0
Ec/vlDMX64SwHhd+qwE99DXGQf2g+KklHKSLsnajJUVrWFTlRI0=
=opz8
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to allow user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of multipath hash seed, to both allow
synchronizing hashing of two routers, and preventing partial
accidental sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP in UDP states.
Userspace IKE daemon had to do this before, but the kernel can
better keep track of it
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when socket is created
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
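For the IPPROTO_SMC item above, a hedged userspace sketch of how an application might request SMC at socket creation and fall back to TCP when the kernel lacks it. The function name is hypothetical, and the fallback value 256 for IPPROTO_SMC is an assumption for C libraries whose headers do not yet define it; check the kernel UAPI headers before relying on it.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_SMC
#define IPPROTO_SMC 256 /* assumed value; older libc headers lack it */
#endif

/*
 * Try to open an SMC stream socket first, then fall back to plain TCP.
 * Returns the file descriptor or -1; *used_smc reports which one was opened.
 */
int sketch_open_stream_socket(int *used_smc)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

	*used_smc = fd >= 0;
	if (fd < 0)
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}
```

On kernels without SMC support the first socket() call is expected to fail, so the caller ends up with an ordinary TCP socket and used_smc set to 0.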
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce guard definition for local_lock
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
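To make the "open-coded bits iterator" item above more concrete, the following is a plain userspace model of the new/next/destroy pattern that open-coded iterators follow. It only illustrates the idea of walking the set bits of a word array; the names are hypothetical and this is not the BPF kfunc API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an iterator over set bits. */
struct sketch_bits_iter {
	const uint64_t *words;
	uint32_t nr_bits;
	uint32_t next; /* next bit index to examine */
	int cur;       /* index of the most recently returned bit */
};

void sketch_bits_iter_new(struct sketch_bits_iter *it,
			  const uint64_t *words, uint32_t nr_words)
{
	it->words = words;
	it->nr_bits = nr_words * 64;
	it->next = 0;
	it->cur = -1;
}

/* Return a pointer to the index of the next set bit, or NULL when done. */
const int *sketch_bits_iter_next(struct sketch_bits_iter *it)
{
	while (it->next < it->nr_bits) {
		uint32_t bit = it->next++;

		if (it->words[bit / 64] & (UINT64_C(1) << (bit % 64))) {
			it->cur = (int)bit;
			return &it->cur;
		}
	}
	return NULL;
}

void sketch_bits_iter_destroy(struct sketch_bits_iter *it)
{
	it->words = NULL;
	it->nr_bits = 0;
}

/* Typical open-coded use: create, loop until NULL, destroy. */
int sketch_count_set_bits(const uint64_t *words, uint32_t nr_words)
{
	struct sketch_bits_iter it;
	int count = 0;

	sketch_bits_iter_new(&it, words, nr_words);
	while (sketch_bits_iter_next(&it))
		count++;
	sketch_bits_iter_destroy(&it);
	return count;
}
```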
Driver API:
- Allow users to configure IRQ thresholds between which automatic IRQ
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
threaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- support multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
This KUnit next update for Linux 6.11-rc1 consists of:
-- adds vm_mmap() allocation resource manager
-- converts usercopy kselftest to KUnit
-- disables usercopy testing on !CONFIG_MMU
-- adds MODULE_DESCRIPTION() to core, list, and usercopy tests
-- adds tests for assertion formatting functions - assert.c
-- introduces KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros
-- fixes KUNIT_ASSERT_STRNEQ comments to make it clear that it is
an assertion
-- renames KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmaWpCYACgkQCwJExA0N
QxwdPQ/9G26Q+xhbieosvXHu/04ZWTcuUP/cFRv56jLH9bKm25YbW8WZzKM/imE5
So35IT6SIYlwxn9fYyriPz372h3ZC522cu8tIVrUh5Uo3O5LbzQqdrxos9a+RuCg
u6lenSksAjJRZ3S3IKDJ1ErxLnPYKyjjZFwDmV1+0Xxy30SwzFEbQqj9lY2Q4iGs
KWBm0lrFPipbHdBqZcPB/mxIDyF6rhe+oeuOPU8uag6ncNN31xMpDanU8O6XEAz9
QoAiDICANbVKTRKG5xXgmsJtyLF8GON4e49kEYtCLdnESPc39hQtf3cTHeYI22HC
7OWhhOySifNIukFj1hVtxnN3ZfjtBGmbCwe5rXZFvMovE3YwAplKK61GoOaI9UV0
qPk5GGrAb/xEh2HZ9tgf8+CsqmnPQLGnVt2h3u3c28u4YzbkinqVj20KYsye39zz
KzJsO2yDJH4LlIJjc8XWof1cyyo0TIJQVOwJqAieOPePnfs4zabmVOus8y1Cj07V
iAvQTPPoZ165zA1cl0iSMolKkXeAgf2FjlEGbODrktKKX6Ag/PKVp3e6PW28zJbp
0p1V1IDQQAlEhbcRAZb+5y1voh+hcy++KyPwpj7lAVkmHd7RoK/mDL3W+oLdOTrB
aXWs4JOlkmtUaz3EpAQZuvhYWVW7DexR9rU1SF44UAVzSdZSndw=
=nnFR
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- add vm_mmap() allocation resource manager
- convert usercopy kselftest to KUnit
- disable usercopy testing on !CONFIG_MMU
- add MODULE_DESCRIPTION() to core, list, and usercopy tests
- add tests for assertion formatting functions - assert.c
- introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros
- fix KUNIT_ASSERT_STRNEQ comments to make it clear that it is an
assertion
- rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT
* tag 'linux_kselftest-kunit-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros
kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT for readability
kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion
kunit: executor: Simplify string allocation handling
kunit/usercopy: Add missing MODULE_DESCRIPTION()
kunit/usercopy: Disable testing on !CONFIG_MMU
usercopy: Convert test_user_copy to KUnit test
kunit: test: Add vm_mmap() allocation resource manager
list: test: add the missing MODULE_DESCRIPTION() macro
kunit: add missing MODULE_DESCRIPTION() macros to core modules
list: test: remove unused struct 'klist_test_struct'
kunit: Cover 'assert.c' with tests
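The memory-comparison assertions mentioned in the KUnit summary above can be pictured with a small userspace stand-in. The macro and struct below are hypothetical and only mimic the assertion behaviour (record the failure and leave the test function at once); they are not the KUnit macros, which need the kernel's test framework to run.

```c
#include <string.h>

/* Hypothetical test context; real KUnit uses struct kunit. */
struct sketch_test {
	int failed;
};

/*
 * Stand-in for an assertion-style memory comparison: on a mismatch, mark the
 * test as failed and return from the test function immediately.
 */
#define SKETCH_ASSERT_MEMEQ(test, left, right, size)		\
	do {							\
		if (memcmp((left), (right), (size)) != 0) {	\
			(test)->failed = 1;			\
			return;					\
		}						\
	} while (0)

/* Passes: both buffers hold the same bytes. */
void sketch_case_equal(struct sketch_test *test)
{
	const unsigned char a[] = { 1, 2, 3, 4 };
	const unsigned char b[] = { 1, 2, 3, 4 };

	SKETCH_ASSERT_MEMEQ(test, a, b, sizeof(a));
}

/* Fails at the assertion, so the line after it is never executed. */
void sketch_case_mismatch(struct sketch_test *test, int *reached_end)
{
	const unsigned char a[] = { 1, 2, 3, 4 };
	const unsigned char b[] = { 1, 2, 9, 4 };

	SKETCH_ASSERT_MEMEQ(test, a, b, sizeof(a));
	*reached_end = 1;
}
```

The early return is the point of an assertion as opposed to an expectation: nothing after a failed assertion in the same test case runs.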
Summary
* Remove "->procname == NULL" check when iterating through sysctl table arrays
Removing sentinels in ctl_table arrays reduces the build time size and
runtime memory consumed by ~64 bytes per array. With all ctl_table
sentinels gone, the additional check for ->procname == NULL that worked in
tandem with the ARRAY_SIZE to calculate the size of the ctl_table arrays is
no longer needed and has been removed. The sysctl register functions now
return an error if a sentinel is used.
* Preparation patches for sysctl constification
Constifying ctl_table structs prevents the modification of proc_handler
function pointers as they would reside in .rodata. The ctl_table arguments
in sysctl utility functions are const qualified in preparation for a future
treewide proc_handler argument constification commit.
* Misc fixes
Increase robustness of set_ownership by providing sane default ownership
values in case the callee doesn't set them. Bound check proc_dou8vec_minmax
to avoid loading buggy modules and give sysctl testing module a name to
avoid compiler complaints.
Testing
* This got pushed to linux-next in v6.10-rc2, so it has had more than a month
of testing
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmaWdz4ACgkQupfNUreW
QU/WKQwAkSuUz42yCQye77BK+Z8ANcTF1f3aI/wfv2nahq1GaSrNBpqUiXvEe9Tt
KD2lM1PWiQfizVLIDPh96yxa5q69GQrPPOA/V1jwIXmk/HRpjjoONCFNNXVRCTls
VCqDz/RatuXvzO35Yn87MnWnxv6PiX7X/zq/3WikVsUI381kvTgC6OwZxdFM52w4
ESwOa3LeOovtRnqV5dpHr6DCQKyd0N52nPxgXvaerjlsJsv7PlezN7z9YyLOOfmW
xUD7X6LQcJq7HcEukaB6I9o2GQOi4yYXL2YOzed7qu9Thu+lasEoN3Bd7P+ilXkc
JY6EXJ5o+d69PewKRuJ1QvD7wrHIkhNMNbMtvehNay124wAHDy3KtonFzyvlX4wE
qCHBYc6rySJNhSqwVp9MoksOZfDM99pVIOs9YVIjc90Zzu5J7tORgYWRVOHTcAtj
fd8nMdkK3+ZANapygFCyew6GueIzaqlQwveVgLGw4vc5L3ClknmURit3y487Pzdg
B+BEVlsp
=bs2G
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove "->procname == NULL" check when iterating through sysctl table
arrays
Removing sentinels in ctl_table arrays reduces the build time size
and runtime memory consumed by ~64 bytes per array. With all
ctl_table sentinels gone, the additional check for ->procname == NULL
that worked in tandem with the ARRAY_SIZE to calculate the size of
the ctl_table arrays is no longer needed and has been removed. The
sysctl register functions now return an error if a sentinel is used.
- Preparation patches for sysctl constification
Constifying ctl_table structs prevents the modification of
proc_handler function pointers as they would reside in .rodata. The
ctl_table arguments in sysctl utility functions are const qualified
in preparation for a future treewide proc_handler argument
constification commit.
- Misc fixes
Increase robustness of set_ownership by providing sane default
ownership values in case the callee doesn't set them. Bound check
proc_dou8vec_minmax to avoid loading buggy modules and give sysctl
testing module a name to avoid compiler complaints.
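The sentinel removal and constification described above can be sketched in userspace as follows. The table is sized with an ARRAY_SIZE-style macro and not by a terminating empty entry, the register function takes a const table, and an entry with a NULL procname is rejected. All names are hypothetical and the -EINVAL return value is an assumption; this is not the kernel's sysctl API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, trimmed-down table entry. */
struct sketch_ctl_table {
	const char *procname;
	int *data;
};

static int sketch_val_a, sketch_val_b;

/* No trailing empty entry: the size comes from the array itself. */
const struct sketch_ctl_table sketch_good_table[] = {
	{ .procname = "val_a", .data = &sketch_val_a },
	{ .procname = "val_b", .data = &sketch_val_b },
};

/* Old style with a sentinel, which the register function now refuses. */
const struct sketch_ctl_table sketch_bad_table[] = {
	{ .procname = "val_a", .data = &sketch_val_a },
	{ 0 },
};

static_assert(SKETCH_ARRAY_SIZE(sketch_good_table) == 2,
	      "table size is taken from the array, not from a sentinel");

/* The table is const here, so the register path cannot modify its entries. */
int sketch_register_table(const struct sketch_ctl_table *table,
			  size_t table_size)
{
	for (size_t i = 0; i < table_size; i++) {
		if (!table[i].procname)
			return -EINVAL; /* assumed error code */
	}
	return 0;
}
```

Passing the size explicitly is what makes the per-entry procname check redundant for finding the end of the array, so an empty entry can be treated as a caller bug.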
* tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: Warn on an empty procname element
sysctl: Remove ctl_table sentinel code comments
sysctl: Remove "child" sysctl code comments
sysctl: Remove superfluous empty allocations from sysctl internals
sysctl: Replace nr_entries with ctl_table_size in new_links
sysctl: Remove check for sentinel element in ctl_table arrays
mm profiling: Remove superfluous sentinel element from ctl_table
locking: Remove superfluous sentinel element from kern_lockdep_table
sysctl: Add module description to sysctl-testing
sysctl: constify ctl_table arguments of utility function
utsname: constify ctl_table arguments of utility function
sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array
sysctl: always initialize i_uid/i_gid
- Core:
- Make the takeover of a hrtimer based broadcast timer reliable during
CPU hot-unplug. The current implementation suffers from a race which
can lead to broadcast timer starvation in the worst case.
- VDSO related cleanups and simplifications
- Small cleanups and enhancements all over the place
- PTP:
- Replace the architecture-specific function that converts a base
clock to the clocksource (e.g. ART to TSC) with generic functionality,
to avoid exposing such internals to drivers, and convert all existing
drivers over. This also makes it possible to provide the reverse
conversion in the core code, based on the same parameter set.
- Provide a function to convert CLOCK_REALTIME to the base clock to
support the upcoming PPS output driver on Intel platforms.
- Drivers:
- A set of Device Tree bindings for new hardware
- Cleanups and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaUOM0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofolD/9kK+aYdDj1gCFuZXZ2wTgMMxFmf/91
0UcsGRuBJiIXs3H3iizQ0Mb0cdTW6qZJoBp0jPlvUSm0BEKdEgE1uRX2RuAPZ/Gq
4/54ZJVopKSgAqeJFmqQubRVSv2XdMRAAJT0o1oUG3jZ0c6u8vqArIh5ZCnu13l/
tsNOeYLYzQFyA30eHSJ/KjQ2zHwAhJnl5a/b7pdAvxmlN37bGgKEpglv+9zwFiDB
K/kWbpb/oED9WOmoQy5QYi8iSvLQHEhFGrqzXV3fegu/B/mBBf/bpsisVx7Z1m2R
nzxNqg86RdMjNR6giwBETZjm7YxM+gKb9nCBNILjbjWZFC4tyrBkLGJ+KniTRNyZ
M5R4X1oP/14h00qXmCgIEFWysXaJRewYI+TIm8R2rLXrR6Tf3c4oL6fHQJxy3X52
7A+4Z/vOk/KX6PxYmLC+xQDukhFh2nirVYsP1oNM9yC9zR/wkBBXTTmUSAI+8m8l
KphniSPS2HMSBI6TtgOT8SKY7lRUZTnafBZq7wRXCv0Zz8AXoofgQDmBkXC99BkB
MjLvRotJVJvY9a8LtA7htjDg/jiEMa0wHRNAGNSbflKoAKrJzoE5WbFxFZKbq3vZ
o8cEYRMAIP+X+qn+oymT45XXXQlifZiccJdAi9FqDTvplEib2jmTmH6Ae5Khkr4l
Lbzh/nSKVN7lOg==
=8GjP
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timers, timekeeping and related functionality:
Core:
- Make the takeover of a hrtimer based broadcast timer reliable
during CPU hot-unplug. The current implementation suffers from a
race which can lead to broadcast timer starvation in the worst
case.
- VDSO related cleanups and simplifications
- Small cleanups and enhancements all over the place
PTP:
- Replace the architecture-specific function that converts a base
clock to the clocksource (e.g. ART to TSC) with generic
functionality, to avoid exposing such internals to drivers, and
convert all existing drivers over. This also makes it possible to
provide the reverse conversion in the core code, based on the same
parameter set.
- Provide a function to convert CLOCK_REALTIME to the base clock to
support the upcoming PPS output driver on Intel platforms.
Drivers:
- A set of Device Tree bindings for new hardware
- Cleanups and enhancements all over the place"
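The PTP item above describes a generic base clock to clocksource conversion, and its reverse, driven by one parameter set. A minimal sketch of that idea, assuming the parameter set is a numerator, a denominator and an offset (the struct and function names are hypothetical, not the kernel's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed parameter set relating a base clock to a clocksource. */
struct sketch_clock_base {
	uint32_t numerator;
	uint32_t denominator;
	uint64_t offset;
};

/* Compute *val * num / den, splitting the division to limit overflow. */
static bool sketch_scale(uint64_t *val, uint32_t num, uint32_t den)
{
	uint64_t quot, rem, frac;

	if (num == 0 || den == 0)
		return false;
	quot = *val / den;
	rem = *val % den;
	if (quot > UINT64_MAX / num)
		return false;
	frac = (rem * num) / den; /* rem < den, so this cannot overflow */
	if (quot * num > UINT64_MAX - frac)
		return false;
	*val = quot * num + frac;
	return true;
}

/* Base clock cycles to clocksource cycles: scale, then add the offset. */
bool sketch_base_to_cs(const struct sketch_clock_base *base,
		       uint64_t base_cycles, uint64_t *cs_cycles)
{
	uint64_t v = base_cycles;

	if (!sketch_scale(&v, base->numerator, base->denominator))
		return false;
	if (v > UINT64_MAX - base->offset)
		return false;
	*cs_cycles = v + base->offset;
	return true;
}

/* Reverse direction with the same parameters: subtract, then scale back. */
bool sketch_cs_to_base(const struct sketch_clock_base *base,
		       uint64_t cs_cycles, uint64_t *base_cycles)
{
	uint64_t v;

	if (cs_cycles < base->offset)
		return false;
	v = cs_cycles - base->offset;
	if (!sketch_scale(&v, base->denominator, base->numerator))
		return false;
	*base_cycles = v;
	return true;
}
```

Keeping both directions in one place means a driver only has to supply the ratio and offset, not its own conversion helper.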
* tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/realtek: Add timer driver for rtl-otto platforms
dt-bindings: timer: Add schema for realtek,otto-timer
dt-bindings: timer: Add SOPHGO SG2002 clint
dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support
dt-bindings: timer: renesas,tmu: Add RZ/G1 support
dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support
clocksource/drivers/mips-gic-timer: Correct sched_clock width
clocksource/drivers/mips-gic-timer: Refine rating computation
clocksource/drivers/sh_cmt: Address race condition for clock events
clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err
clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq
tick/broadcast: Make takeover of broadcast hrtimer reliable
tick/sched: Combine WARN_ON_ONCE and print_once
x86/vdso: Remove unused include
x86/vgtod: Remove unused typedef gtod_long_t
x86/vdso: Fix function reference in comment
vdso: Add comment about reason for vdso struct ordering
vdso/gettimeofday: Clarify comment about open coded function
timekeeping: Add missing kernel-doc function comments
tick: Remove unnused tick_nohz_get_idle_calls()
...
debug variables so that KCSAN ignores them.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaULP4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT6pEACXmc34OzO3jbOGEmgt5ch0cYSNvlY0
AL0iAV5JakC8AGDWeDNAUhR5r7tuNqjMmiy/XH+uR/4+xCZLZvQp7flyhrm/W7vd
rB3slu4xqqHizoQe81ZdH3ffg7Cj/Q/zqcJTv44UYkWLlAKA92S79bsn903UHpnL
ENH0IMulpP0b3GedV3GySz476kyAJX4ZJHXfsG71oyWz8gJahXfaDzSMqnMW0bLG
z0u51D9Q2R60zYpEsSPfBCKERKZ+Dzbn/YOYF85kytpXkVQd183JY05IkZmDgxyB
O973GgxvPGXZMXrUfhd+h7Kr17TiG+OKFpxhxgGCQoJNebFUt4A+QFWwQ7/FE/TN
FmjvwTBHllrLpucskivvI6zEETnJB/13XBB/T3k0BMB3cFfUiXdQS0N+xOBVoAhD
CLo21kG+xNPbzuKwzKx1+Vb/FH8/aoKp6py5kQlKAtQ6ddfqyvyGN3TZKYQGl3Hk
9o1ZuwlfkpG0a/0GKvyPcUeLUP0IagGe1wrOard+uL2VRlPRTnr4GH7ItTEedmAY
JRlCD0A1GQzwVtOy+D54W0G0ueW/tX76QzxuIJj5wwmZQpcV37eTOfIbZXnk4RzS
TZJ6gjxSLGbjYMbTiIcTFBU6UXhKjkE30bb5gPdzpXh8QtI1SSqpftZszqTAXWA3
qbMwI0/csYVXsg==
=PuR2
-----END PGP SIGNATURE-----
Merge tag 'core-debugobjects-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"A single update for debugobjects to annotate all intentionally racy
global debug variables so that KCSAN ignores them"
* tag 'core-debugobjects-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Annotate racy debug variables
- Make scripts/ld-version.sh robust against the latest LLD
- Fix warnings in rpm-pkg with device tree support
- Fix warnings in fortify tests with KASAN
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmaUM9kVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHEYQAKvrP1IzwlkANEEj/2qW1iGHlUod
im3cFCKyxrlBar71n15fYtclhK0N4GTVAUMHAk0d1GZo8UjuCeUzurHM4o53Hu1P
D0pXbkmiA0YgndpJYQpSV0CrLxCOCVAEFRf7AgdolVjpNLuba4z0bXSTQfEwfHKC
W1igTL2vGG8citbfHhEGZfB7AIEQBB0LtNkarpsDVD39rG+blZAABBLEtCueSVtw
rVX/Yuny9nDET6tlaCNgr2esNfkrHPIxOSsufeLWdhVVIZprPSGERflhkL3yE299
v6R45ANn72iVNKnnmmjxTNeezIpr74w1NSzBJ0jRM1KRqzbsEuFAf0ZNamtoUJ4r
m4tSu5l7lDj86APvehoO2o07A3omd8vcgLPt+lZlFsBIjorVIKovsjix6pVUgHlS
BTvxbSojbSMUa/NrkbosJkLo/6TzZxYxHKr17nxk+HsXu0i9A9IiHPBK5dTcbtua
olp1MKolQG78FYMwl7v4yQithawRG0mNDLJ2J8oTEIATXQtXV0WAaje73qQFIs6I
cMBEfeaDAMH4z0/VvKZsdksXFPDrrjoW0/x1tPqcAgOSyacPGbki4asn52rwDHT5
mfAzlnUc8ts56sBasArmMpk0z+PKC4MZeFUXNGJf7bZ3NZqoDRDHpb69Aqs11vSw
AJa9Kj07o5YD8N+E
=MMw8
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Make scripts/ld-version.sh robust against the latest LLD
- Fix warnings in rpm-pkg with device tree support
- Fix warnings in fortify tests with KASAN
* tag 'kbuild-fixes-v6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
fortify: fix warnings in fortify tests with KASAN
kbuild: rpm-pkg: avoid the warnings with dtb's listed twice
kbuild: Make ld-version.sh more robust against version string changes
When a software KASAN mode is enabled, the fortify tests emit warnings
on some architectures.
For example, for ARCH=arm, the combination of CONFIG_FORTIFY_SOURCE=y
and CONFIG_KASAN=y produces the following warnings:
TEST lib/test_fortify/read_overflow-memchr.log
warning: unsafe memchr() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memchr.c
TEST lib/test_fortify/read_overflow-memchr_inv.log
warning: unsafe memchr_inv() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memchr_inv.c
TEST lib/test_fortify/read_overflow-memcmp.log
warning: unsafe memcmp() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memcmp.c
TEST lib/test_fortify/read_overflow-memscan.log
warning: unsafe memscan() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memscan.c
TEST lib/test_fortify/read_overflow2-memcmp.log
warning: unsafe memcmp() usage lacked '__read_overflow2' warning in lib/test_fortify/read_overflow2-memcmp.c
[ more and more similar warnings... ]
Commit 9c2d1328f8 ("kbuild: provide reasonable defaults for tool
coverage") removed KASAN flags from non-kernel objects by default.
It was an intended behavior because lib/test_fortify/*.c are unit
tests that are not linked to the kernel.
As it turns out, some architectures require -fsanitize=kernel-(hw)address
to define __SANITIZE_ADDRESS__ for the fortify tests.
Without __SANITIZE_ADDRESS__ defined, arch/arm/include/asm/string.h
defines __NO_FORTIFY, thus excluding <linux/fortify-string.h>.
This issue does not occur on x86 thanks to commit 4ec4190be4
("kasan, x86: don't rename memintrinsics in uninstrumented files"),
but there are still some architectures that define __NO_FORTIFY
in such a situation.
Set KASAN_SANITIZE=y explicitly to the fortify tests.
Fixes: 9c2d1328f8 ("kbuild: provide reasonable defaults for tool coverage")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/all/0e8dee26-41cc-41ae-9493-10cd1a8e3268@app.fastmail.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Merge tag 'timers-v6.11-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clocksource/event driver updates from Daniel Lezcano:
- Remove unnecessary initialization of local variables in the ARM arch
timer and the ARM global timer, as they are assigned right afterwards
in the code path anyway (Li kunyu)
- Fix a race condition in the interrupt leading to a deadlock on the
SH CMT driver. Note that this fix was not tested on the platform
using this timer but the fix seems reasonable enough to be picked
confidently (Niklas Söderlund)
- Increase the rating of the gic-timer and use the configured width
clocksource register on the MIPS architecture (Jiaxun Yang)
- Add the DT bindings for the TMU on the Renesas platforms (Geert
Uytterhoeven)
- Add the DT bindings for the SOPHGO SG2002 clint on RISC-V (Thomas
Bonnefille)
- Add the rtl-otto timer driver along with the DT bindings for the
Realtek platform (Chris Packham)
Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org
We already checked that "nlimbs" is non-zero in the outer if statement, so
delete the duplicate check here.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the swap() macro to simplify the functions solve_linear_system() and
gf_poly_gcd() and improve their readability. Remove the local variable
tmp.
Fixes the following three Coccinelle/coccicheck warnings reported by
swap.cocci:
WARNING opportunity for swap()
WARNING opportunity for swap()
WARNING opportunity for swap()
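The change described above can be illustrated with a small userspace sketch. It is not the patched code: demo_swap() is a hypothetical stand-in for the kernel's swap() macro (it assumes the GNU __typeof__ extension available in GCC and Clang), and the two helpers are made-up names that only contrast the open-coded exchange with the macro form.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the kernel's swap() macro.
 * Assumes the GNU __typeof__ extension (GCC and Clang).
 */
#define demo_swap(a, b) \
	do { __typeof__(a) demo_tmp = (a); (a) = (b); (b) = demo_tmp; } while (0)

/* Open-coded exchange: needs a local temporary in every caller. */
static void demo_exchange_open_coded(unsigned int *rows, size_t p, size_t r)
{
	unsigned int tmp;

	assert(rows != NULL);
	tmp = rows[r];
	rows[r] = rows[p];
	rows[p] = tmp;
}

/* Same exchange via the macro: the temporary lives inside the macro. */
static void demo_exchange_swap(unsigned int *rows, size_t p, size_t r)
{
	assert(rows != NULL);
	demo_swap(rows[r], rows[p]);
}
```

Both helpers leave the array in the same state; the macro form simply removes the per-function tmp variable that swap.cocci flags.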
Link: https://lkml.kernel.org/r/20240708224023.9312-2-thorsten.blum@toblux.com
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace commas between expression statements with semicolons.
Link: https://lkml.kernel.org/r/20240709034323.586185-1-nichen@iscas.ac.cn
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The alloc/copy code pattern is better consolidated into single kstrdup()
(and kstrndup()) calls. This also gets rid of deprecated[1] strncpy() uses.
Replace one other strncpy() use with the more idiomatic strscpy().
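As a rough userspace analogy of the consolidation (not the patched kernel code), the sketch below contrasts an open-coded allocate-then-copy with a single duplication call. It assumes POSIX strdup() and strndup() as stand-ins for kstrdup() and kstrndup(); all demo_* names are hypothetical, and strscpy() has no direct userspace counterpart here, so it is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, copy with strncpy(), then terminate by hand. */
static char *demo_copy_open_coded(const char *src, size_t max)
{
	char *dst;

	assert(src != NULL);
	dst = malloc(max + 1);
	if (!dst)
		return NULL;
	strncpy(dst, src, max);
	dst[max] = '\0';	/* strncpy() does not guarantee termination */
	return dst;
}

/* After: one call; strdup() plays the role kstrdup() plays in the kernel. */
static char *demo_copy_dup(const char *src)
{
	assert(src != NULL);
	return strdup(src);
}

/* Bounded variant; strndup() stands in for kstrndup(). */
static char *demo_copy_dup_n(const char *src, size_t max)
{
	assert(src != NULL);
	return strndup(src, max);
}
```

The caller still owns the returned buffer and must free() it, just as kstrdup() results are released with kfree().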
Link: https://github.com/KSPP/linux/issues/90 [1]
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The header file linux/bootconfig.h is included whether __KERNEL__ is
defined or not.
Include it only once before the #ifdef/#else/#endif preprocessor
directives and remove the following make includecheck warning:
linux/bootconfig.h is included more than once
Move the comment to the top and delete the now empty #else block.
Link: https://lore.kernel.org/all/20240711084315.1507-1-thorsten.blum@toblux.com/
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b0 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes, 15 of which are cc:stable.
No identifiable theme here - all are singleton patches, 19 are for MM"
* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
filemap: replace pte_offset_map() with pte_offset_map_nolock()
arch/xtensa: always_inline get_current() and current_thread_info()
sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
mm: fix crashes from deferred split racing folio migration
lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
mm: gup: stop abusing try_grab_folio
nilfs2: fix kernel bug on rename operation of broken directory
mm/hugetlb_vmemmap: fix race with speculative PFN walkers
cachestat: do not flush stats in recency check
mm/shmem: disable PMD-sized page cache if needed
mm/filemap: skip to create PMD-sized page cache if needed
mm/readahead: limit page cache size in page_cache_ra_order()
mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
mm/damon/core: merge regions aggressively when max_nr_regions is unmet
Fix userfaultfd_api to return EINVAL as expected
mm: vmalloc: check if a hash-index is in cpu_possible_mask
mm: prevent derefencing NULL ptr in pfn_section_valid()
...
Merge tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Switch some asserts to WARN()
- Fix a few "transaction not locked" asserts in the data read retry
paths and backpointers gc
- Fix a race that would cause the journal to get stuck on a flush
commit
- Add missing fsck checks for the fragmentation LRU
- The usual assorted syzbot fixes
* tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Add missing bch2_trans_begin()
bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
bcachefs: Warn on attempting a move with no replicas
bcachefs: bch2_data_update_to_text()
bcachefs: Log mount failure error code
bcachefs: Fix undefined behaviour in eytzinger1_first()
bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
bcachefs: Fix bch2_inode_insert() race path for tmpfiles
closures: fix closure_sync + closure debugging
bcachefs: Fix journal getting stuck on a flush commit
bcachefs: io clock: run timer fns under clock lock
bcachefs: Repair fragmentation_lru in alloc_write_key()
bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref()
bcachefs: bch2_btree_write_buffer_maybe_flush()
bcachefs: Add missing printbuf_tabstops_reset() calls
bcachefs: Fix loop restart in bch2_btree_transactions_read()
bcachefs: Fix bch2_read_retry_nodecode()
bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs
bcachefs: Fix shift greater than integer size
bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning
...
Originally, stack closures were only used synchronously, and with the
original implementation of closure_sync() the ref never hit 0; thus,
closure_put_after_sub() assumes that if the ref hits 0 it's on the debug
list, in debug mode.
That's no longer true with the current implementation of closure_sync(),
so we need a new magic so closure_debug_destroy() doesn't pop an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-08
The following pull-request contains BPF updates for your *net-next* tree.
We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).
The main changes are:
1) Support resilient split BTF which cuts down on duplication and makes BTF
as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.
2) Add support for dumping kfunc prototypes from BTF which enables both detecting
as well as dumping compilable prototypes for kfuncs, from Daniel Xu.
3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
support for BPF exceptions, from Ilya Leoshkevich.
4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.
5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
for skbs and add coverage along with it, from Vadim Fedorenko.
6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.
7) Extend the BPF verifier to track the delta between linked registers in order
to better deal with recent LLVM code optimizations, from Alexei Starovoitov.
8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
have been a pointer to the map value, from Benjamin Tissoires.
9) Extend BPF selftests to add regular expression support for test output matching
and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.
10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
iterates exactly once anyway, from Dan Carpenter.
11) Add the capability to offload the netfilter flowtable in XDP layer through
kfuncs, from Florian Westphal & Lorenzo Bianconi.
12) Various cleanups in networking helpers in BPF selftests to shave off a few
lines of open-coded functions on client/server handling, from Geliang Tang.
13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
that x86 JIT does not need to implement detection, from Leon Hwang.
14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
out-of-bounds memory access for dynpointers, from Matt Bobrowski.
15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
it might lead to problems on 32-bit archs, from Jiri Olsa.
16) Enhance traffic validation and dynamic batch size support in xsk selftests,
from Tushar Vyavahare.
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
bpf: helpers: fix bpf_wq_set_callback_impl signature
libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
selftests/bpf: Remove exceptions tests from DENYLIST.s390x
s390/bpf: Implement exceptions
s390/bpf: Change seen_reg to a mask
bpf: Remove unnecessary loop in task_file_seq_get_next()
riscv, bpf: Optimize stack usage of trampoline
bpf, devmap: Add .map_alloc_check
selftests/bpf: Remove arena tests from DENYLIST.s390x
selftests/bpf: Add UAF tests for arena atomics
selftests/bpf: Introduce __arena_global
s390/bpf: Support arena atomics
s390/bpf: Enable arena
s390/bpf: Support address space cast instruction
s390/bpf: Support BPF_PROBE_MEM32
s390/bpf: Land on the next JITed instruction after exception
s390/bpf: Introduce pre- and post- probe functions
s390/bpf: Get rid of get_probe_mem_regno()
...
====================
Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Rust code needs to be able to access _copy_from_user and _copy_to_user
so that it can skip the check_copy_size check in cases where the length
is known at compile-time, mirroring the logic for when C code will skip
check_copy_size. To do this, we ensure that exported versions of these
methods are available when CONFIG_RUST is enabled.
Alice has verified that this patch passes the CONFIG_TEST_USER_COPY test
on x86 using the Android cuttlefish emulator.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240528-alice-mm-v7-2-78222c31b8f4@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
On a system with Perl 5.12.1, commit 5ef6dc08cf
("lib/build_OID_registry: don't mention the full path of the script in
output") causes the build to fail with the error below.
Bareword found where operator expected at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
syntax error at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
Execution of ./lib/build_OID_registry aborted due to compilation errors.
make[3]: *** [lib/Makefile:352: lib/oid_registry_data.c] Error 255
Ahmad Fatoum found that non-destructive substitution is only supported since
Perl 5.13.2. Instead of dropping `r` and having the side effect of modifying
`$0`, introduce a dedicated variable to support older Perl versions.
Link: https://lkml.kernel.org/r/20240702223512.8329-2-pmenzel@molgen.mpg.de
Link: https://lkml.kernel.org/r/20240701155802.75152-1-pmenzel@molgen.mpg.de
Fixes: 5ef6dc08cf ("lib/build_OID_registry: don't mention the full path of the script in output")
Link: https://lore.kernel.org/all/259f7a87-2692-480e-9073-1c1c35b52f67@molgen.mpg.de/
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Suggested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge v6.10-rc6 into drm-next
The exynos-next pull is based on a newer -rc than drm-next, hence
backmerge first to make sure the unrelated conflicts we accumulated
don't end up randomly in the exynos merge pull, but are separated out.
Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma
code, and cherry-pick conflict in xe.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With ARCH=sh, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/math/rational.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240702-md-sh-lib-math-v1-1-93f4ac4fa8fd@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With ARCH=csky, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/zlib_deflate/zlib_deflate.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240613-md-csky-lib-zlib_deflate-v1-1-83504d9a27d6@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Guo Ren <guoren@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace "Sr" with "sr": the example is wrong if sl and N don't have
child nodes, so sr should be a red node.
Link: https://lkml.kernel.org/r/20240628142229.69419-1-zxcvb600870024@gmail.com
Signed-off-by: Hsin Chang Yu <zxcvb600870024@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The constraints of the DFLTCC inline assembly are not precise: they do not
communicate the size of the output buffers to the compiler, so it cannot
automatically instrument it.
Add the manual kmsan_unpoison_memory() calls for the output buffers. The
logic is the same as in [1].
[1] 1f5ddcc009
Link: https://lkml.kernel.org/r/20240621113706.315500-21-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <kasan-dev@googlegroups.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since the return value of mas_wr_store_entry() is not used,
the return type can be changed to void.
Link: https://lkml.kernel.org/r/20240614092428.29491-1-rgbi3307@gmail.com
Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_hmm.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-lib-md-test_hmm-v1-1-e4aa17daa57b@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_maple_tree.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_maple_tree-v1-1-7b1b485aeec3@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_ubsan.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_ubsan-v1-1-c2a80d258842@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_xarray.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_xarray-v1-1-42fd6833bdd4@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Two comments state that the following code open-codes something, but
they do not specify what exactly is open-coded.
Expand the comments to name the function that is open-coded.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20240701-vdso-cleanup-v1-1-36eb64e7ece2@linutronix.de
Fix warning seen with:
$ make allmodconfig && make W=1 C=1 lib/usercopy_kunit.ko
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/usercopy_kunit.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since arch_pick_mmap_layout() is an inline for non-MMU systems, disable
this test there.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202406160505.uBge6TMY-lkp@intel.com/
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
make allmodconfig && make W=1 C=1 now reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_fpu.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240622-md-i386-lib-test_fpu_glue-v1-1-a4e40b7b1264@quicinc.com
Fixes: 9613736d85 ("selftests/fpu: move FP code to a separate translation unit")
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Neither the ELF spec nor the ELF loader requires the program headers to be
placed right after the ELF header, but the build-id code very much assumes
such placement:
See
find_get_page(vma->vm_file->f_mapping, 0);
line and checks against PAGE_SIZE.
Return errors for now, until someone rewrites the build-id parser
to be more in line with load_elf_binary().
Link: https://lkml.kernel.org/r/d58bc281-6ca7-467a-9a64-40fa214bd63e@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Simple stuff:
- NULL ptr/err ptr deref fixes
- fix for getting wedged on shutdown after journal error
- fix missing recalc_capacity() call, capacity now changes correctly
after a device goes read only
however: our capacity calculation still doesn't take into account
when we have mixed ro/rw devices and the ro devices have data on
them, that's going to be a more involved fix to separate accounting
for "capacity used on ro devices" and "capacity used on rw devices"
- boring syzbot stuff
Slightly more involved:
- discard, invalidate workers are now per device
this has the effect of simplifying how we take device refs in these
paths, and the device ref cleanup fixes a longstanding race between
the device removal path and the discard path
- fixes for how the debugfs code takes refs on btree_trans objects
We have debugfs code that prints in-use btree_trans objects.
It uses closure_get() on trans->ref, which is mainly for the cycle
detector, but the debugfs code was using it on a closure that may
have hit 0, which is not allowed; for performance reasons we cannot
avoid having not-in-use transactions on the global list.
Introduce some new primitives to fix this and make the
synchronization here a whole lot saner"
* tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix kmalloc bug in __snapshot_t_mut
bcachefs: Discard, invalidate workers are now per device
bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc
bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu
bcachefs: Add missing bch2_journal_do_writes() call
bcachefs: Fix null ptr deref in journal_pins_to_text()
bcachefs: Add missing recalc_capacity() call
bcachefs: Fix btree_trans list ordering
bcachefs: Fix race between trans_put() and btree_transactions_read()
closures: closure_get_not_zero(), closure_return_sync()
bcachefs: Make btree_deadlock_to_text() clearer
bcachefs: fix seqmutex_relock()
bcachefs: Fix freeing of error pointers
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_helpers_kunit.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-lib-string-v1-1-2738cf057d94@quicinc.com
Signed-off-by: Kees Cook <kees@kernel.org>
Merge tag 'drm-misc-next-2024-06-27' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.11:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- panic: Monochrome logo support, Various fixes
- ttm: Improve the number of page faults on some platforms, Fix test
build breakage with PREEMPT_RT, more test coverage and various test
improvements
Driver Changes:
- Add missing MODULE_DESCRIPTION where needed
- ipu-v3: Various fixes
- vc4: Monochrome TV support
- bridge:
- analogix_dp: Various improvements and reworks, handle AUX
transfers timeout
- tc358767: Fix DRM_BRIDGE_ATTACH_NO_CONNECTOR, Fix clock
calculations
- panels:
- More transitions to mipi_dsi wrapped functions
- New panels: Lincoln Technologies LCD197, Ortustech COM35H3P70ULC,
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240627-congenial-pistachio-nyala-848cf4@houat
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a0 ("ionic: fix kernel panic due to multi-buffer handling")
d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DIM-related mode and work have been collected in one place,
so new interfaces are added for convenience.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-5-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NetDIM library, currently leveraged by an array of NICs, delivers
excellent acceleration benefits. Nevertheless, NICs vary significantly
in their dim profile list prerequisites.
Specifically, virtio-net backends may present diverse sw or hw device
implementations, making a one-size-fits-all parameter list impractical.
On Alibaba Cloud, the virtio DPU's performance under the default DIM
profile falls short of expectations, partly due to a mismatch in
parameter configuration.
I also noticed that ice/idpf/ena and other NICs have customized
profile lists or placed some restrictions on dim capabilities.
Motivated by this, I tried adding new params for "ethtool -C" that provide
per-device control to modify and access a device's interrupt parameters.
Usage
========
The target NIC is named ethx.
Assume that ethx only declares support for rx profile setting
(with DIM_PROFILE_RX flag set in profile_flags) and supports modification
of usec and pkt fields.
1. Query the currently customized list of the device
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 256, .comps = n/a,},
{.usec = 8, .pkts = 256, .comps = n/a,},
{.usec = 64, .pkts = 256, .comps = n/a,},
{.usec = 128, .pkts = 256, .comps = n/a,},
{.usec = 256, .pkts = 256, .comps = n/a,}
tx-profile: n/a
2. Tune
$ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
"n" means do not modify this field.
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 1, .comps = n/a,},
{.usec = 2, .pkts = 256, .comps = n/a,},
{.usec = 3, .pkts = 3, .comps = n/a,},
{.usec = 4, .pkts = 4, .comps = n/a,},
{.usec = 256, .pkts = 5, .comps = n/a,}
tx-profile: n/a
3. Hint
If the device does not support a given type of customized dim profile,
"n/a" is displayed for the corresponding field.
Attempting to modify an "n/a" field reports -EOPNOTSUPP.
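The profile string in the Tune step can be read as five groups separated by "_", each holding usec,pkts,comps, with "n" leaving a field untouched. A minimal userspace sketch of that reading follows; it is an illustration only, not the ethtool or kernel parser. The names demo_moder and demo_apply_profile are hypothetical, and the -EOPNOTSUPP handling for "n/a" fields is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one profile entry. */
struct demo_moder {
	unsigned int usec, pkts, comps;
};

/*
 * Apply a spec such as "1,1,n_2,n,n_..." to n profile entries.
 * Returns 0 on success and -1 on a malformed spec. Entries parsed
 * before an error stay modified; a real implementation would likely
 * validate first.
 */
static int demo_apply_profile(struct demo_moder *prof, size_t n,
			      const char *spec)
{
	const char *p = spec;
	size_t i;
	int f;

	assert(prof && spec);
	for (i = 0; i < n; i++) {
		unsigned int *fields[3] = {
			&prof[i].usec, &prof[i].pkts, &prof[i].comps
		};

		for (f = 0; f < 3; f++) {
			char want;

			if (*p == 'n') {
				p++;		/* "n": keep the old value */
			} else {
				char *end;

				if (*p < '0' || *p > '9')
					return -1;
				*fields[f] = (unsigned int)strtoul(p, &end, 10);
				p = end;
			}
			/* ',' inside a group, '_' between groups, NUL at the end */
			want = f < 2 ? ',' : (i + 1 < n ? '_' : '\0');
			if (*p != want)
				return -1;
			if (want)
				p++;
		}
	}
	return 0;
}
```

Applied to the five default rx entries shown in the Query step, the spec from the Tune step yields the usec/pkts pairs listed in the second query output.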
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DIMLIB's capabilities are supplied by the dim, net_dim, and
rdma_dim objects, and dim's interfaces solely act as a base for
net_dim and rdma_dim and are not explicitly used anywhere else.
rdma_dim is utilized by the infiniband driver, while net_dim
is for network devices, excluding the soc/fsl driver.
In this patch, net_dim relies on some NET's interfaces, thus
DIMLIB needs to explicitly depend on the NET Kconfig.
The soc/fsl driver uses the functions provided by net_dim, so
it also needs to depend on NET.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-3-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Useful macros will be used effectively elsewhere.
These will be utilized in subsequent patches.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-2-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To make it easier to identify the crashing process, report effective UID
when dumping the stack.
Link: https://lkml.kernel.org/r/20240615041358.103791-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The worst-case scenario of plist_add() happens when the priority of the
inserted plist_node is going to be the largest after the insertion is
done. The cost is more significant when the original plist is longer,
because the iterator has to traverse the whole plist to find the correct
position to insert the new node.
This can be avoided by running a reverse iterator alongside the forward
one; doing so shrinks the maximum possible number of iterations from
N to N/2.
The proposed change to plist_add() passes the test in lib/plist.c that
validates its correctness. A worst-case scenario test for plist_add() is
also added to plist_test().
The worst-case test was run with the size of test_data and test_node
growing from 200 to 1000. The results are shown in the table below: the
proposed plist_add() performs better than the original version, and the
difference between the two implementations becomes more significant as
N grows.
The random case test [1] and best case test [2] are also provided. The
proposed change performs slightly better in the random case test and the
original implementation performs slightly better in the best case test,
but the differences in both tests are minor, so the two can be regarded
as even in those two situations.
-----------------------------------------------------------------
| Test size     |    200 |    400 |     600 |     800 |    1000 |
-----------------------------------------------------------------
| new_plist_add | 140911 | 548681 | 1220512 | 2048493 | 3763755 |
-----------------------------------------------------------------
| old_plist_add | 188198 | 774222 | 1643547 | 3008929 | 4947435 |
-----------------------------------------------------------------
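The two-ended search described above can be sketched in userspace as follows. This is a simplified illustration, not the lib/plist.c code: it assumes a single circular, doubly linked list in ascending priority order with a sentinel head, whereas a real plist keeps separate prio and node lists. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a priority list node. */
struct demo_node {
	int prio;
	struct demo_node *prev, *next;
};

static void demo_list_init(struct demo_node *head)
{
	head->prev = head->next = head;
}

static void demo_link(struct demo_node *n, struct demo_node *prev,
		      struct demo_node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/*
 * Insert n while scanning from both ends at once. Nodes of equal
 * priority keep FIFO order. Returns the number of loop iterations.
 */
static unsigned int demo_sorted_add(struct demo_node *head,
				    struct demo_node *n)
{
	struct demo_node *fwd, *rev;
	unsigned int steps = 0;

	assert(head && n);
	fwd = head->next;
	rev = head->prev;
	while (fwd != head) {
		steps++;
		if (fwd->prio > n->prio) {	/* first node that must follow n */
			demo_link(n, fwd->prev, fwd);
			return steps;
		}
		if (rev->prio <= n->prio) {	/* last node that must precede n */
			demo_link(n, rev, rev->next);
			return steps;
		}
		fwd = fwd->next;
		rev = rev->prev;
	}
	demo_link(n, head->prev, head);		/* empty list */
	return steps;
}

/*
 * Worst case for a forward-only scan: add a new largest priority to a
 * list of count nodes (count <= 1000). Returns the iterations used.
 */
static unsigned int demo_worst_case_steps(int count)
{
	static struct demo_node nodes[1001];
	struct demo_node head;
	int i;

	assert(count >= 0 && count <= 1000);
	demo_list_init(&head);
	for (i = 0; i < count; i++) {
		nodes[i].prio = i;
		demo_sorted_add(&head, &nodes[i]);
	}
	nodes[count].prio = count;
	return demo_sorted_add(&head, &nodes[count]);
}
```

In this sketch the forward-only worst case is found by the reverse iterator on the first iteration, and an insertion point in the middle of the list is reached after roughly N/2 iterations.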
Link: https://lkml.kernel.org/r/20240614154603.65203-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
PANIC_TIMEOUT can also be controlled with the panic= kernel command line
option and the file /proc/sys/kernel/panic. Let's document both of these
in the Kconfig help text.
Link: https://lkml.kernel.org/r/20240607152443.925168-1-bmasney@redhat.com
Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_linear_ranges.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_linear_ranges-v1-1-053a1aad37c6@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_kmod.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_kmod-v1-1-fdf11bc6095e@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/siphash_kunit.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-siphash_kunit-v1-1-38688065b796@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_uuid.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-test_uuid-v1-1-67fa498104c0@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports for lib/*kunit:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/bitfield_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/checksum_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/cmdline_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/is_signed_type_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/overflow_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/stackinit_kunit.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240601-md-lib-kunit-tests-v1-1-4493fe0032b9@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/asn1_encoder.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240601-md-lib-asn1_encoder-v1-1-8c634ed2d2e8@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports for lib/*_test.ko:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/atomic64_test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/hashtable_test.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240601-md-lib-test2-v1-1-be764b785f17@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/memcpy_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/fortify_kunit.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-md-lib-fortify_source-v1-1-2c37f7fbaafc@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With nearly 20 taint flags and respective characters, it's getting a bit
difficult to remember what each taint flag character means. Add verbose
logging of the set taints in the format:
Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN
in dump_stack_print_info() when there are taints.
Note that the "negative flag" G is not included.
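The output format above can be illustrated with a small userspace formatter. This is a sketch under assumptions, not the kernel implementation: the table holds only the two flags named in the example, the bit positions are made up for the demo, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag table; bit i of the mask selects entry i. */
struct demo_taint {
	char c;
	const char *desc;
};

static const struct demo_taint demo_taints[] = {
	{ 'P', "PROPRIETARY_MODULE" },
	{ 'W', "WARN" },
};

/* Write "Tainted: [X]=DESC, ..." for the set bits, or "" if none. */
static void demo_format_taints(unsigned long mask, char *buf, size_t len)
{
	const char *sep = "";
	size_t off, i;

	assert(buf && len > 0);
	buf[0] = '\0';
	if (!mask)
		return;
	off = (size_t)snprintf(buf, len, "Tainted: ");
	for (i = 0; i < sizeof(demo_taints) / sizeof(demo_taints[0]); i++) {
		if (!(mask & (1UL << i)))
			continue;
		if (off >= len)
			break;		/* output truncated, stop here */
		off += (size_t)snprintf(buf + off, len - off, "%s[%c]=%s",
					sep, demo_taints[i].c,
					demo_taints[i].desc);
		sep = ", ";
	}
}

/* Convenience check used to compare against an expected line. */
static int demo_taints_match(unsigned long mask, const char *expected)
{
	char buf[128];

	demo_format_taints(mask, buf, sizeof(buf));
	return strcmp(buf, expected) == 0;
}
```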
Link: https://lkml.kernel.org/r/7321e306166cb2ca2807ab8639e665baa2462e9c.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ts_kmp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ts_bm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ts_fsm.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Link: https://lkml.kernel.org/r/20240531-lib-ts-v1-1-03d7f3546c49@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There exists an iteration over a plist in plist_check_list(), and memory
dependency exists between variables "prev", "next" and "prev->next". As
plist is used in the scheduling subsystem, we should guarantee the memory
ordering between multiple processors.
Using macro "WRITE_ONCE()" can help us to ensure the memory ordering as
it was stated in "Documentation/memory-barriers.txt".
Link: https://lkml.kernel.org/r/20240526140139.17220-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Interrupt disable/enable trips are quite expensive on x86-64 compared to a
mere cmpxchg (note: no lock prefix!) and percpu counters are used quite
often.
With this change I get a bump of 1% ops/s for negative path lookups,
plugged into will-it-scale:
	void testcase(unsigned long long *iterations, unsigned long nr)
	{
		while (1) {
			int fd = open("/tmp/nonexistent", O_RDONLY);
			assert(fd == -1);
			(*iterations)++;
		}
	}
The win would be higher if it was not for other slowdowns, but one has
to start somewhere.
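The idea of replacing an interrupt-disabled update with a compare-and-exchange loop can be sketched in userspace. This is an analogy only, under loud assumptions: one "local" slot stands in for a per-CPU counter, C11 atomics stand in for the kernel's lock-prefix-free per-CPU cmpxchg, and the slow path is simplified (the kernel would take a lock there). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_counter {
	atomic_long global;	/* shared total */
	atomic_int local;	/* delta not yet folded into global */
};

static void demo_counter_add(struct demo_counter *c, int amount, int batch)
{
	int old = atomic_load_explicit(&c->local, memory_order_relaxed);

	assert(batch > 0);
	do {
		if (abs(old + amount) >= batch) {
			/* Slow path: fold the local delta into the total. */
			int taken = atomic_exchange(&c->local, 0);

			atomic_fetch_add(&c->global, (long)taken + amount);
			return;
		}
		/*
		 * Fast path: publish old + amount unless local changed
		 * underneath us, in which case old is refreshed and the
		 * loop retries.
		 */
	} while (!atomic_compare_exchange_weak(&c->local, &old, old + amount));
}

static long demo_counter_sum(struct demo_counter *c)
{
	return atomic_load(&c->global) + atomic_load(&c->local);
}
```

The fast path never blocks or masks anything; only an update that would reach the batch size falls through to the slower fold.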
Link: https://lkml.kernel.org/r/20240528204257.434817-1-mjguzik@gmail.com
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The addition of an if statement in lib/sort to handle the final unsorted 2
or 3 elements is not covered by existing test cases, leading to incomplete
test coverage. To ensure comprehensive testing and maintain 100% code
coverage, add a new testcase for scenarios where the if statement is
triggered.
Since the if statement is only triggered when the array length is odd and
the first element is greater than the second element, a testcase is
created using an array length of TEST_LEN - 1 and a suitable random seed
to maintain full code coverage.
Link: https://lkml.kernel.org/r/20240527203011.1644280-5-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After building the heap, the code continuously pops two elements from the
heap until only 2 or 3 elements remain, at which point it switches back to
a regular heapsort with one element popped at a time. However, to handle
the final 2 or 3 elements, an additional else-if statement in the while
loop was introduced, potentially increasing branch misses. Moreover, when
there are only 2 or 3 elements left, continuing with regular heapify
operations is unnecessary as these cases are simple enough to be handled
with a single comparison and 1 or 2 swaps outside the while loop.
Eliminating the additional else-if statement and directly managing cases
involving 2 or 3 elements outside the loop reduces unnecessary conditional
branches resulting from the numerous loops and conditionals in heapify.
This optimization maintains consistent numbers of comparisons and swaps
for arrays with even lengths while reducing swaps and comparisons for
arrays with odd lengths from 2.5 swaps and 1 comparison to 1.5 swaps and 1
comparison.
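Handling the last 2 or 3 heap elements outside the loop can be sketched on a plain int array. This is an illustration, not the lib/sort.c code: it assumes a max-heap stored in a[0..n-1] and an ascending sort, and demo_finish_small_heap is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

static void demo_swap_int(int *a, int *b)
{
	int t = *a;

	*a = *b;
	*b = t;
}

/*
 * a[0..n-1] is a max-heap with n == 2 or n == 3. Finish the sort with
 * one swap, plus at most one comparison and one more swap when n == 3.
 */
static void demo_finish_small_heap(int *a, size_t n)
{
	assert(n == 2 || n == 3);
	demo_swap_int(&a[0], &a[n - 1]);	/* largest goes last */
	if (n == 3 && a[0] > a[1])		/* order the remaining pair */
		demo_swap_int(&a[0], &a[1]);
}
```

No heapify call is needed here, because the heap property already guarantees that a[0] is the maximum.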
Link: https://lkml.kernel.org/r/20240527203011.1644280-4-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The existing comment in lib/sort refers to glibc qsort() using quicksort.
However, glibc qsort() no longer uses quicksort; it now uses mergesort and
falls back to heapsort if memory allocation for mergesort fails. This
makes the comment outdated and incorrect.
Update the comment to refer to quicksort in general rather than glibc's
implementation to provide accurate information about the comparisons and
trade-offs without implying an outdated implementation.
Link: https://lkml.kernel.org/r/20240527203011.1644280-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "lib/sort: Optimizations and cleanups".
This patch series optimizes the handling of the last 2 or 3 elements in
lib/sort and adds a testcase in lib/test_sort to maintain 100% code
coverage reflecting this change. Additionally, it corrects outdated
descriptions regarding glibc qsort() and removes the unused pr_fmt macro.
This patch (of 4):
The pr_fmt macro is defined but not used in lib/sort.c. Since there are
no pr_* functions printing any messages, the pr_fmt macro is redundant and
can be safely removed.
Link: https://lkml.kernel.org/r/20240527203011.1644280-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20240527203011.1644280-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add test cases for the min_heap_del() to ensure its functionality is
thoroughly tested.
Link: https://lkml.kernel.org/r/20240524152958.919343-15-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a third parameter 'args' for the 'less' and 'swp' functions in the
'struct min_heap_callbacks'. This additional parameter allows these
comparison and swap functions to handle extra arguments when necessary.
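A userspace sketch of callbacks that take a third args parameter follows. It is not the kernel's min_heap code: the struct layout, the int-only heap, and all demo_* names are assumptions made for illustration. Here args carries a sort direction and a swap counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_heap_callbacks {
	bool (*less)(const void *lhs, const void *rhs, void *args);
	void (*swp)(void *lhs, void *rhs, void *args);
};

/* Extra state handed to both callbacks through args. */
struct demo_cmp_args {
	bool descending;
	unsigned long swaps;
};

static bool demo_int_less(const void *lhs, const void *rhs, void *args)
{
	const struct demo_cmp_args *a = args;
	int l = *(const int *)lhs, r = *(const int *)rhs;

	return a->descending ? l > r : l < r;
}

static void demo_int_swp(void *lhs, void *rhs, void *args)
{
	struct demo_cmp_args *a = args;
	int tmp = *(int *)lhs;

	*(int *)lhs = *(int *)rhs;
	*(int *)rhs = tmp;
	a->swaps++;
}

static void demo_sift_down(int *data, size_t nr, size_t pos,
			   const struct demo_heap_callbacks *cb, void *args)
{
	for (;;) {
		size_t left = 2 * pos + 1, right = left + 1, best = pos;

		if (left < nr && cb->less(&data[left], &data[best], args))
			best = left;
		if (right < nr && cb->less(&data[right], &data[best], args))
			best = right;
		if (best == pos)
			return;
		cb->swp(&data[pos], &data[best], args);
		pos = best;
	}
}

/* Build a heap in place; data[0] ends up first according to less(). */
static void demo_heapify(int *data, size_t nr,
			 const struct demo_heap_callbacks *cb, void *args)
{
	size_t i;

	assert(cb && cb->less && cb->swp);
	for (i = nr / 2; i-- > 0; )
		demo_sift_down(data, nr, i, cb, args);
}
```

The same callback pair serves both orderings; only the object behind args changes.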
Link: https://lkml.kernel.org/r/20240524152958.919343-9-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement a type-safe interface for min_heap using strong type pointers
instead of void * in the data field. This change includes adding small
macro wrappers around functions, enabling the use of __minheap_cast and
__minheap_obj_size macros for type casting and obtaining element size.
This implementation removes the necessity of passing element size in
min_heap_callbacks. Additionally, introduce the MIN_HEAP_PREALLOCATED
macro for preallocating some elements.
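The type-safe approach can be sketched with a struct-generating macro and small wrapper macros. This is a hedged illustration rather than the kernel header: the demo_* and DEMO_* names are hypothetical, the field layout is assumed, and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Declare a heap type whose data pointer carries the element type. */
#define DEMO_HEAP_PREALLOCATED(_type, _name, _nr)	\
struct _name {						\
	size_t nr;					\
	size_t size;					\
	_type *data;					\
	_type preallocated[_nr];			\
}

/* Element size and element pointer type come from the data field. */
#define demo_heap_obj_size(_heap)	sizeof((_heap)->data[0])
#define demo_heap_cast(_heap)		(__typeof__((_heap)->data[0]) *)

#define demo_heap_init(_heap) do {					\
	(_heap)->nr = 0;						\
	(_heap)->size = sizeof((_heap)->preallocated) /		\
			demo_heap_obj_size(_heap);			\
	(_heap)->data = (_heap)->preallocated;				\
} while (0)

/* Untyped worker, as a generic heap core might be written. */
static void *demo_heap_peek_raw(void *data, size_t nr)
{
	return nr ? data : NULL;
}

/* Typed wrapper: the result is an element pointer again, not void *. */
#define demo_heap_peek(_heap)	\
	(demo_heap_cast(_heap) demo_heap_peek_raw((_heap)->data, (_heap)->nr))

DEMO_HEAP_PREALLOCATED(long, demo_long_heap, 4);

static long demo_first_or(struct demo_long_heap *h, long fallback)
{
	long *top;

	assert(h);
	top = demo_heap_peek(h);
	return top ? *top : fallback;
}
```

Because the element size is derived from the data field, callers no longer have to pass it around separately.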
Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/20240524152958.919343-5-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
KCSAN has identified a potential data race in debugobjects, where the
global variable debug_objects_maxchain is accessed for both reading and
writing simultaneously in separate and parallel data paths. This results in
the following splat printed by KCSAN:
  BUG: KCSAN: data-race in debug_check_no_obj_freed / debug_object_activate
  write to 0xffffffff847ccfc8 of 4 bytes by task 734 on cpu 41:
    debug_object_activate (lib/debugobjects.c:199 lib/debugobjects.c:564 lib/debugobjects.c:710)
    call_rcu (kernel/rcu/rcu.h:227 kernel/rcu/tree.c:2719 kernel/rcu/tree.c:2838)
    security_inode_free (security/security.c:1626)
    __destroy_inode (./include/linux/fsnotify.h:222 fs/inode.c:287)
    ...
  read to 0xffffffff847ccfc8 of 4 bytes by task 384 on cpu 31:
    debug_check_no_obj_freed (lib/debugobjects.c:1000 lib/debugobjects.c:1019)
    kfree (mm/slub.c:2081 mm/slub.c:4280 mm/slub.c:4390)
    percpu_ref_exit (lib/percpu-refcount.c:147)
    css_free_rwork_fn (kernel/cgroup/cgroup.c:5357)
    ...
  value changed: 0x00000070 -> 0x00000071
The data race is actually harmless, as this variable is only used for
debugfs statistics, like all the other debug variables.
Annotate all debug variables as racy explicitly, since these variables
are known to be racy and harmless.
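A minimal sketch of such an annotation follows, assuming it takes the form of a qualifier like the kernel's __data_racy on the variable declaration. The userspace fallback below defines it as empty so the snippet builds anywhere; demo_maxchain and demo_note_chain are hypothetical stand-ins for the statistics code.

```c
#include <assert.h>

/* Userspace stand-in: the qualifier compiles away outside the kernel. */
#ifndef __data_racy
#define __data_racy
#endif

/* Statistic that is read and written without locking on purpose. */
static int __data_racy demo_maxchain;

/* Record the longest chain seen so far; lost updates are acceptable. */
static void demo_note_chain(int len)
{
	assert(len >= 0);
	if (len > demo_maxchain)
		demo_maxchain = len;
}
```

The check-then-store pattern is exactly the kind of access a race detector reports, which is why the variable itself is marked rather than each access.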
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240611091813.1189860-1-leitao@debian.org
When CONFIG_FONTS ("Select compiled-in fonts") is not enabled, the user
should not be asked about any fonts. However, when CONFIG_DRM_PANIC is
enabled, the user is still asked about the Sparc console 12x22 and
Terminus 16x32 fonts.
Fix this by moving the "|| DRM_PANIC" to where it belongs.
Split the dependency in two rules to improve readability.
Fixes: b94605a388 ("lib/fonts: Allow to select fonts for drm_panic")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ac474c6755800e61e18bd5af407c6acb449c5149.1718305355.git.geert+renesas@glider.be
Provide new primitives for solving a lifetime issue with bcachefs
btree_trans objects.
closure_sync_return(): like closure_sync(), wait synchronously for any
outstanding gets. Like closure_return(), the closure is considered
"finished" and the ref is left at 0.
closure_get_not_zero(): get a ref on a closure if it's alive, i.e. the
ref is not zero.
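The "get a ref only if it is not zero" pattern can be sketched with C11 atomics. This is a simplified illustration, not the closure implementation: it assumes a plain integer refcount with no flag bits, and demo_closure and demo_get_not_zero are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_closure {
	atomic_int remaining;	/* outstanding refs; 0 means finished */
};

/* Take a ref and return true, unless the count already dropped to 0. */
static bool demo_get_not_zero(struct demo_closure *cl)
{
	int old = atomic_load(&cl->remaining);

	assert(old >= 0);
	do {
		if (old == 0)
			return false;	/* finished: hand out no new refs */
		/* On failure old is refreshed and the check is repeated. */
	} while (!atomic_compare_exchange_weak(&cl->remaining, &old, old + 1));
	return true;
}
```

A plain increment would resurrect an object whose count had already reached zero; the compare-and-exchange loop closes that window.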
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Lots of (mostly boring) fixes for syzbot bugs and rare(r) CI bugs.
The LRU_TIME_BITS fix was slightly more involved; we only have 48 bits
for the LRU position (we would prefer 64), so wraparound is possible for
the cached data LRUs on a filesystem that has done sufficient
(petabytes) reads; this is now handled.
One notable user reported bugfix, where we were forgetting to correctly
set the bucket data type, which should have been BCH_DATA_need_gc_gens
instead of BCH_DATA_free; this was causing us to go emergency read-only
on a filesystem that had seen heavy enough use to see bucket gen
wraparound.
We're now starting to fix simple (safe) errors without requiring user
intervention - i.e. a small incremental step towards full self healing.
This is currently limited to just certain allocation information
counters, and the error is still logged in the superblock; see that
patch for more information. ("bcachefs: Fix safe errors by default").
Merge tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of (mostly boring) fixes for syzbot bugs and rare(r) CI bugs.
The LRU_TIME_BITS fix was slightly more involved; we only have 48 bits
for the LRU position (we would prefer 64), so wraparound is possible
for the cached data LRUs on a filesystem that has done sufficient
(petabytes) reads; this is now handled.
One notable user reported bugfix, where we were forgetting to
correctly set the bucket data type, which should have been
BCH_DATA_need_gc_gens instead of BCH_DATA_free; this was causing us to
go emergency read-only on a filesystem that had seen heavy enough use
to see bucket gen wraparound.
We're now starting to fix simple (safe) errors without requiring user
intervention - i.e. a small incremental step towards full self
healing.
This is currently limited to just certain allocation information
counters, and the error is still logged in the superblock; see that
patch for more information. ("bcachefs: Fix safe errors by default")"
* tag 'bcachefs-2024-06-22' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Move the ei_flags setting to after initialization
bcachefs: Fix a UAF after write_super()
bcachefs: Use bch2_print_string_as_lines for long err
bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()
bcachefs: Replace bare EEXIST with private error codes
bcachefs: Fix missing alloc_data_type_set()
closures: Change BUG_ON() to WARN_ON()
bcachefs: fix alignment of VMA for memory mapped files on THP
bcachefs: Fix safe errors by default
bcachefs: Fix bch2_trans_put()
bcachefs: set_worker_desc() for delete_dead_snapshots
bcachefs: Fix bch2_sb_downgrade_update()
bcachefs: Handle cached data LRU wraparound
bcachefs: Guard against overflowing LRU_TIME_BITS
bcachefs: delete_dead_snapshots() doesn't need to go RW
bcachefs: Fix early init error path in journal code
bcachefs: Check for invalid btree IDs
bcachefs: Fix btree ID bitmasks
bcachefs: Fix shift overflow in read_one_super()
bcachefs: Fix a locking bug in the do_discard_fast() path
...
With ARCH=arm, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libsha256.o
Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().
This includes sha1.c and utils.c which, although they did not produce
a warning with the arm allmodconfig configuration, may cause this
warning with other configurations.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use existing swap() function rather than duplicating its implementation.
./lib/crypto/mpi/mpi-pow.c:211:11-12: WARNING opportunity for swap().
./lib/crypto/mpi/mpi-pow.c:239:12-13: WARNING opportunity for swap().
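For readers unfamiliar with the helper, a type-generic swap macro can be sketched as below. This is a userspace approximation, not a copy of the kernel definition; demo_swap and demo_order_pair are hypothetical names, and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>

/* Exchange two lvalues of the same type through a typed temporary. */
#define demo_swap(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Example caller: make sure *lo <= *hi. */
static void demo_order_pair(long *lo, long *hi)
{
	assert(lo && hi);
	if (*lo > *hi)
		demo_swap(*lo, *hi);
}
```

Open-coded three-line swaps with a manual temporary are what the warnings above flag as an opportunity for the shared helper.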
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9327
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use existing swap() function rather than duplicating its implementation.
./lib/crypto/mpi/ec.c:1291:20-21: WARNING opportunity for swap().
./lib/crypto/mpi/ec.c:1292:20-21: WARNING opportunity for swap().
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9328
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
UAPI Changes:
- Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION
Core Changes:
- connector: Create a set of helpers to help with HDMI support
- fbdev: Create memory manager optimized fbdev emulation
- panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer
Driver Changes:
- Remove driver owner assignments
- Allow more drivers to compile with COMPILE_TEST
- Conversions to drm_edid
- ivpu: hardware scheduler support, profiling support, improvements
to the platform support layer
- mgag200: general reworks and improvements
- nouveau: Add NVreg_RegistryDwords command line option
- rockchip: Conversion to the hdmi helpers
- sun4i: Conversion to the hdmi helpers
- vc4: Conversion to the hdmi helpers
- v3d: Perf counters improvements
- zynqmp: IRQ and debugfs improvements
- bridge:
- Remove redundant checks on bridge->encoder
- panels:
- Switch panels from register table initialization to proper code
- Now that the panel code tracks the panel state, remove every
ad-hoc implementation in the panel drivers
- New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
nv110wum-l60, IVO t109nw41
Merge tag 'drm-misc-next-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.11:
UAPI Changes:
- Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION
Core Changes:
- connector: Create a set of helpers to help with HDMI support
- fbdev: Create memory manager optimized fbdev emulation
- panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer
Driver Changes:
- Remove driver owner assignments
- Allow more drivers to compile with COMPILE_TEST
- Conversions to drm_edid
- ivpu: hardware scheduler support, profiling support, improvements
to the platform support layer
- mgag200: general reworks and improvements
- nouveau: Add NVreg_RegistryDwords command line option
- rockchip: Conversion to the hdmi helpers
- sun4i: Conversion to the hdmi helpers
- vc4: Conversion to the hdmi helpers
- v3d: Perf counters improvements
- zynqmp: IRQ and debugfs improvements
- bridge:
- Remove redundant checks on bridge->encoder
- panels:
- Switch panels from register table initialization to proper code
- Now that the panel code tracks the panel state, remove every
ad-hoc implementation in the panel drivers
- New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
nv110wum-l60, IVO t109nw41
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
1e7962114c ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error")
165f87691a ("bnxt_en: add timestamping statistics support")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All fake flexible arrays should have been removed now, so remove the
special casing that was avoiding checking them. If a destination claims
to be 0 sized, believe it. This is especially important for cases where
__counted_by is in use and may have a 0 element count.
Link: https://lore.kernel.org/r/20240619203105.work.747-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
It would be useful to see what the sched_ext scheduler state is, and what
scheduler is running, when we're dumping a task's stack. This patch
therefore adds a new print_scx_info() function that's called in the same
context as print_worker_info() and print_stop_info(). An example dump
follows.
BUG: kernel NULL pointer dereference, address: 0000000000000999
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 13 PID: 2047 Comm: insmod Tainted: G O 6.6.0-work-10323-gb58d4cae8e99-dirty #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
Sched_ext: qmap (enabled+all), task: runnable_at=-17ms
RIP: 0010:init_module+0x9/0x1000 [test_module]
...
v3: - scx_ops_enable_state_str[] definition moved to an earlier patch as
it's now used by core implementation.
- Convert jiffy delta to msecs using jiffies_to_msecs() instead of
multiplying by (HZ / MSEC_PER_SEC). The conversion is implemented in
jiffies_delta_msecs().
v2: - We are now using scx_ops_enable_state_str[] outside
CONFIG_SCHED_DEBUG. Move it outside of CONFIG_SCHED_DEBUG and to the
top. This was reported by Changwoo and Andrea.
Signed-off-by: David Vernet <void@manifault.com>
Reported-by: Changwoo Min <changwoo@igalia.com>
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/find_bit_benchmark.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/cpumask_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bitmap.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Hardcoding the number of CPUs at compile time does improve code
generation, but if you get it wrong the result will be confusion.
We already limited this earlier to only "experts" (see commit
fe5759d5bf "cpumask: limit visibility of FORCE_NR_CPUS"), but with
distro kernel configs often having EXPERT enabled, that turns out to not
be much of a limit.
To quote the philosophers at Disney: "Everyone can be an expert. And
when everyone's an expert, no one will be".
There's a runtime warning if you then set nr_cpus to anything but the
forced number, but apparently that can be ignored too [1] and by then
it's pretty much too late anyway.
If we had some real way to limit this to "embedded only", maybe it would
be worth it, but let's see if anybody even notices that the option is
gone. We need to simplify kernel configuration anyway.
Link: https://lore.kernel.org/all/20240618105036.208a8860@rorschach.local.home/ [1]
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mainly MM singleton fixes. And a couple of ocfs2 regression fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZnCEQAAKCRDdBJ7gKXxA
jmgSAQDk3BYs1n67cnwx/Zi04yMYDyfYTCYg2udPfT2a+GpmbwD+N5dJd/vCztXH
5eLpP11xd/yr2+I9FefyZeUuA80KtgQ=
=2agY
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kcov: don't lose track of remote references during softirqs
mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
mm: fix possible OOB in numa_rebuild_large_mapping()
mm/migrate: fix kernel BUG at mm/compaction.c:2761!
selftests: mm: make map_fixed_noreplace test names stable
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
gcov: add support for GCC 14
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
MAINTAINERS: remove Lorenzo as vmalloc reviewer
Revert "mm: init_mlocked_on_free_v3"
mm/page_table_check: fix crash on ZONE_DEVICE
gcc: disable '-Warray-bounds' for gcc-9
ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmZvTbAeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGVksIAJEn4a9IVM8FNCJy
Dxo0BItD1/qJ5mLDptqUFRKlxInjbojofz5CyoeIeXb0DwRfB16ALXqNXAkd3APi
saoOpfjFsg2H2OqL9CHdkzWcJEAq2lDnL0zaOjumeDVu/EyeT+tC4e4hq1e6Bm0E
fPC5ms2b+07DF9Rg6/DW8yPbdM5n6Mz1bRd3fQOIgvpM3yGOyGztEBgTRub/ZUgH
5pNJauknFAZgdiWhgNpc+lPWYZbgHKULQPhUBPdVhDIXPtQNUlKgNTQc6+L0Nmbb
K1sG1q7FLeMJOTFGQfD4r26X5DNQUi894q/9SX8X7rcrECdJKcw2WjVyB4myADpf
ae2gP+A=
=XjWP
-----END PGP SIGNATURE-----
Merge tag 'v6.10-rc4' into driver-core-next
We need the driver core and sysfs fixes in here to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'v6.10-rc4' into char-misc-next
We need the char-misc and iio fixes in here as well to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory allocation profiling is trying to register sysctl interface even
when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined.
Prevent that by skipping sysctl registration for such configurations.
Link: https://lkml.kernel.org/r/20240601233831.617124-1-surenb@google.com
Fixes: 22d407b164 ("lib: add allocation tagging support for memory allocation profiling")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405280616.wcOGWJEj-lkp@intel.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert the runtime tests of hardened usercopy to standard KUnit tests.
Additionally disable usercopy_test_invalid() for systems with separate
address spaces (or no MMU) since it's not sensible to test for address
confusion there (e.g. m68k).
Co-developed-by: Vitor Massaru Iha <vitor@massaru.org>
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Link: https://lore.kernel.org/r/20200721174654.72132-1-vitor@massaru.org
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
For tests that need to allocate using vm_mmap() (e.g. usercopy and
execve), provide the interface to have the allocation tracked by KUnit
itself. This requires bringing up a placeholder userspace mm.
This combines my earlier attempt at this with Mark Rutland's version[1].
Normally alloc_mm() and arch_pick_mmap_layout() aren't exported for
modules, so export these only for KUnit testing.
Link: https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/ [1]
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This commit is part of a greater effort to remove all empty elements at
the end of the ctl_table arrays (sentinels) which will reduce the
overall build time size of the kernel and run time memory bloat by ~64
bytes per sentinel (for further information, see
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/).
Remove the sentinel from memory_allocation_profiling_sysctls.
Signed-off-by: Joel Granados <j.granados@samsung.com>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_printf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_scanf.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-vsprintf-v1-1-d8bc7e21539a@quicinc.com
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
zap_modalias_env() miscalculates the size of the memory block to move,
which causes an out-of-bounds memory access if the MODALIAS variable is
not the last one in its @env parameter. Fix it by passing the correct
size to memmove().
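A minimal userspace sketch of the size calculation follows. struct
env_buf, env_add() and env_remove() are hypothetical, simplified
stand-ins for struct kobj_uevent_env and the kernel code, used only to
show that the block to move is the part of the buffer that follows the
removed variable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct kobj_uevent_env. */
struct env_buf {
	char *envp[8];	/* pointers into buf[], one per variable */
	int envp_idx;	/* number of variables in use */
	char buf[128];	/* packed "KEY=value\0" strings */
	size_t buflen;	/* bytes of buf[] in use */
};

/* Append one variable; returns 0 on success, -1 if it does not fit. */
static int env_add(struct env_buf *env, const char *var)
{
	size_t len = strlen(var) + 1;

	if (env->envp_idx >= 8 || env->buflen + len > sizeof(env->buf))
		return -1;
	env->envp[env->envp_idx] = env->buf + env->buflen;
	memcpy(env->envp[env->envp_idx], var, len);
	env->envp_idx++;
	env->buflen += len;
	return 0;
}

/* Remove variable i and close the gap it leaves in buf[]. */
static void env_remove(struct env_buf *env, int i)
{
	size_t len = strlen(env->envp[i]) + 1;
	int j;

	if (i != env->envp_idx - 1) {
		/* Move only the bytes that follow the removed variable. */
		size_t tail = env->buflen -
			      (size_t)(env->envp[i + 1] - env->buf);

		memmove(env->envp[i], env->envp[i + 1], tail);
		/* Later variables shift down by the removed length. */
		for (j = i; j < env->envp_idx - 1; j++)
			env->envp[j] = env->envp[j + 1] - len;
	}
	env->envp_idx--;
	env->buflen -= len;
}
```

Removing the middle one of three variables moves exactly the bytes of
the last variable, so the copy never reads past buflen.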
Fixes: 9b3fa47d4a ("kobject: fix suppressing modalias in uevents delivered over netlink")
Cc: stable@vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Lk Sii <lk_sii@163.com>
Link: https://lore.kernel.org/r/1717074877-11352-1-git-send-email-quic_zijuhu@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmIsRAAKCRDbK58LschI
g4SSAP0bkl6rPMn7zp1h+/l7hlvpp2aVOmasBTe8hIhAGUbluwD/TGq4sNsGgXFI
i4tUtFRhw8pOjy2guy6526qyJvBs8wY=
=WMhY
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-06-06
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).
The main changes are:
1) Add a user space notification mechanism via epoll when a struct_ops
object is getting detached/unregistered, from Kui-Feng Lee.
2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
tests, from Geliang Tang.
3) Add BTF field (type and string fields, right now) iterator support
to libbpf instead of using existing callback-based approaches,
from Andrii Nakryiko.
4) Extend BPF selftests for the latter with a new btf_field_iter
selftest, from Alan Maguire.
5) Add new kfuncs for a generic, open-coded bits iterator,
from Yafang Shao.
6) Fix BPF selftests' kallsyms_find() helper under kernels configured
with CONFIG_LTO_CLANG_THIN, from Yonghong Song.
7) Remove a bunch of unused structs in BPF selftests,
from David Alan Gilbert.
8) Convert test_sockmap section names into names understood by libbpf
so it can deduce program type and attach type, from Jakub Sitnicki.
9) Extend libbpf with the ability to configure log verbosity
via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.
10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
in nested VMs, from Song Liu.
11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
optimization, from Xiao Wang.
12) Enable BPF programs to declare arrays and struct fields with kptr,
bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
selftests/bpf: Add btf_field_iter selftests
selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
libbpf: Remove callback-based type/string BTF field visitor helpers
bpftool: Use BTF field iterator in btfgen
libbpf: Make use of BTF field iterator in BTF handling code
libbpf: Make use of BTF field iterator in BPF linker code
libbpf: Add BTF field iterator
selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
selftests/bpf: Fix bpf_cookie and find_vma in nested VM
selftests/bpf: Test global bpf_list_head arrays.
selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
bpf: limit the number of levels of a nested struct type.
bpf: look into the types of the fields of a struct type recursively.
...
====================
Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a flexible array structure has a __counted_by annotation, its use
with DEFINE_RAW_FLEX() will result in the count being zero-initialized.
This is expected since one doesn't want to use RAW with a counted_by
struct. Adjust the tests to check for the condition and for compiler
support.
Reported-by: Christian Schrefl <chrisi.schrefl@gmail.com>
Closes: https://lore.kernel.org/all/0bfc6b38-8bc5-4971-b6fb-dc642a73fbfe@gmail.com/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240610182301.work.272-kees@kernel.org
Tested-by: Christian Schrefl <chrisi.schrefl@gmail.com>
Reviewed-by: Christian Schrefl <chrisi.schrefl@gmail.com>
Signed-off-by: Kees Cook <kees@kernel.org>
ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
(A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
contain more ACLs (i.e., tc filters), but the number of masks in each
region (i.e., tc chain) is limited.
In order to mitigate the effects of the above limitation, the device
allows filters to share a single mask if their masks only differ in up
to 8 consecutive bits. For example, dst_ip/25 can be represented using
dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
number of masks being used (and therefore does not support mask
aggregation), but can contain a limited number of filters.
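The mask sharing rule can be sketched in userspace C as follows. This is
not the driver's code: prefix_mask(), delta_fits() and DELTA_MAX_BITS
are hypothetical names, the masks are limited to 32 bits, and the sketch
assumes that the more specific mask only adds bits on top of the shared
one.

```c
#include <stdbool.h>
#include <stdint.h>

#define DELTA_MAX_BITS 8	/* window size taken from the text above */

/* Build an IPv4 prefix mask, e.g. 24 -> 0xffffff00. */
static uint32_t prefix_mask(unsigned int len)
{
	return len ? (uint32_t)(~UINT32_C(0) << (32 - len)) : 0;
}

/*
 * Return true if @child can be expressed as @parent plus a delta, i.e.
 * the bits in which they differ fit in DELTA_MAX_BITS consecutive bits.
 */
static bool delta_fits(uint32_t parent, uint32_t child)
{
	uint32_t diff = parent ^ child;
	unsigned int lo, hi;

	/* Assumption: the child may only add bits to the parent mask. */
	if ((parent & child) != parent)
		return false;
	/* No differing bits: a zero-width delta trivially fits. */
	if (!diff)
		return true;

	/* Find the lowest and highest differing bit positions. */
	for (lo = 0; !(diff & (UINT32_C(1) << lo)); lo++)
		;
	for (hi = 31; !(diff & (UINT32_C(1) << hi)); hi--)
		;

	return hi - lo + 1 <= DELTA_MAX_BITS;
}
```

For the example in the text, delta_fits(prefix_mask(24), prefix_mask(25))
holds because the two masks differ in a single bit, whereas a /16 and a
/25 mask differ in 9 consecutive bits and cannot share a mask this way.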
The driver uses the "objagg" library to perform the mask aggregation by
passing it objects that consist of the filter's mask and whether the
filter is to be inserted into the A-TCAM or the C-TCAM since filters in
different TCAMs cannot share a mask.
The set of created objects is dependent on the insertion order of the
filters and is not necessarily optimal. Therefore, the driver will
periodically ask the library to compute a more optimal set ("hints") by
looking at all the existing objects.
When the library asks the driver whether two objects can be aggregated
the driver only compares the provided masks and ignores the A-TCAM /
C-TCAM indication. This is the right thing to do since the goal is to
move as many filters as possible to the A-TCAM. The driver also forbids
two identical masks from being aggregated since this can only happen if
one was intentionally put in the C-TCAM to avoid a conflict in the
A-TCAM.
The above can result in the following set of hints:
H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
After getting the hints from the library the driver will start migrating
filters from one region to another while consulting the computed hints
and instructing the device to perform a lookup in both regions during
the transition.
Assuming a filter with mask X is being migrated into the A-TCAM in the
new region, the hints lookup will return H1. Since H2 is the parent of
H1, the library will try to find the object associated with it and
create it if necessary in which case another hints lookup (recursive)
will be performed. This hints lookup for {mask Y, A-TCAM} will either
return H2 or H3 since the driver passes the library an object comparison
function that ignores the A-TCAM / C-TCAM indication.
This can eventually lead to nested objects which are not supported by
the library [1].
Fix by removing the object comparison function from both the driver and
the library as the driver was the only user. That way the lookup will
only return exact matches.
I do not have a reliable reproducer that can reproduce the issue in a
timely manner, but before the fix the issue would reproduce in several
minutes and with the fix it does not reproduce in over an hour.
Note that the current usefulness of the hints is limited because they
include the C-TCAM indication and represent aggregation that cannot
actually happen. This will be addressed in net-next.
[1]
WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
Modules linked in:
CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
[...]
Call Trace:
<TASK>
__objagg_obj_get+0x2bb/0x580
objagg_obj_get+0xe/0x80
mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370
Fixes: 9069a3817d ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The library supports aggregation of objects into other objects only if
the parent object does not have a parent itself. That is, nesting is not
supported.
Aggregation happens in two cases: Without and with hints, where hints
are a pre-computed recommendation on how to aggregate the provided
objects.
Nesting is not possible in the first case due to a check that prevents
it, but in the second case there is no check because the assumption is
that nesting cannot happen when creating objects based on hints. The
violation of this assumption leads to various warnings and eventually to
a general protection fault [1].
Before fixing the root cause, error out when nesting happens and warn.
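The kind of check described above can be sketched in userspace C. The
struct and function names are hypothetical and the kernel's warning is
omitted; the sketch only shows a parent assignment that refuses to
create a nested object.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified aggregation object. */
struct agg_obj {
	struct agg_obj *parent;	/* NULL for a root object */
};

/*
 * Attach @obj to @parent. Only root objects may act as parents, so an
 * attempt to attach to an object that already has a parent is rejected
 * instead of silently creating a nested object.
 */
static int agg_obj_parent_assign(struct agg_obj *obj, struct agg_obj *parent)
{
	if (parent->parent)
		return -EINVAL;

	obj->parent = parent;
	return 0;
}
```

Attaching B to root A succeeds, while a later attempt to attach C to B
fails with -EINVAL and leaves C without a parent.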
[1]
general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G W 6.9.0-rc6-custom-gd9b4f1cca7fb #7
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: 9069a3817d ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Reported-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 0a020d416d ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/list-test.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
make allmodconfig && make W=1 C=1 reports in lib/kunit:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/kunit/kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/kunit/kunit-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/kunit/kunit-example-test.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix the allmodconfig 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libchacha.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libarc4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libdes.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libpoly1305.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Including fixes from BPF and big collection of fixes for WiFi core and
drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the same
UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the old
behavior for two cases where we started coalescing those messages with
normal messages, breaking sloppily-coded userspace
- wifi:
- cfg80211: validate HE operation element parsing
- cfg80211: fix 6 GHz scan request building
- mt76: mt7615: add missing chanctx ops
- ath11k: move power type check to ASSOC stage, fix connecting
to 6 GHz AP
- ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
- rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
- iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZh3mUACgkQMUZtbf5S
IrvPwRAApv8X0ZIbPD5PuVEkiYuSkSE6QVou5GaVO7DzF4gj07zPNtCe6B/ZZdBu
RLdlppxjAmVwdCRmUo0plxSydYZcqFpQqV6lRH/rbWmktWIp0pGIOAcOG7ISRPCC
FAYJ4udSt4+wrq0hXTsE1KO1JZ0p7zE2bXxNC8uR8wgM9yonUjqhYdAUZhrl3yCY
zOCD/+kvWFLYtehDcmyNK0ANS3yNveTNkRhXDc1UrpOGMtza60lf5u3bWK+sU5VS
NGPe9cU60WKMQi6QnWFBZKIcp4Vgy2MukOLdNn9e8BRjFLh2dbY86LAmE4HWPA7I
ONZagOfEjeOcRSCMdFHxui/PUDZLBZNhrnqQ6x8uC2yKwwIMr+CgEt5sCmVFwH6n
3HTlWSjL38yuiVuYuhxGchmVnZfC4bLi2qAFF1oxhlDGViBDhAwi36MSCnjDpN8k
Jo0x6crQLS/uvwVXPKWAUcQhy7OE69A3FwwA1PtkxRX5EQPn1if2Z7yq7YfYb9aD
bChvCarlfuVDm+CBItphXg0ajVZc+im7+JK62Zn50A1cTbEK0lnYCOcmqzqiqrXI
Vr3XXt6gVVnvwY374JDO1vmB5ft2IYBn7sWnLcIvR2UlggqEfqMdKSSwm7pOprG9
YJ/LDAXVmG0kLN7rZUYUBLItnpuHAhYDrBOsV5HaFeksWauc1oY=
=mwEJ
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF and big collection of fixes for WiFi core and
drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src
addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the
same UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from
control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the
old behavior for two cases where we started coalescing those
messages with normal messages, breaking sloppily-coded userspace
- wifi:
- cfg80211: validate HE operation element parsing
- cfg80211: fix 6 GHz scan request building
- mt76: mt7615: add missing chanctx ops
- ath11k: move power type check to ASSOC stage, fix connecting to
6 GHz AP
- ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
- rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
- iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"
* tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
selftests: net: lib: set 'i' as local
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: support errexit with busywait
net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
ipv6: fix possible race in __fib6_drop_pcpu_from()
af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
af_unix: Annotate data-races around sk->sk_sndbuf.
af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
af_unix: Annotate data-race of sk->sk_state in unix_accept().
af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
af_unix: Annodate data-races around sk->sk_state for writers.
af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
...
GCC 14.1 complains about the argument usage of kmemdup_array():
drivers/soc/tegra/fuse/fuse-tegra.c:130:65: error: 'kmemdup_array' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
130 | fuse->lookups = kmemdup_array(fuse->soc->lookups, sizeof(*fuse->lookups),
| ^
drivers/soc/tegra/fuse/fuse-tegra.c:130:65: note: earlier argument should specify number of elements, later size of each element
The annotation introduced by commit 7d78a77733 ("string: Add
additional __realloc_size() annotations for "dup" helpers") lets the
compiler think that kmemdup_array() follows the same format as calloc(),
with the number of elements preceding the size of one element. So we
could simply swap the arguments to __realloc_size() to get rid of that
warning, but it seems cleaner to instead have kmemdup_array() follow the
same format as krealloc_array(), memdup_array_user(), calloc() etc.
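The calloc()-style argument order can be sketched with a userspace
analogue. memdup_array() below is a hypothetical helper, not the
kernel's kmemdup_array(); it has no gfp flags and uses malloc(), and
only shows the element count preceding the element size.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of an array-duplicating helper:
 * number of elements first, size of one element second, as in calloc().
 */
static void *memdup_array(const void *src, size_t count, size_t element_size)
{
	void *dst;

	/* Refuse sizes whose product would overflow size_t. */
	if (element_size && count > SIZE_MAX / element_size)
		return NULL;

	dst = malloc(count * element_size);
	if (dst)
		memcpy(dst, src, count * element_size);
	return dst;
}
```

A caller then writes memdup_array(table, n, sizeof(*table)), which is
the order -Wcalloc-transposed-args expects: the sizeof expression
appears in the later argument.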
Fixes: 7d78a77733 ("string: Add additional __realloc_size() annotations for "dup" helpers")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240606144608.97817-2-jean-philippe@linaro.org
Signed-off-by: Kees Cook <kees@kernel.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/math/prime_numbers.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/math/rational-test.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-lib-math-v1-1-11a3bec51ebb@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_dynamic_debug.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-test_dynamic_debug-v1-1-2194b477f55e@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'klist_test_struct' has been unused since the original
commit 57b4f760f9 ("list: test: Test the klist structure").
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bpf.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240531-md-lib-test_bpf-v1-1-868e4bd2f9ed@quicinc.com
There are multiple assertion formatting functions in the `assert.c`
file, which are not covered with tests yet. Implement the KUnit test
for these functions.
The test consists of 11 test cases for the following functions:
1) 'is_literal'
2) 'is_str_literal'
3) 'kunit_assert_prologue', test case for multiple assert types
4) 'kunit_assert_print_msg'
5) 'kunit_unary_assert_format'
6) 'kunit_ptr_not_err_assert_format'
7) 'kunit_binary_assert_format'
8) 'kunit_binary_ptr_assert_format'
9) 'kunit_binary_str_assert_format'
10) 'kunit_assert_hexdump'
11) 'kunit_mem_assert_format'
The test aims at maximizing the branch coverage for the assertion
formatting functions.
As you can see, it covers some of the static helper functions as
well, so mark the static functions in `assert.c` as 'VISIBLE_IF_KUNIT'
and conditionally export them with EXPORT_SYMBOL_IF_KUNIT. Add the
corresponding definitions to `assert.h`.
Build the assert test when CONFIG_KUNIT_TEST is enabled, similar to
how it is done for the string stream test.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Acked-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The functions __kmalloc_noprof(), kmalloc_large_noprof(),
kmalloc_trace_noprof() and their _node variants are all internal to the
implementations of kmalloc_noprof() and kmalloc_node_noprof() and are
only declared in the "public" slab.h and exported so that those
implementations can be static inline and distinguish the build-time
constant size variants. The only other users for some of the internal
functions are slub_kunit and fortify_kunit tests which make very
short-lived allocations.
Therefore we can stop wrapping them with the alloc_hooks() macro.
Instead add a __ prefix to all of them and a comment documenting these
as internal. Also rename __kmalloc_trace() to __kmalloc_cache() which is
more descriptive - it is a variant of __kmalloc() where the exact
kmalloc cache has been already determined.
The usage in fortify_kunit can be removed completely, as the internal
functions should be tested already through kmalloc() tests in the
test variant that passes non-constant allocation size.
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests
fixes, various singletons fixing various issues in various parts"
* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/ksm: fix possible UAF of stable_node
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
nilfs2: fix potential hang in nilfs_detach_log_writer()
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix use-after-free of timer for log writer thread
selftests/mm: fix build warnings on ppc64
arm64: patching: fix handling of execmem addresses
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
selftests/mm: compaction_test: fix bogus test success on Aarch64
mailmap: update email address for Satya Priya
mm/huge_memory: don't unpoison huge_zero_folio
kasan, fortify: properly rename memintrinsics
lib: add version into /proc/allocinfo output
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-mm updates from Andrew Morton:
- A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout which we could find, there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer
AMD GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
nilfs2: make block erasure safe in nilfs_finish_roll_forward()
selftests/harness: use 1024 in place of LINE_MAX
Revert "selftests/harness: remove use of LINE_MAX"
selftests/fpu: allow building on other architectures
selftests/fpu: move FP code to a separate translation unit
drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
drm/amd/display: only use hard-float, not altivec on powerpc
riscv: add support for kernel-mode FPU
x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
x86/fpu: fix asm/fpu/types.h include guard
kbuild: enable -Wcast-function-type-strict unconditionally
kbuild: enable -Wformat-truncation on clang
...
Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
"A series from Dave Chinner which cleans up and fixes the handling of
nested allocations within stackdepot and page-owner"
* tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/page-owner: use gfp_nested_mask() instead of open coded masking
stackdepot: use gfp_nested_mask() instead of open coded masking
mm: lift gfp_kmemleak_mask() to gfp.h
Here is the big set of tty/serial driver changes for 6.10-rc1. Included
in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos instead
of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.10-rc1.
Included in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos
instead of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
serial: imx: Raise TX trigger level to 8
serial: 8250_pnp: Simplify "line" related code
serial: sh-sci: simplify locking when re-issuing RXDMA fails
serial: sh-sci: let timeout timer only run when DMA is scheduled
serial: sh-sci: describe locking requirements for invalidating RXDMA
serial: sh-sci: protect invalidating RXDMA on shutdown
tty: add the option to have a tty reject a new ldisc
serial: core: Call device_set_awake_path() for console port
dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
tty: serial: uartps: Add support for uartps controller reset
arm64: zynqmp: Add resets property for UART nodes
dt-bindings: serial: cdns,uart: Add optional reset property
serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
serial: 8250_exar: Keep the includes sorted
serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
serial: 8250_exar: Use BIT() in exar_ee_read()
serial: 8250_exar: Switch to use dev_err_probe()
serial: 8250_exar: Return directly from switch-cases
serial: 8250_exar: Decrease indentation level
...
Hi Linus,
Please pull patches for 6.10. This includes:
- topology_span_sane() optimization from Kyle Meyer;
- fns() rework from Kuan-Wei Chiu (used in
cpumask_local_spread() and other places); and
- headers cleanup from Andy.
This also adds a MAINTAINERS record for the bitops API, as it is currently
unattended and I'd like to follow it more closely.
Thanks,
Yury
Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- topology_span_sane() optimization from Kyle Meyer
- fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and
other places)
- headers cleanup from Andy
- add a MAINTAINERS record for bitops API
* tag 'bitmap-for-6.10v2' of https://github.com/norov/linux:
usercopy: Don't use "proxy" headers
bitops: Move aligned_byte_mask() to wordpart.h
MAINTAINERS: add BITOPS API record
bitmap: relax find_nth_bit() limitation on return value
lib: make test_bitops compilable into the kernel image
bitops: Optimize fns() for improved performance
lib/test_bitops: Add benchmark test for fns()
Compiler Attributes: Add __always_used macro
sched/topology: Optimize topology_span_sane()
cpumask: Add for_each_cpu_from()
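For readers unfamiliar with fns() ("find N'th set bit"), the idea behind the rework can be sketched in userspace roughly as below. sketch_fns() and SKETCH_BITS_PER_LONG are hypothetical names, and __builtin_ctzl() (a GCC/Clang builtin) stands in for the kernel's bit-search helper.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Return the position of the n-th set bit in word (n counts from 0),
 * or SKETCH_BITS_PER_LONG if word has fewer than n + 1 bits set.
 */
static inline unsigned int sketch_fns(unsigned long word, unsigned int n)
{
	/* Drop the n lowest set bits, one per iteration. */
	while (word && n--)
		word &= word - 1;

	/* The lowest remaining set bit, if any, is the answer. */
	return word ? (unsigned int)__builtin_ctzl(word) : SKETCH_BITS_PER_LONG;
}
```

The loop runs at most n times and each iteration is a subtract and an AND, so the cost scales with n rather than with the width of the word.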
Merge tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull bdev flags update from Al Viro:
"Compactifying bdev flags.
We can easily have up to 24 flags with sane atomicity, _without_
pushing anything out of the first cacheline of struct block_device"
* tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
bdev: move ->bd_make_it_fail to ->__bd_flags
bdev: move ->bd_ro_warned to ->__bd_flags
bdev: move ->bd_has_submit_bio to ->__bd_flags
bdev: move ->bd_write_holder into ->__bd_flags
bdev: move ->bd_read_only to ->__bd_flags
bdev: infrastructure for flags
wrapper for access to ->bd_partno
Use bdev_is_partition() instead of open-coding it
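One way to read the "24 flags with sane atomicity" remark is a single 32-bit atomic word whose low 8 bits hold the partition number and whose remaining 24 bits are flags. The sketch below assumes exactly that layout; struct sketch_bdev, SKETCH_BD_FLAG() and the helpers are hypothetical, and C11 atomics stand in for the kernel's atomic API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BD_PARTNO_MASK 0xffu              /* assumed: low 8 bits = partition number */
#define SKETCH_BD_FLAG(n)     (1u << ((n) + 8))  /* n in 0..23: the 24 flag bits */

struct sketch_bdev {
	_Atomic uint32_t word; /* partition number and flags share one word */
};

/* Set a flag without disturbing the partition number or other flags. */
static inline void sketch_set_flag(struct sketch_bdev *b, uint32_t flag)
{
	atomic_fetch_or(&b->word, flag);
}

/* Clear a flag atomically. */
static inline void sketch_clear_flag(struct sketch_bdev *b, uint32_t flag)
{
	atomic_fetch_and(&b->word, ~flag);
}

static inline bool sketch_test_flag(struct sketch_bdev *b, uint32_t flag)
{
	return (atomic_load(&b->word) & flag) != 0;
}

/* Accessor wrapper, so callers never touch the packed word directly. */
static inline unsigned int sketch_partno(struct sketch_bdev *b)
{
	return atomic_load(&b->word) & SKETCH_BD_PARTNO_MASK;
}
```

Because every update is a single atomic read-modify-write on one word, concurrent flag changes cannot lose each other's updates, and the whole state occupies four bytes.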
Update header inclusions to follow IWYU (Include What You Use)
principle.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
bitops.h is for bit-related operations. aligned_byte_mask() operates on
bytes (parts of a machine word), for which we have a separate header.
Move the macro to wordpart.h to consolidate similar operations.
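As a reminder of what the macro computes, here is a userspace sketch of the little-endian form only, written as a function. sketch_aligned_byte_mask() is a hypothetical name and the big-endian variant is not shown.

```c
/*
 * Mask covering the first n bytes of a word as laid out in memory on a
 * little-endian machine. Valid for n < sizeof(unsigned long).
 */
static inline unsigned long sketch_aligned_byte_mask(unsigned int n)
{
	return (1UL << (8 * n)) - 1;
}
```

Since the result describes bytes within a word rather than individual bits, it sits more naturally next to other word-part helpers than in a bit-operations header.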
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The stackdepot code is used by KASAN and lockdep for recording stack
traces. Both of these track allocation context information, and so their
internal allocations must obey the caller allocation contexts to avoid
generating their own false positive warnings that have nothing to do with
the code they are instrumenting/tracking.
We also don't want recording stack traces to deplete emergency memory
reserves - debug code is useless if it creates new issues that can't be
replicated when the debug code is disabled.
Switch the stackdepot allocation masking to use gfp_nested_mask() to
address these issues. gfp_nested_mask() also strips GFP_ZONEMASK
naturally, so that greatly simplifies this code.
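The masking idea can be modelled in userspace as shown below. The flag names and bit values are made up for the sketch (the real GFP bits differ), and sketch_gfp_nested_mask() is a hypothetical stand-in that only mirrors the behaviour described above: keep the caller's context bits, add bits that make the nested allocation fail quietly without touching reserves, and drop everything else, including zone selectors.

```c
typedef unsigned int sk_gfp_t;

/* Hypothetical bit assignments, for illustration only. */
#define SK_DMA            0x0001u /* zone selector */
#define SK_HIGHMEM        0x0002u /* zone selector */
#define SK_HIGH           0x0010u
#define SK_IO             0x0020u
#define SK_FS             0x0040u
#define SK_DIRECT_RECLAIM 0x0080u
#define SK_KSWAPD_RECLAIM 0x0100u
#define SK_NORETRY        0x0200u
#define SK_NOMEMALLOC     0x0400u
#define SK_NOWARN         0x0800u
#define SK_NOLOCKDEP      0x1000u
#define SK_ZERO           0x2000u

#define SK_GFP_KERNEL (SK_DIRECT_RECLAIM | SK_KSWAPD_RECLAIM | SK_IO | SK_FS)
#define SK_GFP_ATOMIC (SK_HIGH | SK_KSWAPD_RECLAIM)

/* Derive the mask for an allocation nested inside the caller's allocation. */
static inline sk_gfp_t sketch_gfp_nested_mask(sk_gfp_t flags)
{
	/* Inherit only the caller's reclaim/IO/FS context and lockdep opt-out... */
	sk_gfp_t inherited = flags & (SK_GFP_KERNEL | SK_GFP_ATOMIC | SK_NOLOCKDEP);

	/* ...and never retry hard, dip into emergency reserves, or warn. */
	return inherited | SK_NORETRY | SK_NOMEMALLOC | SK_NOWARN;
}
```

In this model a caller that allocated without SK_FS yields a nested mask without SK_FS as well, and zone bits such as SK_DMA disappear simply because they are not part of the inherited set.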
Link: https://lkml.kernel.org/r/20240430054604.4169568-3-david@fromorbit.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and
run floating-point code, this test is no longer x86-specific.
Link: https://lkml.kernel.org/r/20240329072441.591471-16-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This ensures no compiler-generated floating-point code can appear outside
kernel_fpu_{begin,end}() sections, and some architectures enforce this
separation.
Link: https://lkml.kernel.org/r/20240329072441.591471-15-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.
Link: https://lkml.kernel.org/r/20240329072441.591471-7-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
"Mainly singleton patches, documented in their respective changelogs.
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo:
Clean up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb:
Fixes for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over
macros. The series is "codingstyle: avoid unused parameters for a
function-like macro""
* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
fs/proc: fix softlockup in __read_vmcore
nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
scripts: checkpatch: check unused parameters for function-like macro
Documentation: coding-style: ask function-like macros to evaluate parameters
nilfs2: use __field_struct() for a bitwise field
selftests/kcmp: remove unused open mode
nilfs2: remove calls to folio_set_error() and folio_clear_error()
kernel/watchdog_perf.c: tidy up kerneldoc
watchdog: allow nmi watchdog to use raw perf event
watchdog: handle comma separated nmi_watchdog command line
nilfs2: make superblock data array index computation sparse friendly
squashfs: remove calls to set the folio error flag
squashfs: convert squashfs_symlink_read_folio to use folio APIs
scripts/gdb: fix detection of current CPU in KGDB
scripts/gdb: make get_thread_info accept pointers
scripts/gdb: fix parameter handling in $lx_per_cpu
scripts/gdb: fix failing KGDB detection during probe
kfifo: don't use "proxy" headers
media: stih-cec: add missing io.h
media: rc: add missing io.h
...
Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:
- More safety fixes, primarily found by syzbot
- Run the upgrade/downgrade paths in nochanges mode. Nochanges mode is
primarily for testing fsck/recovery in dry run mode, so it shouldn't
change anything besides disabling writes and holding dirty metadata
in memory.
The idea here was to reduce the amount of activity if we can't write
anything out, so that bringing up a filesystem in "super ro" mode
would be more likely to work for data recovery - but norecovery is
the correct option for this.
- btree_trans->locked; we now track whether a btree_trans has any btree
nodes locked, and this is used for improved assertions related to
trans_unlock() and trans_relock(). We'll also be using it for
improving how we work with lockdep in the future: we don't want
lockdep to be tracking individual btree node locks because we take
too many for lockdep to track, and it's not necessary since we have a
cycle detector.
- Trigger improvements that are prep work for online fsck
- BTREE_TRIGGER_check_repair; this regularizes how we do some repair
work for extents that goes with running triggers in fsck, and fixes
some subtle issues with transaction restarts there.
- bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot
equivalence classes are for when snapshot deletion leaves behind
redundant snapshot nodes, but snapshot deletion now cleans this up
right away, so the abstraction doesn't need to leak.
- Improvements to how we resume writing to the journal in recovery. The
code for picking the new place to write when reading the journal is
greatly simplified and we also store the position in the superblock
for when we don't read the journal; this means that we preserve more
of the journal for list_journal debugging.
- Improvements to sysfs btree_cache and btree_node_cache, for debugging
memory reclaim.
- We now detect when we've blocked for 10 seconds on the allocator in
the write path and dump some useful info.
- Safety fixes for device references: this is a big series that
changes almost all device lookups to properly check if the device
exists and take a reference to it.
Previously we assumed that if a bkey exists that references a device
then the device must exist, and this was enforced in .invalid
methods, but this was incorrect because it meant device removal
relied on accounting being correct to not leave keys pointing to
invalid devices, and that's not something we can assume.
Getting the "pointer to invalid device" checks out of our .invalid()
methods fixes some long standing device removal bugs; the only
outstanding bug with device removal now is a race between the discard
path and deleting alloc info, which should be easily fixed.
- The allocator now prefers not to expand the new
member_info.btree_allocated bitmap, meaning if repair ever requires
scanning for btree nodes (because of corrupt interior nodes) we
won't have to scan the whole device(s).
- New coding style document, which among other things talks about the
correct usage of assertions
* tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits)
bcachefs: add no_invalid_checks flag
bcachefs: add counters for failed shrinker reclaim
bcachefs: Fix sb_field_downgrade validation
bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()
bcachefs: s/bkey_invalid_flags/bch_validate_flags
bcachefs: fsync() should not return -EROFS
bcachefs: Invalid devices are now checked for by fsck, not .invalid methods
bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs()
bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio()
bcachefs: bch2_dev_get_ioref() checks for device not present
bcachefs: bch2_dev_get_ioref2(); io_read.c
bcachefs: bch2_dev_get_ioref2(); debug.c
bcachefs: bch2_dev_get_ioref2(); journal_io.c
bcachefs: bch2_dev_get_ioref2(); io_write.c
bcachefs: bch2_dev_get_ioref2(); btree_io.c
bcachefs: bch2_dev_get_ioref2(); backpointers.c
bcachefs: bch2_dev_get_ioref2(); alloc_background.c
bcachefs: for_each_bset() declares loop iter
bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h
bcachefs: Improve sysfs internal/btree_cache
...
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initialization code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests in the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initialization code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folios".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THPs in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests in the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
Normal set of driver updates and small fixes:
- Small improvements and fixes for erdma, efa, hfi1, bnxt_re
- Fix a UAF crash after module unload on leaking restrack entry
- Continue adding full RDMA support in mana with support for EQs, GIDs
and CQs
- Improvements to the mkey cache in mlx5
- DSCP traffic class support in hns and several bug fixes
- Cap the maximum number of MADs in the receive queue to avoid OOM
- Another batch of rxe bug fixes from large scale testing
- __iowrite64_copy() optimizations for write combining MMIO memory
- Remove NULL checks before dev_put/hold()
- EFA support for receive with immediate
- Fix a recent memleaking regression in a cma error path
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZkeo2gAKCRCFwuHvBreF
YbuNAQChzGmS4F0JAn5Wj0CDvkZghELqtvzEb92SzqcgdyQafAD/fC7f23LJ4OsO
1ZIaQEZu7j9DVg5PKFZ7WfdXjGTKqwA=
=QRXg
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Aside from the usual things this has an arch update for
__iowrite64_copy() used by the RDMA drivers.
This API was intended to generate large 64 byte MemWr TLPs on PCI.
These days most processors have done this by just repeating writel() in
a loop. S390 and some new ARM64 designs require a special helper to
get this to generate.
- Small improvements and fixes for erdma, efa, hfi1, bnxt_re
- Fix a UAF crash after module unload on leaking restrack entry
- Continue adding full RDMA support in mana with support for EQs,
GIDs and CQs
- Improvements to the mkey cache in mlx5
- DSCP traffic class support in hns and several bug fixes
- Cap the maximum number of MADs in the receive queue to avoid OOM
- Another batch of rxe bug fixes from large scale testing
- __iowrite64_copy() optimizations for write combining MMIO memory
- Remove NULL checks before dev_put/hold()
- EFA support for receive with immediate
- Fix a recent memleaking regression in a cma error path"
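The "repeat a store in a loop" fallback mentioned above can be pictured with a small userspace sketch. This is not the kernel helper: sk_iowrite64_copy() is a hypothetical name, and a volatile pointer stands in for real MMIO accessors. It only illustrates copying a buffer as a sequence of 64-bit stores, which is the pattern that some architectures replace with a special helper to obtain larger write-combined transactions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in: copy 'count' 64-bit words from 'from'
 * to 'to', one 64-bit store per word. 'volatile' keeps the compiler from
 * merging or dropping the individual stores.
 */
static void sk_iowrite64_copy(volatile uint64_t *to, const uint64_t *from,
			      size_t count)
{
	const uint64_t *end = from + count;

	while (from < end)
		*to++ = *from++;
}
```

Whether consecutive stores are actually combined into one large bus transaction depends on the CPU and the memory type of the mapping, which is why an architecture-specific helper may be needed.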
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits)
RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw
RDMA/IPoIB: Fix format truncation compilation errors
bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq
RDMA/efa: Support QP with unsolicited write w/ imm. receive
IB/hfi1: Remove generic .ndo_get_stats64
IB/hfi1: Do not use custom stat allocator
RDMA/hfi1: Use RMW accessors for changing LNKCTL2
RDMA/mana_ib: implement uapi for creation of rnic cq
RDMA/mana_ib: boundary check before installing cq callbacks
RDMA/mana_ib: introduce a helper to remove cq callbacks
RDMA/mana_ib: create and destroy RNIC cqs
RDMA/mana_ib: create EQs for RNIC CQs
RDMA/core: Remove NULL check before dev_{put, hold}
RDMA/ipoib: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources.
RDMA/core: Add an option to display driver-specific QPs in the rdmatool
RDMA/efa: Add shutdown notifier
RDMA/mana_ib: Fix missing ret value
IB/mlx5: Use __iowrite64_copy() for write combining stores
...
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' to run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent
code generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
ZCSnZ31h7woGfNho
=D5B/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' to run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent code
generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
kconfig: use sym_get_choice_menu() in sym_check_prop()
rapidio: remove choice for enumeration
kconfig: lxdialog: remove initialization with A_NORMAL
kconfig: m/nconf: merge two item_add_str() calls
kconfig: m/nconf: remove dead code to display value of bool choice
kconfig: m/nconf: remove dead code to display children of choice members
kconfig: gconf: show checkbox for choice correctly
kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
Makefile: remove redundant tool coverage variables
kbuild: provide reasonable defaults for tool coverage
modules: Drop the .export_symbol section from the final modules
kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
kconfig: use sym_get_choice_menu() in conf_write_defconfig()
kconfig: add sym_get_choice_menu() helper
kconfig: turn defaults and additional prompt for choice members into error
kconfig: turn missing prompt for choice members into error
kconfig: turn conf_choice() into void function
kconfig: use linked list in sym_set_changed()
kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
kconfig: gconf: remove debug code
...
- tracing/probes: Add support for the new pseudo-types %pd and %pD for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'.
- uprobes: Some performance optimizations have been done.
. Speed up the BPF uprobe event by delaying the fetching of the uprobe
event arguments that are not used in BPF.
. Avoid locking by speculatively checking whether uprobe event is valid.
. Reduce lock contention by using read/write_lock instead of spinlock for
uprobe list operation. This improved the BPF uprobe benchmark result by 43% on
average.
- rethook: Remove non-fatal warning messages when tracing stack from BPF
and skip rcu_is_watching() validation in rethook if possible.
- objpool: Optimize objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and by not caching nr_cpu_ids,
because it is a const value.
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check whether ftrace was killed in kprobes if it uses ftrace.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZFUxsbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+fIH/A96/SeC5WRLhXmHfTCM
IvKUea2n0b0oV/2pVfHqfkCBTICuUZ97Opd9VH9jLtjBOTh0fUOGZ2DNVGdSYfWm
IIkS5dhuZxHXrSHEVYykwLHI3AOL7Q6Ny9EmOg1CNMidUkPMNtBvppsBYPlFU/B/
qQJAvOdkVOnNITCaas0+MNgepoVVKdJzdNQ1I4WrGyG8isCZBaCYKo2QcGyheCNN
y8NXvnVHgmgHQ8nTaeE5AawclFzFnhwHfPQPe1kiyGrx15b8K+VYmaZxPKv33A1a
KT3TKJ1Ep7s7iWFh2iPVJzIwOXCmSnvNTKfNx/MDuKtO7UVfFwytoMEaekbmv3bG
VqM=
=n/mW
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- tracing/probes: Add support for the new pseudo-types %pd and %pD for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'
- uprobes performance optimizations:
- Speed up the BPF uprobe event by delaying the fetching of the
uprobe event arguments that are not used in BPF
- Avoid locking by speculatively checking whether uprobe event is
valid
- Reduce lock contention by using read/write_lock instead of
spinlock for uprobe list operation. This improved the BPF uprobe
benchmark result by 43% on average
- rethook: Remove non-fatal warning messages when tracing stack from
BPF and skip rcu_is_watching() validation in rethook if possible
- objpool: Optimize objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and by not caching
nr_cpu_ids, because it is a const value
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check whether ftrace was killed in kprobes if it uses ftrace
* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobe/ftrace: bail out if ftrace was killed
selftests/ftrace: Fix required features for VFS type test case
objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
objpool: enable inlining objpool_push() and objpool_pop() operations
rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
ftrace: make extra rcu_is_watching() validation check optional
uprobes: reduce contention on uprobes_tree access
rethook: Remove warning messages printed for finding return address of a frame.
fprobe: Add entry/exit callbacks types
selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
Documentation: tracing: add new type '%pd' and '%pD' for kprobe
tracing/probes: support '%pD' type for print struct file's name
tracing/probes: support '%pd' type for print struct dentry's name
uprobes: add speculative lockless system-wide uprobe filter check
uprobes: prepare uprobe args buffer lazily
uprobes: encapsulate preparation of uprobe args buffer
Core & protocols
----------------
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd passing
functionality. New method based on Tarjan's Strongly Connected Components
algorithm should be both faster and remove a lot of workarounds
we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP packets
and forwarding them together. Useful for small switches / routers which
lack basic checksum offload in some scenarios (e.g. PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't
use NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory accounting,
RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked,
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states.
State can be used either for input or output packet processing.
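As an illustration of the algorithm named in the AF_UNIX garbage collection item above, here is a minimal, self-contained sketch of Tarjan's Strongly Connected Components algorithm on a tiny adjacency matrix. It is not the kernel implementation: struct scc_state, scc_visit(), scc_run() and MAX_NODES are hypothetical names, and the sketch uses plain recursion. It only shows why a reference cycle (for example, sockets holding each other's fds) ends up as one component that can then be examined as a unit.

```c
#include <stdbool.h>

#define MAX_NODES 16	/* hypothetical upper bound for this sketch */

struct scc_state {
	int n;					/* number of vertices */
	bool edge[MAX_NODES][MAX_NODES];	/* edge[u][v]: u references v */
	int index[MAX_NODES];			/* discovery order, -1 if unvisited */
	int lowlink[MAX_NODES];			/* lowest index reachable via the stack */
	bool on_stack[MAX_NODES];
	int stack[MAX_NODES];
	int sp;
	int next_index;
	int comp[MAX_NODES];			/* component id assigned to each vertex */
	int nr_comps;
};

/* Depth-first visit of vertex u; closes a component when u is its root. */
static void scc_visit(struct scc_state *s, int u)
{
	s->index[u] = s->lowlink[u] = s->next_index++;
	s->stack[s->sp++] = u;
	s->on_stack[u] = true;

	for (int v = 0; v < s->n; v++) {
		if (!s->edge[u][v])
			continue;
		if (s->index[v] < 0) {
			/* Tree edge: recurse, then inherit v's lowlink. */
			scc_visit(s, v);
			if (s->lowlink[v] < s->lowlink[u])
				s->lowlink[u] = s->lowlink[v];
		} else if (s->on_stack[v] && s->index[v] < s->lowlink[u]) {
			/* Back edge into the current stack: part of a cycle. */
			s->lowlink[u] = s->index[v];
		}
	}

	if (s->lowlink[u] == s->index[u]) {
		int v;

		/* u is a root: pop its whole component off the stack. */
		do {
			v = s->stack[--s->sp];
			s->on_stack[v] = false;
			s->comp[v] = s->nr_comps;
		} while (v != u);
		s->nr_comps++;
	}
}

/* Label every vertex with a component id; returns the component count. */
static int scc_run(struct scc_state *s)
{
	s->sp = 0;
	s->next_index = 0;
	s->nr_comps = 0;
	for (int i = 0; i < s->n; i++) {
		s->index[i] = -1;
		s->on_stack[i] = false;
	}
	for (int i = 0; i < s->n; i++)
		if (s->index[i] < 0)
			scc_visit(s, i);
	return s->nr_comps;
}
```

With edges 0->1, 1->2, 2->0 and 3->0, vertices 0, 1 and 2 share one component id while vertex 3 gets its own, so only the first group forms a reference cycle.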
Things we sprinkled into general kernel code
--------------------------------------------
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
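To make the bitmap_{read,write}() item above more concrete, the following userspace sketch shows the general idea of reading and writing a field of up to one word at an arbitrary bit offset in an array of unsigned long. The names sk_bitmap_read(), sk_bitmap_write() and SK_BITS_PER_LONG are hypothetical, and the sketch makes no claim about the exact semantics or constraints of the kernel helpers; it assumes 1 <= nbits <= SK_BITS_PER_LONG and that the field lies inside the array.

```c
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low 'nbits' bits set (nbits in 1..SK_BITS_PER_LONG). */
static unsigned long sk_low_mask(size_t nbits)
{
	return nbits == SK_BITS_PER_LONG ? ~0UL : (1UL << nbits) - 1;
}

/* Read 'nbits' bits starting at bit 'start'; the field may span two words. */
static unsigned long sk_bitmap_read(const unsigned long *map, size_t start,
				    size_t nbits)
{
	size_t word = start / SK_BITS_PER_LONG;
	size_t off = start % SK_BITS_PER_LONG;
	unsigned long val = map[word] >> off;

	/* off is non-zero here, so the shift count stays below the word size. */
	if (off + nbits > SK_BITS_PER_LONG)
		val |= map[word + 1] << (SK_BITS_PER_LONG - off);
	return val & sk_low_mask(nbits);
}

/* Write the low 'nbits' bits of 'value' at bit 'start', preserving the rest. */
static void sk_bitmap_write(unsigned long *map, unsigned long value,
			    size_t start, size_t nbits)
{
	size_t word = start / SK_BITS_PER_LONG;
	size_t off = start % SK_BITS_PER_LONG;
	unsigned long mask = sk_low_mask(nbits);

	value &= mask;
	map[word] = (map[word] & ~(mask << off)) | (value << off);
	if (off + nbits > SK_BITS_PER_LONG) {
		size_t fit = SK_BITS_PER_LONG - off;	/* bits stored in map[word] */

		map[word + 1] = (map[word + 1] & ~(mask >> fit)) | (value >> fit);
	}
}
```

Writing the 8-bit value 0xAB four bits before a word boundary stores its low nibble at the top of the first word and its high nibble at the bottom of the next, and reading the same range returns 0xAB.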
Netfilter
---------
- Use GFP_KERNEL to clone elements, to deal better with OOM situations
and avoid failures in the .commit step.
BPF
---
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. "Session mode" is a common use-case for tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw tracepoint
programs in order to ease migration from classic to raw tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V JITs.
This allows inlining functions which need to access per-CPU state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction.
Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor process-context
bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API
----------
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line) config.
- Initial bits of a queue control API, for now allowing a single queue
to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling
-----------------
- Remove the need to create a config file to run the net forwarding tests
so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test machine).
Add a few such tests.
- Create a shared code library for writing Python tests. Expose the YAML
Netlink library from tools/ to the tests for easy Netlink access.
- Move netfilter tests under net/, extend them, separate performance tests
from correctness tests, and iron out issues found by running them
"on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF info,
TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers
-------
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application rather
than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it messes up
TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the MII
bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 device drivers.
Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
=EsC2
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
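The "skip software TC processing" item above replaces a per-rule flag check with a single up-front decision. One way to picture that is a pair of counters kept up to date as rules are added and removed. The Python sketch below is a hypothetical model: the class and method names are invented for illustration and do not correspond to kernel symbols, and the real implementation may track this differently.

```python
class TcRuleAccounting:
    """Hypothetical bookkeeping for one set of installed TC rules."""

    def __init__(self):
        self.total_rules = 0
        self.hw_only_rules = 0

    def add_rule(self, hw_only):
        self.total_rules += 1
        if hw_only:
            self.hw_only_rules += 1

    def del_rule(self, hw_only):
        self.total_rules -= 1
        if hw_only:
            self.hw_only_rules -= 1

    def needs_software_pass(self):
        # If every installed rule is HW-only (or there are no rules at
        # all), no packet needs to walk the rule list in software.
        return self.total_rules != self.hw_only_rules
```

The per-packet cost becomes one comparison instead of a walk over all rules that only discovers, rule by rule, that none of them applies in software.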
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
This kunit update for Linux 6.10-rc1 consists of:
- fix to race condition in try-catch completion
- change to __kunit_test_suites_init() to exit early if there is
nothing to test
- change to string-stream-test to use KUNIT_DEFINE_ACTION_WRAPPER
- moving fault tests behind KUNIT_FAULT_TEST Kconfig option
- kthread test fixes and improvements
- iov_iter test fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmZCNsAACgkQCwJExA0N
Qxx03w/9EmjF3T16LPaeuerdoypWDcroDT6gpoFXGrvf3lDrna8uDNija5Pb1yMn
l97wla3IJ1EZRMTy1jgWGQiiGIdkV8hcze65HZMi19qx/49TUbhA/pTmpYC56cp9
sk2fBjOHz8iI4kdL4eCMr9MpSiwOIDcfWOr1Lh/AP2LHOU1pRdFZbwO6iZ3wyGlJ
JH4D1CwmfgMGEau4qUo0jvuRbFAf33S+yEI9gr8CskPItljFVO4jVz4lprnTbU9i
qAOivHzwcHyYc0upb6q2vIlp8vhmDygG/m07lnwfF7ZHsYo+3zV4FkxHspN2+jGA
frH7Y0X9zt6YjRRMb9NcNnI67VTiSNzdCvB7urUhKlbXoZ2gjtgB7zHeQtAhlXRo
XVa4QgWBI5ExKBuLI+0yKo4wEO8M0quXxhbX+2Q+tsRnoYmhwb0G8AUyl/26bt2g
RelGrArDS5eMrlxl97rjMGFrB5Uan2MR751tl+aZPgyNRW3tRKJnQLZmM1z8aFQp
vGReT6POzCnQ1wLUkcj6mnObbv9XuuYY1BQgKCtmJflvRToEuwpLOKK8Uca7ou3p
TbVarGIn0jdHv4zGkXrAkt/mhcxanBXhVKLfh/MqQ7fCZBULkSrjJFLhCpvvHwIV
nckaP2sZWls6FTDuawFOUxrr/+LjJchMmHhFy9MiDaVoieiTg6U=
=3QIa
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- fix race condition in try-catch completion
- change __kunit_test_suites_init() to exit early if there is
nothing to test
- change string-stream-test to use KUNIT_DEFINE_ACTION_WRAPPER
- move fault tests behind KUNIT_FAULT_TEST Kconfig option
- kthread test fixes and improvements
- iov_iter test fixes
* tag 'linux_kselftest-kunit-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: bail out early in __kunit_test_suites_init() if there are no suites to test
kunit: string-stream-test: use KUNIT_DEFINE_ACTION_WRAPPER
kunit: test: Move fault tests behind KUNIT_FAULT_TEST Kconfig option
kunit: unregister the device on error
kunit: Fix race condition in try-catch completion
kunit: Add tests for fault
kunit: Print last test location on fault
kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests
kunit: Handle test faults
kunit: Fix timeout message
kunit: Fix kthread reference
kunit: Handle thread creation error
- Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the lockup
detector trigger.
To make this more user friendly, provide an extension to interrupt
statistics which allows taking snapshots and an interface to retrieve
the delta to the snapshot. Use this new mechanism in the watchdog code
to do a two stage lockup analysis by taking the snapshot and printing
the deltas for the topmost active interrupts on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the global
counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that they
are not delivered before the system resumes the device drivers when
coming out of suspend.
- On CPU hot-unplug, interrupts which are affine to the outgoing CPU are
migrated to a different CPU in the affinity mask. This can fail when
the CPUs have no vectors left. Instead of giving up, try to migrate them
to any online CPU, thereby breaking the affinity setting, in order to
prevent a stale device interrupt which targets an offline CPU
- The usual small cleanups
- Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBCM0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeZHEACqMLN3K+1HyWflYtcTHJeYCjZLHS77
2tQeKaaskOA4W6dcGXPxMw5CHqAobHVQQMqgcJxhUdqQiOJnFFnrtCD7JtqM0hWK
UORNbyeovuhAo+iJ0fTuS8p63H7vm2GIWwBLWJnOuChYv/6Yyx5Cald1skvyvbzL
zePhiiAf5mkdmJMeT5wJSCqEWSRYOXsVAJ/0YAwFG3bKkJH3bmDo6SDJY02sXT5P
pjbtD/0hum9wIVT4fNdYleHHQMdBdj9dLlcxXBikHq50mDMw7GxvjKiLcXmoerw3
rEBfVVJp3qpSofpNJZ3HH0ywcF3yUzq04/LPE9Tk2MoQ8NF0GzP8r9Ahke4B7cUj
FysWNiAlC2IisEi6th313FZkTLx0zgewdsdEBTLt8eAE9TU0wamRbo99LZ8i/Qr3
hk7jV8DzL+EDQJLgl4p1iPJgA708eW17tbCxLEa15VKVV6P58miohmhx/IfPO2Gx
FV1PPehtItsmiK/UoRtUCoFdFsqNQtOE+h8DWLyy8RDmhBqGbn9Ut4euXiQIF+rX
WJKPFfslCTR39BrBcZnZeNsgOCN7tEfFRstzjzkey1DaeTGWtxmA5UGhpC2vT74y
YyXluvZlgKr4S64ABmcqQj++hQLho0OQAih3uW5YVxt4VxEUcXYMJOsV1AQGpMjF
UnewWH5opBQdfw==
=jFLf
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the
lockup detector trigger.
To make this more user friendly, provide an extension to interrupt
statistics which allows taking snapshots and an interface to
retrieve the delta to the snapshot. Use this new mechanism in the
watchdog code to do a two stage lockup analysis by taking the
snapshot and printing the deltas for the topmost active interrupts
on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the
global counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that
they are not delivered before the system resumes the device drivers
when coming out of suspend.
- On CPU hot-unplug, interrupts which are affine to the outgoing CPU
are migrated to a different CPU in the affinity mask. This can fail
when the CPUs have no vectors left. Instead of giving up, try to
migrate them to any online CPU, thereby breaking the affinity
setting, in order to prevent a stale device interrupt which targets
an offline CPU
- The usual small cleanups
Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place"
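The two-stage lockup analysis described above (snapshot on the first trigger, per-interrupt deltas on the second) can be sketched in a few lines of Python. This is an illustrative model only: the class name, the dict-based counters and the top_n parameter are assumptions for this example and are not taken from the kernel code.

```python
class IrqStormAnalysis:
    """Hypothetical two-stage analysis over per-interrupt counters."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.snapshot = None

    def on_watchdog_trigger(self, counts):
        """counts maps an interrupt number to its running total.

        Stage 1 stores a snapshot and returns None. Stage 2 returns the
        most active interrupts since the snapshot as (irq, delta) pairs,
        largest delta first.
        """
        if self.snapshot is None:
            self.snapshot = dict(counts)
            return None
        deltas = {irq: total - self.snapshot.get(irq, 0)
                  for irq, total in counts.items()}
        self.snapshot = None
        active = [(irq, delta) for irq, delta in deltas.items() if delta > 0]
        active.sort(key=lambda item: item[1], reverse=True)
        return active[:self.top_n]
```

Reporting deltas rather than absolute totals is what makes a storm stand out: an interrupt with a large lifetime count but no recent activity drops out of the list, while the one firing during the lockup window rises to the top.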
* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
cpuidle: Avoid explicit cpumask allocation on stack
irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
cpumask: Introduce cpumask_first_and_and()
irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
genirq: Reuse irq_is_nmi()
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
ARM: dts: stm32: List exti parent interrupts on stm32mp131
ARM: dts: stm32: List exti parent interrupts on stm32mp151
arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
irqchip/stm32-exti: Mark events reserved with RIF configuration check
irqchip/stm32-exti: Skip secure events
irqchip/stm32-exti: Convert driver to standard PM
...
- Core code:
- Make timekeeping and VDSO time readouts resilient against math overflow:
In guest context the kernel is prone to math overflow when the host
defers the timer interrupt due to overload, malfunction or malice.
This can be mitigated by checking the clocksource delta for the
maximum deferment which is readily available. If that value is
exceeded then the code uses a slowpath function which can handle the
multiplication overflow.
This functionality is enabled unconditionally in the kernel, but made
conditional in the VDSO code. The latter is conditional because it
allows architectures to optimize the check so it is not causing
performance regressions.
On X86 this is achieved by reworking the existing check for negative
TSC deltas as a negative delta obviously exceeds the maximum
deferment when it is evaluated as an unsigned value. That avoids two
conditionals in the hotpath and allows hiding both the negative delta
and the large delta handling in the same slow path.
- Add an initial minimal ktime_t abstraction for Rust
- The usual boring cleanups and enhancements
- Drivers:
- Boring updates to device trees and trivial enhancements in various
drivers.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBErUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZVhD/9iUPzcGNgqGqcO1bXy6dH4xLpeec6o
2En1vg45DOaygN7DFxkoei20KJtfdFeaaEDH8UqmOfPcpLIuVAd0yqhgDQtx6ZcO
XNd09SFDInzUt1Ot/WcoXp5N6Wt3vyEgUAlIN1fQdbaZ3fh6OhGhXXCRfiRCGXU1
ea2pSunLuRf1pKU0AYhGIexnZMOHC4NmVXw/m+WNw5DJrmWB+OaNFKfMoQjtQ1HD
Vgyr2RALHnIeXm60y2j3dD7TWGXICE/edzOd7pEyg5LFXsmcp388eu/DEdOq3OTV
tsHLgIi05GJym3dykPBVwZk09M5oVNNfkg9zDxHWhSLkEJmc4QUaH3dgM8uBoaRW
pS3LaO3ePxWmtAOdSNKFY6xnl6df+PYJoZcIF/GuXgty7im+VLK9C4M05mSjey00
omcEywvmGdFezY6D9MmjjhFa+q2v9zpRjFpCWaIv3DQdAaDPrOzBk4SSqHZOV4lq
+hp7ar1mTn1FPrXBouwyOgSOUANISV5cy/QuwOtrVIuVR4rWFVgfWo/7J32/q5Ik
XBR0lTdQy1Biogf6xy0HCY+4wItOLTqEXXqeknHSMJpDzj5uZglZemgKbix1wVJ9
8YlD85Q7sktlPmiLMKV9ra0MKVyXDoIrgt4hX98A8M12q9bNdw23x0p0jkJHwGha
ZYUyX+XxKgOJug==
=pL+S
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
"Core code:
- Make timekeeping and VDSO time readouts resilient against math
overflow:
In guest context the kernel is prone to math overflow when the host
defers the timer interrupt due to overload, malfunction or malice.
This can be mitigated by checking the clocksource delta for the
maximum deferment which is readily available. If that value is
exceeded then the code uses a slowpath function which can handle
the multiplication overflow.
This functionality is enabled unconditionally in the kernel, but
made conditional in the VDSO code. The latter is conditional
because it allows architectures to optimize the check so it is not
causing performance regressions.
On X86 this is achieved by reworking the existing check for
negative TSC deltas as a negative delta obviously exceeds the
maximum deferment when it is evaluated as an unsigned value. That
avoids two conditionals in the hotpath and allows hiding both the
negative delta and the large delta handling in the same slow path.
- Add an initial minimal ktime_t abstraction for Rust
- The usual boring cleanups and enhancements
Drivers:
- Boring updates to device trees and trivial enhancements in various
drivers"
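The overflow handling described above can be modelled as follows. Cycle deltas are converted to nanoseconds as (delta * mult) >> shift in 64-bit arithmetic, which silently wraps once delta exceeds a maximum. The Python sketch below masks to 64 bits to imitate that wrap and uses Python's big integers as a stand-in for a wide multiply in the slow path. The function names, the mult/shift values used with it and the choice to return 0 for a backwards-running counter are assumptions for illustration, not the kernel code.

```python
U64_MASK = (1 << 64) - 1


def max_safe_delta(mult):
    """Largest delta for which delta * mult still fits in 64 bits."""
    return U64_MASK // mult


def cycles_to_ns_slow(delta, mult, shift):
    """Slow path: handles both 'negative' and very large deltas."""
    if delta >> 63:
        # Sign bit set: the counter appeared to go backwards.
        return 0
    # Arbitrary-precision multiply, standing in for a 128-bit product.
    return (delta * mult) >> shift


def cycles_to_ns(now, last, mult, shift, max_delta):
    # Unsigned 64-bit subtraction: a negative delta becomes a huge value,
    # so the single comparison below catches both problem cases.
    delta = (now - last) & U64_MASK
    if delta <= max_delta:
        return ((delta * mult) & U64_MASK) >> shift   # fast path
    return cycles_to_ns_slow(delta, mult, shift)
```

With mult = 1 << 22 and shift = 22 (one cycle per nanosecond), the safe limit is 2**42 - 1 cycles. A delta of 2**43 would wrap to 0 ns in the plain 64-bit formula, whereas the slow path returns 2**43. Because a negative delta also exceeds the limit once viewed as unsigned, the hot path needs only the one comparison.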
* tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
clocksource/drivers/arm_arch_timer: Mark hisi_161010101_oem_info const
clocksource/drivers/timer-ti-dm: Remove an unused field in struct dmtimer
clocksource/drivers/renesas-ostm: Avoid reprobe after successful early probe
clocksource/drivers/renesas-ostm: Allow OSTM driver to reprobe for RZ/V2H(P) SoC
dt-bindings: timer: renesas: ostm: Document Renesas RZ/V2H(P) SoC
rust: time: doc: Add missing C header links
clocksource: Make the int help prompt unit readable in ncurses
hrtimer: Rename __hrtimer_hres_active() to hrtimer_hres_active()
timerqueue: Remove never used function timerqueue_node_expires()
rust: time: Add Ktime
vdso: Fix powerpc build U64_MAX undeclared error
clockevents: Convert s[n]printf() to sysfs_emit()
clocksource: Convert s[n]printf() to sysfs_emit()
clocksource: Make watchdog and suspend-timing multiplication overflow safe
timekeeping: Let timekeeping_cycles_to_ns() handle both under and overflow
timekeeping: Make delta calculation overflow safe
timekeeping: Prepare timekeeping_cycles_to_ns() for overflow safety
timekeeping: Fold in timekeeping_delta_to_ns()
timekeeping: Consolidate timekeeping helpers
timekeeping: Refactor timekeeping helpers
...
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt. affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt. arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix.
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBtA0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gQEw//WiCiV7zTlWShSiG/g8GTfoAvl53QTWXF
0jQ8TUcoIhxB5VeGgxVG1srYt8f505UXjH7L0MJLrbC3nOgRCg4NK57WiQEachKK
HORIJHT0tMMsKIwX9D5Ovo4xYJn+j7mv7j/caB+hIlzZAbWk+zZPNWcS84p0ZS/4
appY6RIcp7+cI7bisNMGUuNZS14+WMdWoX3TgoI6ekgDZ7Ky+kQvkwGEMBXsNElO
qZOj6yS/QUE4Htwz0tVfd6h5svoPM/VJMIvl0yfddPGurfNw6jEh/fjcXnLdAzZ6
9mgcosETncQbm0vfSac116lrrZIR9ygXW/yXP5S7I5dt+r+5pCrBZR2E5g7U4Ezp
GjX1+6J9U6r6y12AMLRjadFOcDvxdwtszhZq4/wAcmS3B9dvupnH/w7zqY9ho3wr
hTdtDHoAIzxJh7RNEHgeUC0/yQX3wJ9THzfYltDRIIjHTuvl4d5lHgsug+4Y9ClE
pUIQm/XKouweQN9TZz2ULle4ZhRrR9sM9QfZYfirJ/RppmuKool4riWyQFQNHLCy
mBRMjFFsTpFIOoZXU6pD4EabOpWdNrRRuND/0yg3WbDat2gBWq6jvSFv2UN1/v7i
Un5jijTuN7t8yP5lY5Tyf47kQfLlA9bUx1v56KnF9mrpI87FyiDD3MiQVhDsvpGX
rP96BIOrkSo=
=obph
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/pelt: Remove shift of thermal clock
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
thermal/cpufreq: Remove arch_update_thermal_pressure()
sched/cpufreq: Take cpufreq feedback into account
cpufreq: Add a cpufreq pressure feedback for the scheduler
sched/fair: Fix update of rd->sg_overutilized
sched/vtime: Do not include <asm/vtime.h> header
s390/irq,nmi: Include <asm/vtime.h> header directly
s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
sched/vtime: Get rid of generic vtime_task_switch() implementation
sched/vtime: Remove confusing arch_vtime_task_switch() declaration
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
sched/fair: Rename root_domain::overload to ::overloaded
sched/fair: Use helper functions to access root_domain::overload
sched/fair: Check root_domain::overload value before update
sched/fair: Combine EAS check with root_domain::overutilized access
sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
...
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup" helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmY/yCUWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuf2D/9xlQA7UxUDlm1Z6DPYzTZfNm4M
D+RJ1QoLNbZEYSzULWvfRSWI+c82qINoSgvtv2DdhWqSKivcMoeNDN846gewfwMY
0q3iChbhPaNBAHaXat1pf0iA6q2n/wpg1jv1C1PmPVSaEpl0CeQ2MLXSOMz9Gb7G
FkkaN/v+YlShUzkw61KwKPg959/bh5vCBbeLjSd1XAhLGKU7nWw4yj0J3usTnRbV
icCnW4mk9SD+pIli/+n7t/QIvPMf6TrJZoSgH9P7YNm+wNme4UEAm1PJz8F+KVAH
D3CJhlH36l8TrndsHMsHgDjKtUUchh+ExOlWGw3ObUnbU7ST2JP6crAdjtnyT2eN
uF+ELBT97SskFBAlzOzBSIs8lEwBZzTdJCmWqEBr3ZxxR7lcClmqbJY+X/FhvXko
o7PvtCbHCatpDPJPZ0e25nVsfEJS29RUED5Gen6vWcUtuvdFEgws70s5BDAbSZTo
RoJsuDqlRAFLdNDYmEN3UTGcm+PBjPgKsBrXiiNr4Y0BilU67Bzdmd8jiZC9ARe6
+3cfQRs0uWdemANzvrN5FnrIUhjRHWTvfVTXcC9Jt53HntIuMhhRajJuMcTAX5uQ
iWACUR14RL8lfInS8phWB5T4AvNexTFc6kVRqNzsGB0ZutsnAsqELttCk57tYQVr
Hlv/MbePyyLSKF/nYA==
=CgsW
-----END PGP SIGNATURE-----
Merge tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"The bulk of the changes here are related to refactoring and expanding
the KUnit tests for string helper and fortify behavior.
Some trivial strncpy replacements in fs/ were carried in my tree. Also
some fixes to SCSI string handling were carried in my tree since the
helper for those was introduced here. Beyond that, just little fixes
all around: objtool getting confused about LKDTM+KCFI, preparing for
future refactors (constification of sysctl tables, additional
__counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
more options in the hardening.config Kconfig fragment.
Summary:
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to
version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup"
helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI"
* tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
stackleak: Use a copy of the ctl_table argument
string: Add additional __realloc_size() annotations for "dup" helpers
kunit/fortify: Fix replaced failure path to unbreak __alloc_size
hardening: Enable KCFI and some other options
lkdtm: Disable CFI checking for perms functions
kunit/fortify: Add memcpy() tests
kunit/fortify: Do not spam logs with fortify WARNs
kunit/fortify: Rename tests to use recommended conventions
init: replace deprecated strncpy with strscpy_pad
kunit/fortify: Fix mismatched kvalloc()/vfree() usage
scsi: qla2xxx: Avoid possible run-time warning with long model_num
scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
fs: ecryptfs: replace deprecated strncpy with strscpy
hfsplus: refactor copy_name to not use strncpy
reiserfs: replace deprecated strncpy with scnprintf
virt: acrn: replace deprecated strncpy with strscpy
ubsan: Avoid i386 UBSAN handler crashes with Clang
ubsan: Remove 1-element array usage in debug reporting
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmY/YgsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvi0EACwnFRtYioizBH0x7QUHTBcIr0IhACd5gfz
bm+uwlDUtf6G6lupHdJT9gOVB2z2z1m2Pz//8RuUVWw3Eqw2+rfgG8iJd+yo7IaV
DpX3WaM4NnBvB7FKOKHlMPvGuf7KgbZ3uPm3x8cbrn/axMmkZ6ljxTixJ3p5t4+s
xRsef/lVdG71DkXIFgTKATB86yNRJNlRQTbL+sZW22vdXdtfyBbOgR1sBuFfp7Hd
g/uocZM/z0ahM6JH/5R2IX2ttKXMIBZLA8HRkJdvYqg022cj4js2YyRCPU3N6jQN
MtN4TpJV5I++8l6SPQOOhaDNrK/6zFtDQpwG0YBiKKj3nQDgVbWWb8ejYTIUv4MP
SrEto4MVBEqg5N65VwYYhIf45rmueFyJp6z0Vqv6Owur5nuww/YIFknmoMa/WDMd
V8dIU3zL72FZDbPjIBjxHeqAGz9OgzEVafled7pi0Xbw6wqiB4kZihlMGXlD+WBy
Yd6xo8PX4i5+d2LLKKPxpW1X0eJlKYJ/4dnYCoFN8LmXSiPJnMx2pYrV+NqMxy4X
Thr8lxswLQC7j9YBBuIeDl8NB9N5FZZLvaC6I25QKq045M2ckJ+VrounsQb3vGwJ
72nlxxBZL8wz3sasgX9Pc1Cez9AqYbM+UZahq8ezPY5y3Jh0QfRw/MOk1ZaDNC8V
CNOHBH0E+Q==
=HnjE
-----END PGP SIGNATURE-----
Merge tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- Add a partscan attribute in sysfs, fixing an issue with systemd
relying on an internal interface that went away.
- Attempt #2 at making long running discards interruptible. The
previous attempt went into 6.9, but we ended up mostly reverting it
as it had issues.
- Remove old ida_simple API in bcache
- Support for zoned write plugging, greatly improving the performance
on zoned devices.
- Remove the old throttle low interface, which has been experimental
since 2017 and never made it beyond that and isn't being used.
- Remove page->index debugging checks in brd, as it hasn't caught
anything and prepares us for removing it from struct page.
- MD pull request from Song
- Don't schedule block workers on isolated CPUs
* tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits)
blk-throttle: delay initialization until configuration
blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW
block: fix that util can be greater than 100%
block: support to account io_ticks precisely
block: add plug while submitting IO
bcache: fix variable length array abuse in btree_iter
bcache: Remove usage of the deprecated ida_simple_xx() API
md: Revert "md: Fix overflow in is_mddev_idle"
blk-lib: check for kill signal in ioctl BLKDISCARD
block: add a bio_await_chain helper
block: add a blk_alloc_discard_bio helper
block: add a bio_chain_and_submit helper
block: move discard checks into the ioctl handler
block: remove the discard_granularity check in __blkdev_issue_discard
block/ioctl: prefer different overflow check
null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()
block: fix and simplify blkdevparts= cmdline parsing
block: refine the EOF check in blkdev_iomap_begin
block: add a partscan sysfs attribute for disks
block: add a disk_has_partscan helper
...
These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair
on the TPM side is generated from the so-called null random seed on each
power-on of the machine [1]. This supports TPM-based encryption of the
hard drive by adding a layer of protection against bus interposer attacks.
Other than that, a few minor fixes and documentation for tpm_tis to
clarify the basics of TPM localities for future patch review
discussions (will be extended and refined over time, just a seed).
[1] https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iJYEABYKAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZj0l2iAcamFya2tvLnNh
a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0m8yAP4hBjMtpgAJZ4eZ
5o9tEQJrh/1JFZJ+8HU5IKPc4RU8BAEAyyYOCtxtS/C5B95iP+LvNla0KWi0pprU
HsCLULnV2Aw=
=RTXJ
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM updates from Jarkko Sakkinen:
"These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair on
the TPM side is generated from the so-called null random seed on each
power-on of the machine [1]. This supports TPM-based encryption of the
hard drive by adding a layer of protection against bus interposer attacks.
Other than that, a few minor fixes and documentation for tpm_tis to
clarify the basics of TPM localities for future patch review discussions
(will be extended and refined over time, just a seed)"
Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]
* tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
Documentation: tpm: Add TPM security docs toctree entry
tpm: disable the TPM if NULL name changes
Documentation: add tpm-security.rst
tpm: add the null key name as a sysfs export
KEYS: trusted: Add session encryption protection to the seal/unseal path
tpm: add session encryption protection to tpm2_get_random()
tpm: add hmac checks to tpm2_pcr_extend()
tpm: Add the rest of the session HMAC API
tpm: Add HMAC session name/handle append
tpm: Add HMAC session start and end functions
tpm: Add TCG mandated Key Derivation Functions (KDFs)
tpm: Add NULL primary creation
tpm: export the context save and load commands
tpm: add buffer function to point to returned parameters
crypto: lib - implement library version of AES in CFB mode
KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
tpm: Add tpm_buf_read_{u8,u16,u32}
tpm: TPM2B formatted buffers
tpm: Store the length of the tpm_buf data separately.
tpm: Update struct tpm_buf documentation comments
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmY8mxAACgkQu+CwddJF
iJru7AgAmBfolYwYjm9fCkH+px40smQQF08W+ygJaKF4+6e+b5ijfI8H3AG7QtuE
5FmdCjSvu56lr15sjeUy7giYWRfeEwxC/ztJ0FJ+RCzSEQVKCo2wWGYxDneelwdH
/v0Of5ENbIiH/svK4TArY9AemZw+nowNrwa4TI1QAEcp47T7x52r0GFOs1pnduep
eV6uSwHSx00myiF3fuMGQ7P4aUDLNTGn5LSHNI4sykObesGPx4Kvr0zZvhQT41me
c6Sc0GwV5M9sqBFwjujIeD7CB98wVPju4SDqNiEL+R1u+pnIA0kkefO4D4VyKvpr
7R/WXmqZI4Ae/HEtcRd8+5Z4FvapPw==
=7ez3
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"This time it's mostly random cleanups and fixes, with two performance
fixes that might have significant impact, but limited to systems
experiencing particular bad corner case scenarios rather than general
performance improvements.
The memcg hook changes are going through the mm tree due to
dependencies.
- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
This fixes the long-standing problem that can happen with workloads
that have alloc/free patterns resulting in many partially used
slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
the long partial slab list under spinlock with disabled irqs and
thus can stall other processes or even trigger the lockup
detection. The traversal is only done to count free objects so that
<active_objs> column can be reported along with <num_objs>.
To avoid affecting fast paths with another shared counter
(attempted in the past) or complex partial list traversal schemes
that allow rescheduling, the chosen solution resorts to
approximation - when the partial list is over 10000 slabs long, we
will only traverse first 5000 slabs from head and tail each and use
the average of those to estimate the whole list. Both head and tail
are used as the slabs near the head tend to have more free objects
than the slabs towards the tail.
It is expected the approximation should not break existing
/proc/slabinfo consumers. The <num_objs> field is still accurate
and reflects the overall kmem_cache footprint. The <active_objs>
was already imprecise due to cpu and percpu-partial slabs, so can't
be relied upon to determine exact cache usage. The difference
between <active_objs> and <num_objs> is mainly useful to determine
the slab fragmentation, and that will be possible even with the
approximation in place.
- Prevent allocating many slabs when a NUMA node is full (Chen Jun)
Currently, on NUMA systems with a node under significantly bigger
pressure than other nodes, the fallback strategy may cause each
kmalloc_node() that can't be satisfied from the preferred node to
allocate a new slab on a fallback node instead of reusing the slabs
already on that node's partial list.
This is now fixed and partial lists of fallback nodes are checked
even for kmalloc_node() allocations. It's still preferred to
allocate a new slab on the requested node before a fallback, but
only with a GFP_NOWAIT attempt, which will fail quickly when the
node is under a significant memory pressure.
- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
- Mark racy accesses for KCSAN (linke li)
- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"
* tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: remove the check for NULL kmalloc_caches
mm/slub: create kmalloc 96 and 192 caches regardless cache size order
mm/slub: mark racy access on slab->freelist
slub: use count_partial_free_approx() in slab_out_of_memory()
slub: introduce count_partial_free_approx()
slub: Set __GFP_COMP in kmem_cache by default
mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
mm/slub: correct comment in do_slab_free()
mm/slub, kunit: Use inverted data to corrupt kmem cache
mm/slub: simplify get_partial_node()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: Reduce memory consumption in extreme scenarios
mm/slub: mark racy accesses on slab->slabs
mm/slub: remove dummy slabinfo functions
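The /proc/slabinfo approximation described in the slab pull request above
can be sketched in userspace C as follows. This is only an illustration
under stated assumptions, not the kernel's count_partial_free_approx():
the partial list is modelled as a plain array of per-slab free-object
counts, and the names count_free_approx(), SCAN_LIMIT and SCAN_EACH_END
are hypothetical. Only the 10000/5000 thresholds come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds taken from the pull request text; the macro names are made up. */
#define SCAN_LIMIT    10000u            /* lists up to this length are counted exactly */
#define SCAN_EACH_END (SCAN_LIMIT / 2)  /* slabs sampled from the head and from the tail */

/*
 * free_objs[i] is the number of free objects in the i-th slab of a
 * (hypothetical) partial list; index 0 is the head, nr_partial - 1 the tail.
 */
static unsigned long long count_free_approx(const unsigned int *free_objs,
                                            size_t nr_partial)
{
	unsigned long long sum = 0;
	size_t i;

	assert(free_objs != NULL || nr_partial == 0);

	/* Short list: walk everything and return the exact count. */
	if (nr_partial <= SCAN_LIMIT) {
		for (i = 0; i < nr_partial; i++)
			sum += free_objs[i];
		return sum;
	}

	/* Long list: sample SCAN_EACH_END slabs from each end. */
	for (i = 0; i < SCAN_EACH_END; i++) {
		sum += free_objs[i];                  /* from the head */
		sum += free_objs[nr_partial - 1 - i]; /* from the tail */
	}

	/* Scale the sampled average up to the whole list. */
	return sum * nr_partial / SCAN_LIMIT;
}
```

Sampling both ends matters because, as noted above, slabs near the head
tend to have more free objects than slabs near the tail, so a head-only
sample would overestimate.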
This series provides native one-byte and two-byte cmpxchg() support
for sparc32 and parisc, courtesy of Al Viro. This support is provided
by the same hashed-array-of-locks technique used for the other atomic
operations provided for these two platforms.
This series also provides emulated one-byte cmpxchg() support for csky
using a new cmpxchg_emu_u8() function that uses a four-byte cmpxchg()
to emulate the one-byte variant.
Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
xtensa have not yet received maintainer acks, so they are slated for
the v6.11 merge window.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmY/gZ8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jFIjD/0Uu4VZZN96jYbSaDbC5aAkEHg/swBK
6OVn+yspLOvkebVZlSfus+7rc5VUrxT3GA/gvAWEQsUlPqpYg6Qja/efFpPPRjIq
lwkFE5HFgE0J4lBo9p78ggm6Hx60WUPlNg9uS23qURZbFTx5TYQyAdzXw9HlYzr8
jg5IuTtO5L5AZzR2ocDRh4A5sqfcBJCVdVsKO+XzdFLLtgum+kJY7StYLPdY8VtL
pIV3+ZQENoiwzE+wccnCb2R/4kt6jsEDShlpV4VEfv76HwbjBdvSq4jEg4jS2N3/
AIyThclD97AEdbbM1oJ3oZdjD3GLGVPhVFfiMSGD5HGA+JVJPjJe2it4o+xY7CIR
sSdI/E3Rs67qgaga6t2vHygDZABOwgNLAsc4VwM7X6I20fRixkYVc7aVOTnAPzmr
15iaFd/T7fLKJcC3m/IXb9iNdlfe0Op4+YVD0lOTWmzIk80Xgf45a39u1VFlqQvh
CLIZG3IdmuxXSWjOmk70iokzJgoSmBriGLbAT3K++pzGYUN/BNQs6XRR77BczFsX
CbZTZKnEWZMR1U0UWa/TbvUKcsVBZTYebJSvJOG2/+oVqayzvwYfBsE/vWZcI72K
XEEpKY9ZPDf/gCs/G4OFWt2QPJ0PL+Nt4UZDr5Khrqgo1PwN0uIXstA4mnJ0WjqQ
sGiACjdTXk4h0w==
=AEPy
-----END PGP SIGNATURE-----
Merge tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull cmpxchg updates from Paul McKenney:
"Provide one-byte and two-byte cmpxchg() support on sparc32, parisc,
and csky
This provides native one-byte and two-byte cmpxchg() support for
sparc32 and parisc, courtesy of Al Viro. This support is provided by
the same hashed-array-of-locks technique used for the other atomic
operations provided for these two platforms.
There is also emulated one-byte cmpxchg() support for csky using a new
cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate
the one-byte variant.
Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
xtensa have not yet received maintainer acks, so they are slated for
the v6.11 merge window"
* tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
csky: Emulate one-byte cmpxchg
lib: Add one-byte emulation function
parisc: add u16 support to cmpxchg()
parisc: add missing export of __cmpxchg_u8()
parisc: unify implementations of __cmpxchg_u{8,32,64}
parisc: __cmpxchg_u32(): lift conversion into the callers
sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
sparc32: unify __cmpxchg_u{32,64}
sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
sparc32: make __cmpxchg_u32() return u32
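The idea of emulating a one-byte cmpxchg() with a four-byte one, as
mentioned in the cmpxchg pull request above, can be sketched in userspace
C. This is not the kernel's cmpxchg_emu_u8(): the function name
cmpxchg_u8_sketch() is hypothetical, the GCC/Clang __atomic builtins stand
in for the architecture's four-byte cmpxchg(), and the byte is assumed to
live inside a naturally aligned 32-bit object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: compare-and-exchange one byte by operating on the
 * aligned 32-bit word that contains it. Returns the byte value observed,
 * which equals 'old' exactly when the exchange happened.
 */
static uint8_t cmpxchg_u8_sketch(uint8_t *p, uint8_t old, uint8_t new_val)
{
	uintptr_t addr;
	uint32_t *word;
	unsigned int idx;
	uint32_t cur;

	assert(p != NULL);
	addr = (uintptr_t)p;
	word = (uint32_t *)(addr & ~(uintptr_t)3); /* containing aligned word */
	idx = (unsigned int)(addr & 3);            /* byte offset inside that word */
	cur = __atomic_load_n(word, __ATOMIC_RELAXED);

	for (;;) {
		uint8_t bytes[4];
		uint32_t next;

		/* View the word as bytes in memory order (endian-independent). */
		memcpy(bytes, &cur, sizeof(bytes));
		if (bytes[idx] != old)
			return bytes[idx]; /* compare failed, nothing written */

		bytes[idx] = new_val;
		memcpy(&next, bytes, sizeof(next));

		/*
		 * Swap the whole word. If any of the four bytes changed in the
		 * meantime, 'cur' is refreshed with the current word and we retry.
		 */
		if (__atomic_compare_exchange_n(word, &cur, next, 0,
						__ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
			return old;
	}
}
```

The retry loop is needed because a concurrent update to one of the three
neighbouring bytes makes the word-sized compare fail even though the target
byte still matches.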
More fixups for this cycle's page_owner updates. And a few userfaultfd
fixes. Otherwise, random singletons - see the individual changelogs for
details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZj6AhAAKCRDdBJ7gKXxA
jsvHAQCoSRI4qM0a6j5Fs2Q+B1in+kGWTe50q5Rd755VgolEsgD8CUASDgZ2Qv7g
yDAlluXMv4uvA4RqkZvDiezsENzYQw0=
=MApd
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-05-10-13-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"18 hotfixes, 7 of which are cc:stable.
More fixups for this cycle's page_owner updates. And a few userfaultfd
fixes. Otherwise, random singletons - see the individual changelogs
for details"
* tag 'mm-hotfixes-stable-2024-05-10-13-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Barry Song
selftests/mm: fix powerpc ARCH check
mailmap: add entry for John Garry
XArray: set the marks correctly when splitting an entry
selftests/vDSO: fix runtime errors on LoongArch
selftests/vDSO: fix building errors on LoongArch
mm,page_owner: don't remove __GFP_NOLOCKDEP in add_stack_record_to_list
fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
mm/vmalloc: fix return value of vb_alloc if size is 0
mm: use memalloc_nofs_save() in page_cache_ra_order()
kmsan: compiler_types: declare __no_sanitize_or_inline
lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add()
tools: fix userspace compilation with new test_xarray changes
MAINTAINERS: update URL's for KEYS/KEYRINGS_INTEGRITY and TPM DEVICE DRIVER
mm: page_owner: fix wrong information in dump_page_owner
maple_tree: fix mas_empty_area_rev() null pointer dereference
mm/userfaultfd: reset ptes when close() for wr-protected ones
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:
src := $(obj)
When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.
This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.
To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.
Going forward, the variables used in Makefiles will have the following
meanings:
$(obj) - directory in the object tree
$(src) - directory in the source tree (changed by this commit)
$(objtree) - the top of the kernel object tree
$(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles needs to be replaced
with $(src).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Implement AES in CFB mode using the existing, mostly constant-time
generic AES library implementation. This will be used by the TPM code
to encrypt communications with TPM hardware, which is often a discrete
component connected using sniffable wires or traces.
While a CFB template does exist, using a skcipher is a major pain for
non-performance critical synchronous crypto where the algorithm is known
at compile time and the data is in contiguous buffers with valid kernel
virtual addresses.
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/all/20230216201410.15010-1-James.Bottomley@HansenPartnership.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfba ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5f ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function claims to return the bitmap size if the Nth bit doesn't exist.
This rule is violated in the inline case because the fns() used there
knows nothing about the size of the bitmap.
So, relax this requirement to '>= size', and make the outline
implementation a bit cheaper.
All in-tree kernel users of find_nth_bit() are safe against that.
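The relaxed contract can be illustrated with a small userspace sketch.
The helpers below (fns_sketch(), find_nth_bit_sketch(), nth_bit_exists()
and BITS_PER_LONG_SKETCH) are hypothetical stand-ins restricted to a
single-word bitmap, not the kernel implementations. The point is only
that a caller should test the result with '>= size' rather than
'== size'.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Index of the n-th (0-based) set bit in 'word'. If there is no such bit,
 * return BITS_PER_LONG_SKETCH: this helper has no idea how large the
 * caller's bitmap is.
 */
static unsigned int fns_sketch(unsigned long word, unsigned int n)
{
	unsigned int bit;

	for (bit = 0; bit < BITS_PER_LONG_SKETCH; bit++) {
		if (!(word & (1UL << bit)))
			continue;
		if (n == 0)
			return bit;
		n--;
	}
	return BITS_PER_LONG_SKETCH;
}

/* Single-word case only: 0 < size <= BITS_PER_LONG_SKETCH. */
static unsigned int find_nth_bit_sketch(unsigned long bitmap,
					unsigned int size, unsigned int n)
{
	unsigned long mask;

	assert(size > 0 && size <= BITS_PER_LONG_SKETCH);
	mask = (size == BITS_PER_LONG_SKETCH) ? ~0UL : (1UL << size) - 1;

	/* May return a value larger than 'size' when the bit does not exist. */
	return fns_sketch(bitmap & mask, n);
}

/* Caller-side check that is safe under the relaxed '>= size' rule. */
static int nth_bit_exists(unsigned long bitmap, unsigned int size,
			  unsigned int n)
{
	return find_nth_bit_sketch(bitmap, size, n) < size;
}
```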
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Closes: https://lore.kernel.org/all/Zi50cAgR8nZvgLa3@yury-ThinkPad/T/#m6da806a0525e74dcc91f35e5f20766ed4e853e8a
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The test can currently only be compiled as a module. There is no technical
reason for this restriction. Now that the test includes some performance
benchmarks, it would be reasonable to run it at kernel load time, before
userspace starts, to reduce possible jitter.
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Introduce a benchmark test for fns(). It measures the total time
taken by fns() to process 10,000 test values generated using
get_random_bytes() for each n in the range [0, BITS_PER_LONG).
example:
test_bitops: fns: 7637268 ns
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
CC: David Laight <David.Laight@aculab.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Add closure_sync_timeout(), a new variant of closure_sync() that takes a
timeout.
Note that when this returns -ETIME the closure will still be waiting on
something, i.e. it's not safe to return if you've got a stack allocated
closure.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Update header inclusions to follow IWYU (Include What You Use) principle.
Link: https://lkml.kernel.org/r/20240423192529.3249134-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alain Volmat <alain.volmat@foss.st.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samuel Holland <samuel@sholland.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Sean Young <sean@mess.org>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allow the Dynamic Interrupt Moderation (DIM) library to be built as a
module. This is particularly useful in an Android GKI (Google Kernel
Image) configuration where everything is built as a module, including
Ethernet controller drivers. Having to build DIMLIB into the kernel
image with potentially no user is wasteful.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240506175040.410446-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit c72a870926 added a mutex to prevent kunit tests from running
concurrently. Unfortunately that mutex gets locked during module load
regardless of whether the module actually has any kunit tests. This
causes a problem for kunit tests that might need to load other kernel
modules (e.g. gss_krb5_test loading the camellia module).
So check to see if there are actually any tests to run before locking
the kunit_run_lock mutex.
Fixes: c72a870926 ("kunit: add ability to run tests after boot using debugfs")
Reported-by: Nico Pache <npache@redhat.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use KUNIT_DEFINE_ACTION_WRAPPER macro to define the 'kfree' and
'string_stream_destroy' wrappers for kunit_add_action.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Acked-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The NULL dereference tests in kunit_fault deliberately trigger a kernel
BUG(), and therefore print the associated stack trace, even when the
test passes. This is both annoying (as it bloats the test output), and
can confuse some test harnesses, which assume any BUG() is a failure.
Allow these tests to be specifically disabled (without disabling all
of KUnit's other tests), by placing them behind the
CONFIG_KUNIT_FAULT_TEST Kconfig option. This is enabled by default, but
can be set to 'n' to disable the test. An empty 'kunit_fault' suite is
left behind, which will automatically be marked 'skipped'.
As the fault tests already were disabled under UML (as they weren't
compatible with its fault handling), we can simply adapt those
conditions, and add a dependency on !UML for our new option.
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/928249cc-e027-4f7f-b43f-502f99a1ea63@roeck-us.net/
Fixes: 82b0beff3497 ("kunit: Add tests for fault")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
kunit_init_device() should unregister the device on bus register error,
but mistakenly it tries to unregister the bus.
Unregister the device instead of the bus.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition, where
the kthread exits before we can wait on it.
Keep a copy of vfork_done, which is taken before we wake_up_process()
and so valid, and wait on that instead.
Fixes: 93533996100c ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.org/
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a test case that triggers a NULL pointer dereference and make sure it
results in a failed test.
The full kunit_fault test suite is marked as skipped when run on UML
because it would result in a kernel panic.
Tested with:
./tools/testing/kunit/kunit.py run --arch x86_64 kunit_fault
./tools/testing/kunit/kunit.py run --arch arm64 \
--cross_compile=aarch64-linux-gnu- kunit_fault
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-8-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This helps identify the location of test faults with opportunistic calls
to _KUNIT_SAVE_LOC(). This can be useful while writing tests or
debugging them. It is possible to call KUNIT_SUCCESS() to explicitly
save the last location.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-7-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix KUNIT_SUCCESS() calls to pass a test argument.
This is a no-op for now because this macro does nothing, but it will be
required for the next commit.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-6-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Previously, when a kernel test thread crashed (e.g. NULL pointer
dereference, general protection fault), the KUnit test hung for 30
seconds and exited with a timeout error.
Fix this issue by waiting on task_struct->vfork_done instead of the
custom kunit_try_catch.try_completion, and track the execution state by
initially setting try_result with -EINTR and only setting it to 0 if
the test passed.
Fix kunit_generic_run_threadfn_adapter() signature by returning 0
instead of calling kthread_complete_and_exit(). Because the thread's
exit code is never checked, always set it to 0 to make that clear. To
make this explicit, export kthread_exit() for KUnit tests built as
modules.
Fix the -EINTR error message, which couldn't be reached until now.
This is tested by a following patch.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: Rae Moar <rmoar@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-5-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There is a race condition when a kthread finishes after the deadline and
before the call to kthread_stop(), which may lead to use after free.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Fixes: adf5054570 ("kunit: fix UAF when run kfence test case test_gfpzero")
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-3-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Previously, if a thread creation failed (e.g. -ENOMEM), the function was
called (kunit_catch_run_case or kunit_catch_run_case_cleanup) without
marking the test as failed. Instead, fill try_result with the error
code returned by kthread_run(), which will mark the test as failed and
print "internal error occurred...".
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-2-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 63b1898fff ("XArray: Disallow sibling entries of nodes")
modified xas_descend() in such a way that it is no longer compiled as
an inline function: the change increased the size of xas_descend(), and
the compiler stopped inlining it. This had a negative impact on
performance, because xas_descend() is called frequently to traverse
downwards in the xarray tree, making it a hot function.
Inlining xas_descend() has been shown to improve performance by
approximately 4.95% in the iozone write test.
Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
#iozone -i 0 -i 1 -s 64g -r 16m -f /test/tmptest
Before this patch:
      kB  reclen    write  rewrite     read   reread
67108864   16384  2230080  3637689  6315197  5496027
After this patch:
      kB  reclen    write  rewrite     read   reread
67108864   16384  2340360  3666175  6272401  5460782
Percentage change:
                    4.95%    0.78%   -0.68%   -0.64%
This patch introduces inlining to the xas_descend function. While this
change increases the size of lib/xarray.o, the performance gains in
critical workloads make this an acceptable trade-off.
Size comparison before and after patch:
.text   .data  .bss  file
0x3502  0      0     lib/xarray.o.before
0x3602  0      0     lib/xarray.o.after
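The percentage figures above follow from the raw iozone numbers; a
minimal C helper to reproduce them is sketched here. The name
pct_change() is hypothetical and not part of the patch.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', expressed in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For the write column, pct_change(2230080, 2340360) evaluates to roughly
4.95. The .text growth is 0x3602 - 0x3502 = 0x100, i.e. 256 bytes.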
Link: https://lkml.kernel.org/r/20240416061628.3768901-1-leo.lilong@huawei.com
Signed-off-by: Long Li <leo.lilong@huawei.com>
Cc: Hou Tao <houtao1@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: yangerkun <yangerkun@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If we created a new node to replace an entry which had search marks set,
we were setting the search mark on every entry in that node. That works
fine when we're splitting to order 0, but when splitting to a larger
order, we must not set the search marks on the sibling entries.
Link: https://lkml.kernel.org/r/20240501153120.4094530-1-willy@infradead.org
Fixes: c010d47f10 ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/ZjFGCOYk3FK_zVy3@bombadil.infradead.org
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
While testing lib/test_xarray in userspace I've noticed we can fail with:
make -C tools/testing/radix-tree
./tools/testing/radix-tree/xarray
BUG at check_xa_multi_store_adv_add:749
xarray: 0x55905fb21a00x head 0x55905fa1d8e0x flags 0 marks 0 0 0
0: 0x55905fa1d8e0x
xarray: ../../../lib/test_xarray.c:749: check_xa_multi_store_adv_add: Assertion `0' failed.
Aborted
The failure comes from a BUG_ON(). We actually can fail there due to
-ENOMEM, and the check in xas_nomem() will fix this for us, so it makes
no sense to expect no failure inside the loop. So modify the check and,
since this is also useful for instructional purposes, clarify the
situation.
The check for XA_BUG_ON(xa, xa_load(xa, index) != p) is already done
at the end of the loop, so just remove the bogus one inside the loop.
With this we now pass the test in both kernel and userspace:
In userspace:
./tools/testing/radix-tree/xarray
XArray: 149092856 of 149092856 tests passed
In kernel space:
XArray: 148257077 of 148257077 tests passed
Link: https://lkml.kernel.org/r/20240423192221.301095-3-mcgrof@kernel.org
Fixes: a60cc288a1 ("test_xarray: add tests for advanced multi-index use")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently the code calls mas_start() followed by mas_data_end() if the
maple state is MA_START, but mas_start() may return with the maple state
node == NULL. This will lead to a null pointer dereference when checking
information in the NULL node, which is done in mas_data_end().
Avoid setting the offset if there is no node by waiting until after the
maple state is checked for an empty or single entry state.
A user could trigger the events to cause a kernel oops by unmapping all
vmas to produce an empty maple tree, then mapping a vma that would cause
the scenario described above.
Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Marius Fleischer <fleischermarius@gmail.com>
Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/
Tested-by: Marius Fleischer <fleischermarius@gmail.com>
Tested-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Here are some small char/misc/other driver fixes and new device ids for
6.9-rc7 that resolve some reported problems.
Included in here are:
- iio driver fixes
- mei driver fix and new device ids
- dyndbg bugfix
- pvpanic-pci driver bugfix
- slimbus driver bugfix
- fpga new device id
All have been in linux-next with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZjdD2Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk38wCeJeUXW4/yQ4BTj7cHir0aOowVs+UAnAxCUwzt
NpooaVg3v9tzLtvAOp1O
=YfmA
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/other driver fixes and new device ids
for 6.9-rc7 that resolve some reported problems.
Included in here are:
- iio driver fixes
- mei driver fix and new device ids
- dyndbg bugfix
- pvpanic-pci driver bugfix
- slimbus driver bugfix
- fpga new device id
All have been in linux-next with no reported problems"
* tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
slimbus: qcom-ngd-ctrl: Add timeout for wait operation
dyndbg: fix old BUG_ON in >control parser
misc/pvpanic-pci: register attributes via pci_driver
fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
mei: me: add lunar lake point M DID
mei: pxp: match against PCI_CLASS_DISPLAY_OTHER
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Reset chip on probe() and resume()
iio: accel: mxc4005: Interrupt handling fixes
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio: pressure: Fixes SPI support for BMP3xx devices
iio: pressure: Fixes BME280 SPI driver data
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a ("bpf: verifier: prevent userspace memory access")
d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Relatively calm week, likely due to public holiday in most places.
No known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
Several other "dup"-style interfaces could use the __realloc_size()
attribute. (As a reminder to myself and others: "realloc" is used here
instead of "alloc" because the "alloc_size" attribute implies that the
memory contents are uninitialized. Since we're copying contents into the
resulting allocation, it must use "realloc_size" to avoid confusing the
compiler's optimization passes.)
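As a rough userspace illustration of the distinction above, the sketch below annotates a "dup"-style helper with the compiler's alloc_size attribute only, leaving out any "malloc" attribute. The names my_realloc_size() and my_memdup() are hypothetical stand-ins, not the kernel's macros or helpers, and the sketch assumes GCC or Clang.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a "realloc_size"-style annotation: it tells
 * the compiler which argument gives the size of the returned buffer and
 * says nothing about the buffer's contents.
 */
#define my_realloc_size(x) __attribute__((alloc_size(x)))

/* The size of the returned allocation is given by argument 2 (len). */
void *my_memdup(const void *src, size_t len) my_realloc_size(2);

void *my_memdup(const void *src, size_t len)
{
	void *p;

	assert(src);		/* precondition: there must be a source */
	p = malloc(len);
	if (p)
		memcpy(p, src, len);	/* contents are initialized here */
	return p;
}
```

A caller would use it like any allocator, for example `char *copy = my_memdup("hello", 6);`, and release the result with free().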
Add KUnit test coverage where possible. (KUnit still does not have the
ability to manipulate userspace memory.)
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/r/20240502145218.it.729-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Weak references are references that are permitted to remain unsatisfied
in the final link. This means they cannot be implemented using place
relative relocations, resulting in GOT entries when using position
independent code generation.
The notes section should always exist, so the weak annotations can be
omitted.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The __alloc_size annotation for kmemdup() was getting disabled under
KUnit testing because the replaced fortify_panic macro implementation
was using "return NULL" as a way to survive the sanity checking. But
having the chance to return NULL invalidated __alloc_size, so kmemdup
was not passing the __builtin_dynamic_object_size() tests any more:
[23:26:18] [PASSED] fortify_test_alloc_size_kmalloc_const
[23:26:19] # fortify_test_alloc_size_kmalloc_dynamic: EXPECTATION FAILED at lib/fortify_kunit.c:265
[23:26:19] Expected __builtin_dynamic_object_size(p, 1) == expected, but
[23:26:19] __builtin_dynamic_object_size(p, 1) == -1 (0xffffffffffffffff)
[23:26:19] expected == 11 (0xb)
[23:26:19] __alloc_size() not working with __bdos on kmemdup("hello there", len, gfp)
[23:26:19] [FAILED] fortify_test_alloc_size_kmalloc_dynamic
Normal builds were not affected: __alloc_size continued to work there.
Use a zero-sized allocation instead, which allows __alloc_size to
behave.
Fixes: 4ce615e798 ("fortify: Provide KUnit counters for failure testing")
Fixes: fa4a3f86d4 ("fortify: Add KUnit tests for runtime overflows")
Link: https://lore.kernel.org/r/20240501232937.work.532-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Profiling shows that calling nr_possible_cpus() in objpool_pop() takes
a noticeable amount of CPU (when profiled on an 80-core machine), as we
need to recalculate the number of set bits in a CPU bit mask. This number
can't change, so there is no point in paying the price for recalculating
it. As such, cache this value in struct objpool_head and use it in
objpool_pop().
On the other hand, the cached pool->nr_cpus isn't necessary, as it's not
used in the hot path and is also a pretty trivial value to retrieve. So drop
pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size
of struct objpool_head remains the same, which is a nice bonus.
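The caching idea can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not the objpool code: struct pool_head, pool_init() and pool_nr_possible() are hypothetical names, and a 64-bit word stands in for the CPU mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pool header: the bit count is computed once at init time. */
struct pool_head {
	uint64_t possible_mask;		/* stand-in for the possible-CPU mask */
	unsigned int nr_possible;	/* cached number of set bits */
};

/* Count set bits by clearing the lowest set bit until none remain. */
unsigned int count_bits(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

void pool_init(struct pool_head *pool, uint64_t possible_mask)
{
	assert(pool);
	pool->possible_mask = possible_mask;
	pool->nr_possible = count_bits(possible_mask);
}

/* Hot path: read the cached value instead of recounting the mask. */
unsigned int pool_nr_possible(const struct pool_head *pool)
{
	return pool->nr_possible;
}
```

The trade-off is one extra field that must be kept in sync with the mask; that is only safe because, as noted above, the value cannot change after initialization.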
The same BPF selftests benchmarks were used to evaluate the effect. Using
changes in previous patch (inlining of objpool_pop/objpool_push) as
baseline, here are the differences:
BASELINE
========
kretprobe : 9.937 ± 0.174M/s
kretprobe-multi: 10.440 ± 0.108M/s
AFTER
=====
kretprobe : 10.106 ± 0.120M/s (+1.7%)
kretprobe-multi: 10.515 ± 0.180M/s (+0.7%)
Link: https://lore.kernel.org/all/20240424215214.3956041-3-andrii@kernel.org/
Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
objpool_push() and objpool_pop() are very performance-critical functions
and can be called very frequently in the kretprobe triggering path.
As such, it makes sense to allow the compiler to inline them completely to
eliminate function call overhead. Luckily, their logic is quite well
isolated and doesn't have any sprawling dependencies.
This patch moves both objpool_push() and objpool_pop() into
include/linux/objpool.h and marks them as static inline functions,
enabling inlining. To avoid anyone using internal helpers
(objpool_try_get_slot, objpool_try_add_slot), rename them to use leading
underscores.
We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe
and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi
programs in a tight loop to evaluate the effect. BPF's own overhead in
this case is minimal and it mostly stresses the rest of in-kernel
kretprobe infrastructure overhead. Results are in millions of calls per
second. This is not super scientific, but shows the trend nevertheless.
BEFORE
======
kretprobe : 9.794 ± 0.086M/s
kretprobe-multi: 10.219 ± 0.032M/s
AFTER
=====
kretprobe : 9.937 ± 0.174M/s (+1.5%)
kretprobe-multi: 10.440 ± 0.108M/s (+2.2%)
Link: https://lore.kernel.org/all/20240424215214.3956041-2-andrii@kernel.org/
Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add fortify tests for memcpy() and memmove(). This can use a similar
method to the fortify_panic() replacement, only we can do it for what
was the WARN_ONCE(), which can be redefined.
Since this is primarily testing the fortify behaviors of the memcpy()
and memmove() defenses, the tests for memcpy() and memmove() are
identical.
Link: https://lore.kernel.org/r/20240429194342.2421639-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
When running KUnit fortify tests, we're already doing precise tracking
of which warnings are getting hit. Don't fill the logs with WARNs unless
we've been explicitly built with DEBUG enabled.
Link: https://lore.kernel.org/r/20240429194342.2421639-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Fix a BUG_ON from 2009. Even if it looks "unreachable" (I didn't
really look), let's make sure by removing it and doing pr_err() and
return -EINVAL instead.
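The general pattern (report and reject malformed input instead of crashing) can be sketched in userspace C as below. parse_pair() and its key=value format are hypothetical and are not taken from the dyndbg parser; fprintf() stands in for pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser step: split "key=value" in place. A missing '='
 * is logged and rejected with -EINVAL rather than aborting.
 */
int parse_pair(char *buf, char **key, char **val)
{
	char *eq;

	assert(buf && key && val);	/* programming errors, not input errors */
	eq = strchr(buf, '=');
	if (!eq) {
		fprintf(stderr, "expecting key=value, got: %s\n", buf);
		return -EINVAL;
	}
	*eq = '\0';
	*key = buf;
	*val = eq + 1;
	return 0;
}
```

The caller decides what to do with the error; bad input from a control file should never be able to take the system down.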
Cc: stable <stable@kernel.org>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20240429193145.66543-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-04-29
We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).
The main changes are:
1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86 BPF JIT. This allows
inlining per-CPU array and hashmap lookups
and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.
2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.
3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction,
from Alexei Starovoitov.
4) Add support for passing mark with bpf_fib_lookup helper,
from Anton Protopopov.
5) Add a new bpf_wq API for deferring events and refactor sleepable
bpf_timer code to keep common code where possible,
from Benjamin Tissoires.
6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
to check when NULL is passed for non-NULLable parameters,
from Eduard Zingerman.
7) Harden the BPF verifier's and/or/xor value tracking,
from Harishankar Vishwanathan.
8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
crypto subsystem, from Vadim Fedorenko.
9) Various improvements to the BPF instruction set standardization doc,
from Dave Thaler.
10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
from Andrea Righi.
11) Bigger batch of BPF selftests refactoring to use common network helpers
and to drop duplicate code, from Geliang Tang.
12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
from Jose E. Marchesi.
13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled,
from Kumar Kartikeya Dwivedi.
14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
from David Vernet.
15) Extend the BPF verifier to allow different input maps for a given
bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.
16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.
17) Shut up a false-positive KMSAN splat in interpreter mode by unpoisoning
the stack memory, from Martin KaFai Lau.
18) Improve xsk selftest coverage with new tests on maximum and minimum
hardware ring size configurations, from Tushar Vyavahare.
19) Various ReST man pages fixes as well as documentation and bash completion
improvements for bpftool, from Rameez Rehman & Quentin Monnet.
20) Fix libbpf with regards to dumping subsequent char arrays,
from Quentin Deslandes.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
bpf, docs: Clarify PC use in instruction-set.rst
bpf_helpers.h: Define bpf_tail_call_static when building with GCC
bpf, docs: Add introduction for use in the ISA Internet Draft
selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
bpf: check bpf_dummy_struct_ops program params for test runs
selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
selftests/bpf: adjust dummy_st_ops_success to detect additional error
bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
selftests/bpf: Add ring_buffer__consume_n test.
bpf: Add bpf_guard_preempt() convenience macro
selftests: bpf: crypto: add benchmark for crypto functions
selftests: bpf: crypto skcipher algo selftests
bpf: crypto: add skcipher to bpf crypto
bpf: make common crypto API for TC/XDP programs
bpf: update the comment for BTF_FIELDS_MAX
selftests/bpf: Fix wq test.
selftests/bpf: Use make_sockaddr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
...
====================
Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-04-26
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 14 files changed, 168 insertions(+), 72 deletions(-).
The main changes are:
1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page,
from Puranjay Mohan.
2) Fix a crash in XDP with devmap broadcast redirect when the latter map
is in process of being torn down, from Toke Høiland-Jørgensen.
3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF
program runtime stats, from Xu Kuohai.
4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue,
from Jason Xing.
5) Fix BPF verifier error message in resolve_pseudo_ldimm64,
from Anton Protopopov.
6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item,
from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64
bpf, x86: Fix PROBE_MEM runtime load check
bpf: verifier: prevent userspace memory access
xdp: use flags field to disambiguate broadcast redirect
arm32, bpf: Reimplement sign-extension mov instruction
riscv, bpf: Fix incorrect runtime stats
bpf, arm64: Fix incorrect runtime stats
bpf: Fix a verifier verbose message
bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers
MAINTAINERS: Update email address for Puranjay Mohan
bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
====================
Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kv*() family of tests were accidentally freeing with vfree() instead
of kvfree(). Use kvfree() instead.
Fixes: 9124a26401 ("kunit/fortify: Validate __alloc_size attribute results")
Link: https://lore.kernel.org/r/20240425230619.work.299-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
post-6.8 issues or aren't considered suitable for backporting.
All except one of these are for MM. I see no particular theme - it's
singletons all over"
* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
stackdepot: respect __GFP_NOLOCKDEP allocation flag
hugetlb: check for anon_vma prior to folio allocation
mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
mm: turn folio_test_hugetlb into a PageType
mm: support page_mapcount() on page_has_type() pages
mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
selftests: mm: fix unused and uninitialized variable warning
selftests/harness: remove use of LINE_MAX
Fix extract_user_to_sg() so that it will break out of the loop if
iov_iter_extract_pages() returns 0 rather than looping around forever.
[Note that I've included two fixes lines as the function got moved to a
different file and renamed]
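The shape of the fix, stopping when an extraction step makes no progress, can be sketched as follows. This is an illustrative userspace loop, not the scatterlist code: drain(), extract_fn and budget_source() are hypothetical names, and the extractor is assumed never to return more than it was asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical extractor: bytes produced, 0 if nothing more, <0 on error. */
typedef ssize_t (*extract_fn)(void *ctx, size_t want);

ssize_t drain(extract_fn extract, void *ctx, size_t maxsize)
{
	ssize_t total = 0;

	assert(extract);
	while (maxsize > 0) {
		ssize_t got = extract(ctx, maxsize);

		if (got < 0)
			return got;	/* propagate the error */
		if (got == 0)
			break;		/* no progress: stop instead of spinning */
		total += got;
		maxsize -= (size_t)got;
	}
	return total;
}

/* Example source: hands out at most 4 bytes per call from a finite budget. */
ssize_t budget_source(void *ctx, size_t want)
{
	size_t *left = ctx;
	size_t n = want < 4 ? want : 4;

	if (n > *left)
		n = *left;
	*left -= n;
	return (ssize_t)n;
}
```

Without the `got == 0` check, an exhausted source would leave maxsize unchanged and the loop would never terminate.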
Fixes: 85dd2c8ff3 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator")
Fixes: f5f82cd187 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: netfs@lists.linux.dev
Link: https://lore.kernel.org/r/1967121.1714034372@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In __sbitmap_queue_get_batch(), map->word is read several times and
updated atomically using atomic_long_try_cmpxchg(), but the first two reads
of map->word are not protected.
This patch moves the statement val = READ_ONCE(map->word) forward,
eliminating unprotected accesses to map->word within the function.
It is aimed at reducing the number of benign races reported by KCSAN in
order to focus future debugging effort on harmful races.
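A userspace analogue of the "read once, then compare-and-exchange" pattern, using C11 atomics in place of READ_ONCE() and atomic_long_try_cmpxchg(), might look like the sketch below. claim_bits() is a hypothetical helper and is much simpler than the sbitmap code.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Set whichever bits of 'want' are currently clear in *word and return
 * the bits actually claimed. The word is loaded exactly once up front;
 * every later decision uses that snapshot, which the compare-exchange
 * refreshes whenever it fails.
 */
unsigned long claim_bits(atomic_ulong *word, unsigned long want)
{
	unsigned long val, got;

	assert(word);
	val = atomic_load_explicit(word, memory_order_relaxed);
	do {
		got = want & ~val;	/* bits free in the snapshot */
		if (!got)
			return 0;	/* nothing left to claim */
	} while (!atomic_compare_exchange_weak(word, &val, val | got));

	return got;
}
```

Because every test and the final update are derived from the same snapshot, there is no plain, unannotated read of the shared word for a race detector to flag.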
Signed-off-by: linke li <lilinke99@qq.com>
Link: https://lore.kernel.org/r/tencent_0B517C25E519D3D002194E8445E86C04AD0A@qq.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
gcc can warn when a string is too long to fit into the strncpy()
destination buffer, as it is here depending on the function arguments:
inlined from 'test_hexdump_prepare_test.constprop' at /home/arnd/arm-soc/lib/test_hexdump.c:116:3:
include/linux/fortify-string.h:108:33: error: '__builtin_strncpy' output truncated copying between 0 and 32 bytes from a string of length 32 [-Werror=stringop-truncation]
108 | #define __underlying_strncpy __builtin_strncpy
| ^
include/linux/fortify-string.h:187:16: note: in expansion of macro '__underlying_strncpy'
187 | return __underlying_strncpy(p, q, size);
| ^~~~~~~~~~~~~~~~~~~~
The intention here is to copy exactly 'l' bytes without any padding or
NUL-termination, so the most logical change is to use memcpy(), just as
a previous change adapted the other output from strncpy() to memcpy().
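A minimal sketch of the intended copy, assuming a filler-then-overwrite buffer layout; fill_and_place() and the '#' filler byte are hypothetical and not taken from lib/test_hexdump.c:

```c
#include <assert.h>
#include <string.h>

/*
 * Pre-fill buf with a filler byte, then place exactly l bytes of text at
 * the start. memcpy() neither pads nor appends a NUL, which is the
 * behavior wanted here.
 */
void fill_and_place(char *buf, size_t buflen, const char *text, size_t l)
{
	assert(l <= buflen && l <= strlen(text));
	memset(buf, '#', buflen);
	memcpy(buf, text, l);
}
```

With strncpy() the compiler has to assume the caller wanted a NUL-terminated string and warns about truncation; memcpy() states the byte-exact intent directly.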
Link: https://lkml.kernel.org/r/20240409140059.3806717-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Justin Stitt <justinstitt@google.com>
Cc: Alexey Starikovskiy <astarikovskiy@suse.de>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Len Brown <lenb@kernel.org>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Update header inclusions to follow IWYU (Include What You Use) principle.
Link: https://lkml.kernel.org/r/20240403104820.557487-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "devres: A couple of cleanups".
A couple of ad-hoc cleanups. No functional changes intended.
This patch (of 2):
The devm_*() APIs are supposed to be called during the ->probe() stage.
Many drivers (especially new ones) have switched to use dev_err_probe()
for error messaging for the sake of unification. Let's do the same in the
devres APIs.
Link: https://lkml.kernel.org/r/20240403104820.557487-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20240403104820.557487-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at
compile time. We thus need to add HAS_IOPORT as dependency for those
drivers using them.
Link: https://lkml.kernel.org/r/20240403132547.762429-2-schnelle@linux.ibm.com
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This change strips the full path of the script generating
lib/oid_registry_data.c to just lib/build_OID_registry. The motivation
for this change is Yocto emitting a build warning:
File /usr/src/debug/linux-lxatac/6.7-r0/lib/oid_registry_data.c in package linux-lxatac-src contains reference to TMPDIR [buildpaths]
So this change brings us one step closer to making the build result
reproducible independent of the build path.
Link: https://lkml.kernel.org/r/20240313211957.884561-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of doing multiple tree walks, do one optimistic range check with
the lock held, and exit if raced with another insertion. If a shadow exists,
check it with a new xas_get_order helper before releasing the lock to
avoid redundant tree walks for getting its order.
Drop the lock and do the allocation only if a split is needed.
In the best case, it only needs to walk the tree once. If it needs to
allocate and split, 3 walks are issued (one for the first ranged conflict
check and order retrieval, one for the second check after allocation, and
one for the insert after the split).
Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap --rw=randread --time_based \
--ramp_time=30s --runtime=5m --group_reporting
Before:
bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691
After (+7.4%):
bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651
Test result with THP (do a THP randread, then switch to 4K pages in the
hope that it triggers a lot of splitting):
echo 3 > /proc/sys/vm/drop_caches
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap -thp=1 --readonly \
--rw=randread --time_based --ramp_time=30s --runtime=10m \
--group_reporting
fio -name=cached --numjobs=16 --filename=/mnt/test.img \
--buffered=1 --ioengine=mmap \
--rw=randread --time_based --runtime=5s --group_reporting
Before:
bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec
After (+12.5%):
bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec
The performance is better for both 4K (+7.4%) and THP (+12.5%) cached reads.
Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
xas_get_order can be used after xas_load to check the order of loaded
entries. Compared to xa_get_order, it saves an XA_STATE and avoids a rewalk.
Add a new test for xas_get_order. To make the test work, xas_get_order has
to be exported with EXPORT_SYMBOL_GPL.
Also fix a sparse warning by checking the slot value with xa_entry instead
of accessing it directly, as suggested by Matthew Wilcox.
[kasong@tencent.com: simplify comment, sparse warning fix, per Matthew Wilcox]
Link: https://lkml.kernel.org/r/20240416071722.45997-4-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240415171857.19244-4-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The /proc/allocinfo file exposes a tremendous amount of information about
kernel build details, memory allocations (obviously), and potentially even
image layout (due to ordering). As this is intended to be consumed by
system owners (like /proc/slabinfo), use the same file permissions as
there: 0400.
Link: https://lkml.kernel.org/r/20240425200844.work.184-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
To store a code tag for every slab object, a codetag reference is embedded
into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.
Link: https://lkml.kernel.org/r/20240321163705.3067592-23-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The highest memory overhead from memory allocation profiling comes from
page_ext objects. This overhead exists even if the feature is disabled
but compiled-in. To avoid it, introduce an early boot parameter that
prevents page_ext object creation. The new boot parameter is a tri-state
with possible values of 0|1|never. When it is set to "never" the memory
allocation profiling support is disabled, and overhead is minimized
(currently no page_ext objects are allocated, in the future more overhead
might be eliminated). As a result we also lose the ability to enable memory
allocation profiling at runtime (because there is no space to store
alloctag references). Runtime sysctl becomes read-only if the early boot
parameter was set to "never". Note that the default value of this boot
parameter depends on the CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
configuration. When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n the
boot parameter is set to "never", therefore eliminating any overhead.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y results in boot parameter
being set to 1 (enabled). This allows distributions to avoid any overhead
by setting CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n config and with
no changes to the kernel command line.
We reuse sysctl.vm.mem_profiling boot parameter name in order to avoid
introducing yet another control. This change turns it into a tri-state
early boot parameter.
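The tri-state semantics described above can be sketched in userspace C as follows. The enum, parse_mem_profiling() and runtime_toggle_allowed() are hypothetical names used only for illustration; the kernel's actual parsing code is not reproduced here.

```c
#include <assert.h>
#include <string.h>

enum mem_profiling_mode {
	PROFILING_OFF,		/* "0": disabled, can still be enabled later */
	PROFILING_ON,		/* "1": enabled */
	PROFILING_NEVER,	/* "never": support dropped for this boot */
};

/* Returns 0 and stores the mode on success, -1 on an unrecognized value. */
int parse_mem_profiling(const char *arg, enum mem_profiling_mode *mode)
{
	assert(arg && mode);
	if (strcmp(arg, "0") == 0)
		*mode = PROFILING_OFF;
	else if (strcmp(arg, "1") == 0)
		*mode = PROFILING_ON;
	else if (strcmp(arg, "never") == 0)
		*mode = PROFILING_NEVER;
	else
		return -1;
	return 0;
}

/* The runtime switch stays writable only if the mode is not "never". */
int runtime_toggle_allowed(enum mem_profiling_mode mode)
{
	return mode != PROFILING_NEVER;
}
```

Keeping "never" distinct from "0" is what lets the per-page storage be skipped entirely: with "0" the storage must still exist so that profiling can be switched on later.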
Link: https://lkml.kernel.org/r/20240321163705.3067592-16-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce helper functions to easily instrument page allocators by storing
a pointer to the allocation tag associated with the code that allocated
the page in a page_ext field.
Link: https://lkml.kernel.org/r/20240321163705.3067592-15-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily
instrument memory allocators. It registers an "alloc_tags" codetag type
with /proc/allocinfo interface to output allocation tag information when
the feature is enabled.
CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory
allocation profiling instrumentation.
Memory allocation profiling can be enabled or disabled at runtime using
/proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation
profiling by default.
[surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title]
Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com
[surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU]
Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com
[klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h]
Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com
[surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()]
Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Klara Modin <klarasmodin@gmail.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Skip freeing module's data section if there are non-zero allocation tags
because otherwise, once these allocations are freed, the access to their
code tag would cause UAF.
Link: https://lkml.kernel.org/r/20240321163705.3067592-13-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add basic infrastructure to support code tagging which stores tag common
information consisting of the module name, function, file name and line
number. Provide functions to register a new code tag type and navigate
between code tags.
Link: https://lkml.kernel.org/r/20240321163705.3067592-11-surenb@google.com
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The kcalloc() in dmirror_device_evict_chunk() will return null if the
physical memory has run out. As a result, if src_pfns or dst_pfns is
dereferenced, the null pointer dereference bug will happen.
Moreover, the device is going away. If the kcalloc() fails, the pages
mapping a chunk could not be evicted. So add a __GFP_NOFAIL flag in
kcalloc().
Finally, as there is no need to have physically contiguous memory, switch
kcalloc() to kvcalloc() in order to avoid failing allocations.
Link: https://lkml.kernel.org/r/20240312005905.9939-1-duoming@zju.edu.cn
Fixes: b2ef9f5a5c ("mm/hmm/test: add selftest driver for HMM")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When generating Runtime Calls, Clang doesn't respect the -mregparm=3
option used on i386. Hopefully this will be fixed correctly in Clang 19:
https://github.com/llvm/llvm-project/pull/89707
but we need to fix this for earlier Clang versions today. Force the
calling convention to use non-register arguments.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://github.com/KSPP/linux/issues/350
Link: https://lore.kernel.org/r/20240424224026.it.216-kees@kernel.org
Acked-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
works in-place with all inputs and produces desired output directly.
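A minimal userspace sketch of the idea, assuming plain unsigned long bitmaps rather than struct cpumask. The name first_and_and() and its signature are hypothetical, and __builtin_ctzl() is a GCC/Clang builtin, not a kernel helper:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical analogue: return the index of the first bit that is set in
 * all three bitmaps, or nbits if there is none. The inputs are ANDed one
 * word at a time, so no temporary bitmap is needed.
 */
static size_t first_and_and(const unsigned long *a, const unsigned long *b,
			    const unsigned long *c, size_t nbits)
{
	size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	for (size_t i = 0; i < nwords; i++) {
		unsigned long w = a[i] & b[i] & c[i];

		if (w) {
			/* Lowest set bit of the first non-zero word. */
			size_t bit = i * BITS_PER_WORD +
				     (size_t)__builtin_ctzl(w);

			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}
```

Returning nbits for "no bit found" mirrors the usual find-first-bit convention, so callers can compare the result against the bitmap size.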
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20240416085454.3547175-2-dawei.li@shingroup.cn
The "type_name" character array was still marked as a 1-element array.
While we don't validate strings used in format arguments yet, let's fix
this before it causes trouble some future day.
Link: https://lore.kernel.org/r/20240424162739.work.492-kees@kernel.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
It is more logical to have the strtomem() test in string_kunit.c instead
of the memcpy() suite. Move it to live with memtostr().
Signed-off-by: Kees Cook <keescook@chromium.org>
Another ambiguous use of strncpy() is to copy from strings that may not
be NUL-terminated. These cases depend on having the destination buffer
be explicitly larger than the source buffer's maximum size, having
the size of the copy exactly match the source buffer's maximum size,
and for the destination buffer to get explicitly NUL terminated.
This usually happens when parsing protocols or hardware character arrays
that are not guaranteed to be NUL-terminated. The code pattern is
effectively this:
char dest[sizeof(src) + 1];
strncpy(dest, src, sizeof(src));
dest[sizeof(dest) - 1] = '\0';
In practice it usually looks like:
struct from_hardware {
...
char name[HW_NAME_SIZE] __nonstring;
...
};
struct from_hardware *p = ...;
char name[HW_NAME_SIZE + 1];
strncpy(name, p->name, HW_NAME_SIZE);
name[HW_NAME_SIZE] = '\0';
This cannot be replaced with:
strscpy(name, p->name, sizeof(name));
because p->name is smaller and not NUL-terminated, so FORTIFY will
trigger when strnlen(p->name, sizeof(name)) is used. And it cannot be
replaced with:
strscpy(name, p->name, sizeof(p->name));
because then "name" may contain a 1 character early truncation of
p->name.
Provide an unambiguous interface for converting a maybe not-NUL-terminated
string to a NUL-terminated string, with compile-time buffer size checking
so that it can never fail at runtime: memtostr() and memtostr_pad(). Also
add KUnit tests for both.
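The shape of such an interface can be sketched in userspace C. This is an illustration only, not the kernel's memtostr() implementation: mem_to_str() and bounded_len() are hypothetical names, and both macro arguments are assumed to be real arrays so that sizeof() sees their full sizes:

```c
#include <stddef.h>
#include <string.h>

/* Length of s, looking at no more than max bytes. */
static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Hypothetical sketch: copy a maybe not-NUL-terminated char array into a
 * NUL-terminated one. The build fails unless dest has room for every
 * source byte plus the terminating NUL, so the copy cannot fail at runtime.
 */
#define mem_to_str(dest, src) do {					\
	_Static_assert(sizeof(dest) > sizeof(src),			\
		       "dest must be larger than src");			\
	size_t len_ = bounded_len((src), sizeof(src));			\
	memcpy((dest), (src), len_);					\
	(dest)[len_] = '\0';						\
} while (0)
```

With char name[HW_NAME_SIZE + 1] as the destination and a HW_NAME_SIZE-byte source field, the size check passes and the result is always terminated, whether or not the source contained a NUL.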
Link: https://lore.kernel.org/r/20240410023155.2100422-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
We want the tty fixes in here as well, and it resolves a merge conflict
in:
drivers/tty/serial/serial_core.c
as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Complete switching the __iowriteXX_copy() routines over to use #define and
arch provided inline/macro functions instead of weak symbols.
S390 has an implementation that simply calls another memcpy
function. Inline this so the callers don't have to do two jumps.
Link: https://lore.kernel.org/r/3-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Start switching iomap_copy routines over to use #define and arch provided
inline/macro functions instead of weak symbols.
Inline functions allow more compiler optimization and this is often a
driver hot path.
x86 has the only weak implementation for __iowrite32_copy(), so replace it
with a static inline containing the same single instruction inline
assembly. The compiler will generate the "mov edx,ecx" in a more optimal
way.
Remove iomap_copy_64.S
Link: https://lore.kernel.org/r/1-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The KUnit convention for test names is AREA_test_WHAT. Adjust the string
test names to follow this pattern.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-5-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Move the strcat() tests into string_kunit.c. Remove the separate
Kconfig and Makefile rule.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-4-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
The test naming convention differs between string_kunit.c and
strcat_kunit.c. Move "test" to the beginning of the function name.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Move the strscpy() tests into string_kunit.c. Remove the separate
Kconfig and Makefile rule.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
In preparation for moving the strscpy_kunit.c tests into string_kunit.c,
rename "tc" to "strscpy_check" for better readability.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
- Fix potential static_command_line buffer overrun. Currently we allocate
the memory for static_command_line based on "boot_command_line", but it
will copy "command_line" into it. So we use the length of "command_line"
instead of "boot_command_line" (as previously we did).
- Use memblock_free_late() in xbc_exit() instead of memblock_free() after
the buddy system is initialized.
- Fix a kerneldoc warning.
Merge tag 'bootconfig-fixes-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fixes from Masami Hiramatsu:
- Fix potential static_command_line buffer overrun.
Currently we allocate the memory for static_command_line based on
"boot_command_line", but it will copy "command_line" into it. So we
use the length of "command_line" instead of "boot_command_line" (as
we previously did)
- Use memblock_free_late() in xbc_exit() instead of memblock_free()
after the buddy system is initialized
- Fix a kerneldoc warning
* tag 'bootconfig-fixes-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Fix the kerneldoc of _xbc_exit()
bootconfig: use memblock_free_late to free xbc memory to buddy
init/main.c: Fix potential static_command_line memory overflow
Currently, str*cmp functions (strcmp, strncmp, strcasecmp and
strncasecmp) are not covered with tests. Extend the `string_kunit.c`
test by adding the test cases for them.
This patch adds 8 more test cases:
1) strcmp test
2) strcmp test on long strings (2048 chars)
3) strncmp test
4) strncmp test on long strings (2048 chars)
5) strcasecmp test
6) strcasecmp test on long strings
7) strncasecmp test
8) strncasecmp test on long strings
These test cases aim at covering as many edge cases as possible,
including the tests on empty strings, situations when the different
symbol is placed at the end of one of the strings, etc.
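One of the edge cases mentioned above, a difference in the last character of two long strings, could be set up roughly as follows. This is a userspace sketch using the C library rather than the kernel's string functions or KUnit macros; make_long_pair() and sign() are hypothetical helpers:

```c
#include <string.h>

#define LONG_LEN 2048	/* length used for the "long strings" cases */

/*
 * Fill two LONG_LEN-character strings with the same byte, then make them
 * differ only in the final character. Both buffers must hold LONG_LEN + 1
 * bytes.
 */
static void make_long_pair(char *a, char *b)
{
	memset(a, 'b', LONG_LEN);
	memset(b, 'b', LONG_LEN);
	a[LONG_LEN] = '\0';
	b[LONG_LEN] = '\0';
	b[LONG_LEN - 1] = 'a';	/* b now sorts before a */
}

/* Reduce a comparison result to -1, 0 or 1; only the sign is specified. */
static int sign(int v)
{
	return (v > 0) - (v < 0);
}
```

Comparing only the sign matters because strcmp() and friends guarantee the sign of the result, not its magnitude. Limiting strncmp() to LONG_LEN - 1 characters on this pair should report equality, which checks that the length limit is honored.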
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/r/20240417233033.717596-1-ivan.orlov0322@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Currently, there are two comments with the same name, "64-bit ATOMIC
magnitudes"; based on the context, the second one should be "32-bit ATOMIC
magnitudes".
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240415081928.17440-1-cp0613@linux.alibaba.com
With the previous change, struct dql->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.
The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field (stall_thrs) to the
very first cache line, since there is a hole there.
This does not change the structure size, since it moves a short (2
bytes) into a 4-byte hole in the first cache line.
This is the new structure format now:
struct dql {
unsigned int num_queued;
unsigned int last_obj_cnt;
...
short unsigned int stall_thrs;
/* XXX 2 bytes hole, try to pack */
...
/* --- cacheline 1 boundary (64 bytes) --- */
...
/* Longest stall detected, reported to user */
short unsigned int stall_max;
/* XXX 2 bytes hole, try to pack */
};
Also, read the stall_thrs (now in the very first cache line) earlier,
together with dql->num_queued (also in the first cache line).
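Why moving a short into an existing hole leaves the size unchanged can be shown with two toy layouts. These are hypothetical structs, not the real struct dql, and the offsets assume a common ABI where int is 4 bytes with 4-byte alignment and short is 2 bytes:

```c
#include <stddef.h>

/* Before: a 2-byte hole follows "flags"; stall_thrs sits further down. */
struct layout_before {
	unsigned int num_queued;	/* offset 0 */
	unsigned short flags;		/* offset 4, then a 2-byte hole */
	unsigned int limit;		/* offset 8 */
	unsigned short stall_thrs;	/* offset 12 */
	unsigned short stall_max;	/* offset 14 */
};

/* After: stall_thrs fills the hole; a new 2-byte hole opens at the end. */
struct layout_after {
	unsigned int num_queued;	/* offset 0 */
	unsigned short flags;		/* offset 4 */
	unsigned short stall_thrs;	/* offset 6, fills the old hole */
	unsigned int limit;		/* offset 8 */
	unsigned short stall_max;	/* offset 12, then 2 bytes of padding */
};

/* Returns 1 when the move did not change the structure size. */
static int same_size(void)
{
	return sizeof(struct layout_before) == sizeof(struct layout_after);
}
```

The hole does not disappear; it moves to where the field used to be, which is why the pahole-style listing above still shows a 2-byte hole after stall_max.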
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The following softlockup is caused by an interrupt storm, but that cannot
be identified from the call tree, because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
...
Call trace:
__do_softirq+0xa0/0x37c
__irq_exit_rcu+0x108/0x140
irq_exit+0x14/0x20
__handle_domain_irq+0x84/0xe0
gic_handle_irq+0x80/0x108
el0_irq_naked+0x50/0x58
Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (report once every sample_period, for a total
of 5 reportings), like this:
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
CPU#28 Utilization every 4s during lockup:
#1: 0% system, 0% softirq, 100% hardirq, 0% idle
#2: 0% system, 0% softirq, 100% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 100% hardirq, 0% idle
#5: 0% system, 0% softirq, 100% hardirq, 0% idle
...
This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:
a. If the hardirq utilization is high, then interrupt storm should be
considered and the root cause cannot be determined from the call tree.
b. If the softirq utilization is high, then the call tree might not
necessarily point at the root cause.
c. If the system utilization is high, then analyzing the root
cause from the call tree is possible in most cases.
The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, add a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.
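The per-sample percentages in the report above can be thought of as deltas between two snapshots of accumulated CPU time, scaled by the sample period. A minimal sketch follows; the enum, the function name and the nanosecond units are assumptions made for illustration, not the watchdog's actual data structures:

```c
#include <stdint.h>

enum { STAT_SYSTEM, STAT_SOFTIRQ, STAT_HARDIRQ, STAT_IDLE, NUM_STATS };

/*
 * Hypothetical helper: given two snapshots of accumulated time per state
 * (in ns) taken one sample period apart, compute the percentage of the
 * period spent in each state, clamped to 100.
 */
static void utilization(const uint64_t prev[NUM_STATS],
			const uint64_t cur[NUM_STATS],
			uint64_t period_ns, uint8_t util[NUM_STATS])
{
	for (int i = 0; i < NUM_STATS; i++) {
		uint64_t pct = 0;

		if (period_ns)
			pct = 100 * (cur[i] - prev[i]) / period_ns;
		util[i] = (uint8_t)(pct > 100 ? 100 : pct);
	}
}
```

Storing one small percentage per state per sample keeps the history compact, but it still scales with the number of CPUs, which is the storage concern raised above.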
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one
file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in
bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State"
* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: ena: Set tx_info->xdpf value to NULL
net: ena: Fix incorrect descriptor free behavior
net: ena: Wrong missing IO completions check order
net: ena: Fix potential sign extension issue
af_unix: Fix garbage collector racing against connect()
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Revert "s390/ism: fix receive message buffer allocation"
net: sparx5: fix wrong config being used when reconfiguring PCS
net/mlx5: fix possible stack overflows
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
net/mlx5e: RSS, Block XOR hash with over 128 channels
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
net/mlx5e: Fix mlx5e_priv_init() cleanup flow
net/mlx5e: RSS, Block changing channels number when RXFH is configured
net/mlx5: Correctly compare pkt reformat ids
net/mlx5: Properly link new fs rules into the tree
net/mlx5: offset comp irq index in name by one
net/mlx5: Register devlink first under devlink lock
net/mlx5: E-switch, store eswitch pointer before registering devlink_param
...
Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte cmpxchg() would be extremely
useful. Therefore, provide cmpxchg_emu_u8() that emulates one-byte
cmpxchg() in terms of four-byte cmpxchg().
Note that this emulation is fully ordered, and can (for example) cause
one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.
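The emulation technique can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel's cmpxchg_emu_u8(): the function name is hypothetical, and to sidestep pointer aliasing and endianness it takes the containing 32-bit word plus a byte index counted from the least significant byte:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compare-and-exchange one byte inside a 32-bit word
 * using only a four-byte compare-and-exchange. Returns the byte value that
 * was observed, which equals "old" on success.
 */
static uint8_t cmpxchg_emu_byte(_Atomic uint32_t *word, unsigned int idx,
				uint8_t old, uint8_t new)
{
	unsigned int shift = 8 * (idx & 3);
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t cur = atomic_load(word);

	for (;;) {
		uint8_t seen = (uint8_t)((cur & mask) >> shift);
		uint32_t want;

		if (seen != old)
			return seen;	/* byte mismatch: fail like cmpxchg */

		/* Replace only the target byte, keep the other three. */
		want = (cur & ~mask) | ((uint32_t)new << shift);

		/* On failure, cur is reloaded with the current word value. */
		if (atomic_compare_exchange_strong(word, &cur, want))
			return old;
	}
}
```

The loop retries when a neighbouring byte changed between the load and the exchange, since that is not a failure of the one-byte operation. atomic_compare_exchange_strong() defaults to sequentially consistent ordering, which matches the "fully ordered" note above.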
[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
When the kfifo buffer is already dma-mapped, one cannot use the kfifo
API to fill in an SG list.
Add kfifo_dma_in_prepare_mapped() which allows exactly this. A mapped
dma_addr_t is passed and it is filled into the provided sgl too, including
the dma_len.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405060826.2521-8-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for filling in DMA addresses, we need the data offset
instead of a virtual pointer in setup_sgl_buf(). So pass the former
instead of the latter.
A pointer to the fifo is now needed in setup_sgl_buf() too.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405060826.2521-7-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
So that one can make any sense of the name.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405060826.2521-6-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
First, there is no such user. The only user of this interface is
caam_rng_fill_async() and that uses kfifo_alloc() -> kmalloc().
Second, the implementation does not allow anything other than direct
mapping and kmalloc() (due to virt_to_phys()), anyway.
Therefore, there is no point in having this dead (and complex) code in
the kernel.
Note the setup_sgl_buf() function now boils down to simple sg_set_buf().
That is called twice from setup_sgl() to take care of kfifo buffer
wrap-around.
setup_sgl_buf() will be extended shortly, so keeping it in place.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stefani Seibold <stefani@seibold.net>
Link: https://lore.kernel.org/r/20240405060826.2521-5-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These are helpers which are going to be used in the serial layer. We
need a wrapper around kfifo which provides us with a tail (sometimes a
"tail" offset, sometimes a pointer) to the kfifo data, and which returns
the count of available data, but no more than what is left up to the end
of the buffer (hence _linear in the names). I.e. something like
CIRC_CNT_TO_END() in the legacy circ_buf.
This patch adds two such helpers.
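The "linear" count can be sketched on a toy ring buffer. struct ring and ring_out_linear() are hypothetical stand-ins, not the kfifo API; the sketch assumes a power-of-two size and free-running in/out indices:

```c
/* Hypothetical minimal ring; the buffer size must be a power of two. */
struct ring {
	unsigned int in;	/* free-running write index */
	unsigned int out;	/* free-running read index */
	unsigned int mask;	/* size - 1 */
};

/*
 * Return how many queued bytes can be read contiguously, i.e. without
 * wrapping past the end of the buffer, and store the offset of the first
 * one in *tail.
 */
static unsigned int ring_out_linear(const struct ring *r, unsigned int *tail)
{
	unsigned int size = r->mask + 1;
	unsigned int used = r->in - r->out;	/* wrap-safe unsigned math */
	unsigned int off = r->out & r->mask;
	unsigned int to_end = size - off;

	*tail = off;
	return used < to_end ? used : to_end;
}
```

A caller that wants all queued data consumes the returned count, advances the read index, and calls again to pick up whatever wrapped around to the start of the buffer.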
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405060826.2521-4-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is the same as __kfifo_skip_r(), so:
* drop __kfifo_dma_out_finish_r() completely, and
* replace its (only) use by __kfifo_skip_r().
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405060826.2521-2-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
U64_MAX is not in include/vdso/limits.h, although that isn't noticed on x86
because x86 includes include/linux/limits.h indirectly. However powerpc is
more selective, resulting in the following build error:
In file included from <command-line>:
lib/vdso/gettimeofday.c: In function 'vdso_calc_ns':
lib/vdso/gettimeofday.c:11:33: error: 'U64_MAX' undeclared
11 | # define VDSO_DELTA_MASK(vd) U64_MAX
| ^~~~~~~
Use ULLONG_MAX instead which will work just as well and is in
include/vdso/limits.h.
Fixes: c8e3a8b6f2 ("vdso: Consolidate vdso_calc_delta()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240409062639.3393-1-adrian.hunter@intel.com
Closes: https://lore.kernel.org/all/20240409124905.6816db37@canb.auug.org.au/
Kernel timekeeping is designed to keep the change in cycles (since the last
timer interrupt) below max_cycles, which prevents multiplication overflow
when converting cycles to nanoseconds. However, if timer interrupts stop,
the calculation will eventually overflow.
Add protection against that, enabled by config option
CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT. Check against max_cycles, falling
back to a slower higher precision calculation.
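The check-and-fall-back shape can be sketched as below. This is a userspace illustration, not the vDSO code: cycles_to_ns() is a hypothetical name, unsigned __int128 is a GCC/Clang extension, and max_cycles is assumed to be chosen so that delta * mult fits in 64 bits whenever delta <= max_cycles:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: convert a cycle delta to nanoseconds as
 * (delta * mult) >> shift, with shift < 64.
 */
static uint64_t cycles_to_ns(uint64_t delta, uint32_t mult, uint32_t shift,
			     uint64_t max_cycles)
{
	/* Fast path: the 64-bit product cannot overflow. */
	if (delta <= max_cycles)
		return (delta * mult) >> shift;

	/* Slow path: widen to 128 bits so the product is exact. */
	return (uint64_t)(((unsigned __int128)delta * mult) >> shift);
}
```

In the common case only one extra comparison is paid; the wider multiplication runs only when timer interrupts have stopped for long enough that the delta exceeds max_cycles.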
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-8-adrian.hunter@intel.com
Add CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT in preparation to add
multiplication overflow protection to the VDSO time getter functions.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-4-adrian.hunter@intel.com
Consolidate nanoseconds calculation to simplify and reduce code
duplication.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-3-adrian.hunter@intel.com
Consolidate vdso_calc_delta(), in preparation for further simplification.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-2-adrian.hunter@intel.com
When CONFIG_NET is disabled, an extra warning shows up for this
unused variable:
lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=]
Replace the #ifdef with an IS_ENABLED() check that makes the compiler's
dead-code-elimination take care of the link failure.
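The difference between the two approaches can be sketched without the kernel's IS_ENABLED() machinery. HAVE_NET is a hypothetical stand-in for the Kconfig symbol, and expected_table and check_table() are made-up names:

```c
#include <stddef.h>

#ifndef HAVE_NET		/* hypothetical stand-in for CONFIG_NET */
#define HAVE_NET 0
#endif

static const unsigned int expected_table[] = { 1, 2, 3 };

/*
 * With an #ifdef around the loop, expected_table would be unreferenced
 * when HAVE_NET is 0 and the compiler would warn about an unused const
 * variable. With a plain C condition the table is always referenced in
 * the source, while the constant-false branch can still be eliminated.
 */
static unsigned int check_table(void)
{
	unsigned int sum = 0;

	if (!HAVE_NET)
		return 0;	/* constant condition: the rest is dead code */

	for (size_t i = 0; i < sizeof(expected_table) / sizeof(expected_table[0]); i++)
		sum += expected_table[i];
	return sum;
}
```

Because both branches are parsed and type-checked in every configuration, this style also catches build breakage in the disabled path that an #ifdef would hide.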
Fixes: f24a70106d ("lib: checksum: Fix build with CONFIG_NET=n")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3ee34eabac ("lib/stackdepot: fix first entry having a 0-handle")
changed the meaning of the pool_index field to mean "the pool index plus
1". This made the code accessing this field less self-documenting, as
well as causing debuggers such as drgn to not be able to easily remain
compatible with both old and new kernels, because they typically do that
by testing for presence of the new field. Because stackdepot is a
debugging tool, we should make sure that it is debugger friendly.
Therefore, give the field a different name to improve readability as well
as enabling debugger backwards compatibility.
This is needed in 6.9, which would otherwise become an odd release with
the new semantics and old name so debuggers wouldn't recognize the new
semantics there.
Fixes: 3ee34eabac ("lib/stackdepot: fix first entry having a 0-handle")
Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Turns out that due to CONFIG_DEBUG_INFO_BTF_MODULES not having an
explicitly specified "menu item name" in Kconfig, it's basically
impossible to turn it off (see [0]).
This patch fixes the issue by defining menu name for
CONFIG_DEBUG_INFO_BTF_MODULES, which makes it actually adjustable
and independent of CONFIG_DEBUG_INFO_BTF, in the sense that one can
have DEBUG_INFO_BTF=y and DEBUG_INFO_BTF_MODULES=n.
We still keep it as defaulting to Y, of course.
Fixes: 5f9ae91f7c ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it")
Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/CAK3+h2xiFfzQ9UXf56nrRRP=p1+iUxGoEP5B+aq9MDT5jLXDSg@mail.gmail.com [0]
Link: https://lore.kernel.org/bpf/20240404220344.3879270-1-andrii@kernel.org
Two failure patterns are seen randomly when running slub_kunit tests with
CONFIG_SLAB_FREELIST_RANDOM and CONFIG_SLAB_FREELIST_HARDENED enabled.
Pattern 1:
# test_clobber_zone: pass:1 fail:0 skip:0 total:1
ok 1 test_clobber_zone
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
Expected 3 == slab_errors, but
slab_errors == 0 (0x0)
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:84
Expected 2 == slab_errors, but
slab_errors == 0 (0x0)
# test_next_pointer: pass:0 fail:1 skip:0 total:1
not ok 2 test_next_pointer
In this case, test_next_pointer() overwrites p[s->offset], but the data
at p[s->offset] is already 0x12.
Pattern 2:
ok 1 test_clobber_zone
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
Expected 3 == slab_errors, but
slab_errors == 2 (0x2)
# test_next_pointer: pass:0 fail:1 skip:0 total:1
not ok 2 test_next_pointer
In this case, p[s->offset] has a value other than 0x12, but one of the
expected failures is nevertheless missing.
Invert data instead of writing a fixed value to corrupt the cache data
structures to fix the problem.
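
A minimal userspace sketch of why Pattern 1 occurs, and why inverting
avoids it. The helper names are hypothetical and this is not the code in
lib/slub_kunit.c. Storing a fixed marker is a no-op when the byte already
holds that value, while a bitwise inversion always changes the byte:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: "corrupt" a byte by storing a fixed marker.
 * Returns true only if the stored value actually changed.
 */
static bool corrupt_fixed(unsigned char *p, size_t off, unsigned char marker)
{
	unsigned char old = p[off];

	p[off] = marker;
	return p[off] != old;
}

/*
 * Hypothetical helper: corrupt a byte by inverting all of its bits.
 * x != ~x holds for every byte value, so this always returns true.
 */
static bool corrupt_invert(unsigned char *p, size_t off)
{
	unsigned char old = p[off];

	p[off] = (unsigned char)~old;
	return p[off] != old;
}
```

With a byte that already holds 0x12, corrupt_fixed(p, off, 0x12) leaves
the data unchanged, so a checker has nothing to detect, whereas
corrupt_invert(p, off) turns it into 0xed.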
Fixes: 1f9f78b1b3 ("mm/slub, kunit: add a KUnit test for SLUB debugging functionality")
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
This is one of the drivers with an unused variable that is marked 'const'.
Adding a __used annotation here avoids the warning and lets us enable
the option by default:
lib/test_ubsan.c:137:28: error: unused variable 'skip_ubsan_array' [-Werror,-Wunused-const-variable]
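
A rough userspace sketch of the pattern, not the lib/test_ubsan.c code.
All names are hypothetical, and example_used stands in for the kernel's
__used, assumed here to map to the GCC/Clang "used" attribute. A static
const table that nothing references would normally trigger
-Wunused-const-variable; the annotation tells the compiler to keep it
without warning:

```c
#include <stddef.h>

/* Stand-in for the kernel's __used annotation (assumption). */
#define example_used __attribute__((__used__))

typedef void (*test_fn)(void);

static int calls;

static void test_a(void) { calls++; }
static void test_b(void) { calls++; }

/* Table that is actually iterated over. */
static const test_fn enabled_tests[] = { test_a };

/*
 * Table that nothing references. Without the annotation, builds with
 * -Wunused-const-variable would flag it.
 */
static example_used const test_fn skipped_tests[] = { test_b };

/* Run every enabled test and return how many were called. */
static int run_enabled_tests(void)
{
	calls = 0;
	for (size_t i = 0; i < sizeof(enabled_tests) / sizeof(enabled_tests[0]); i++)
		enabled_tests[i]();
	return calls;
}
```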
Fixes: 4a26f49b7b ("ubsan: expand tests and reporting")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-3-arnd@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Commit dc34d50366 ("lib: test_bitmap: add compile-time
optimization/evaluations assertions") initially missed __assign_bit(),
so quite some time passed before I realized that it was not being
optimized at compile time. Now that it is, add a test for it to make
sure nothing breaks in the future.
To make things more interesting, use bitmap_complement() and
bitmap_full(), thus checking their compile-time evaluation as well. And
remove the misleading comment mentioning the workaround removed recently
in favor of adding the whole file to GCov exceptions.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.
Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
dividend and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is, however, compilers sometimes generate
suboptimal code (GCC 13):
48 83 c0 3f add $0x3f,%rax
48 c1 e8 06 shr $0x6,%rax
48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx
%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:
8d 50 3f lea 0x3f(%rax),%edx
c1 ea 03 shr $0x3,%edx
81 e2 f8 ff ff 1f and $0x1ffffff8,%edx
Now it shifts `nbits + 63` by 3 positions (IOW performs fast division
by 8) and then masks bits[2:0]. bloat-o-meter:
add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)
Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:
add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)
Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.
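
A userspace sketch of the two forms compared above, under assumptions:
the macros are local stand-ins for the kernel ones, ALIGN_UP() is a
hypothetical name for the round-up step, and the function names are
illustrative only. It shows that rounding the bit count up to a whole
long and dividing by 8 yields the same byte count as the open-coded
`BITS_TO_LONGS(nbits) * sizeof(long)`:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-ins for the kernel macros (assumptions, userspace only). */
#define BITS_PER_BYTE      CHAR_BIT
#define BITS_PER_LONG      (sizeof(long) * BITS_PER_BYTE)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(n)   DIV_ROUND_UP(n, BITS_PER_LONG)
/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)     (((x) + ((a) - 1)) & ~((a) - 1))

/* The open-coded pattern: number of longs times the size of a long. */
static size_t bitmap_size_open_coded(size_t nbits)
{
	return BITS_TO_LONGS(nbits) * sizeof(long);
}

/* ALIGN-style variant: round up to a whole long, then bits to bytes. */
static size_t bitmap_size_aligned(size_t nbits)
{
	return ALIGN_UP(nbits, BITS_PER_LONG) / BITS_PER_BYTE;
}

/* Both forms must agree for every bit count in [0, limit]. */
static void check_bitmap_size_equivalence(size_t limit)
{
	for (size_t nbits = 0; nbits <= limit; nbits++)
		assert(bitmap_size_open_coded(nbits) ==
		       bitmap_size_aligned(nbits));
}
```

Calling check_bitmap_size_equivalence(4096) exercises both variants
across many word boundaries. Only the generated code is expected to
differ, not the result.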
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pr_err() messages may be treated as errors by some log readers, so let
us only use them for test failures. For non-error messages, replace them
with pr_info().
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic tests ensuring that values can be added at arbitrary positions
of the bitmap, including those spanning into the adjacent unsigned
longs.
Two new performance tests, test_bitmap_read_perf() and
test_bitmap_write_perf(), can be used to assess future performance
improvements of bitmap_read() and bitmap_write():
[ 0.431119][ T1] test_bitmap: Time spent in test_bitmap_read_perf: 615253
[ 0.433197][ T1] test_bitmap: Time spent in test_bitmap_write_perf: 916313
(numbers from an Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz machine running
QEMU).
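
To illustrate what "spanning into the adjacent unsigned longs" means, here
is a simplified userspace sketch. The sketch_* functions are hypothetical,
bit-at-a-time stand-ins and not the kernel's bitmap_read() and
bitmap_write() implementations. They assume nbits is at most
BITS_PER_LONG:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Store the low nbits of value into map, starting at bit index start.
 * Bits are copied one at a time, so a value may straddle two words.
 */
static void sketch_bitmap_write(unsigned long *map, unsigned long value,
				size_t start, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++) {
		size_t bit = start + i;
		unsigned long mask = 1UL << (bit % BITS_PER_LONG);

		if ((value >> i) & 1UL)
			map[bit / BITS_PER_LONG] |= mask;
		else
			map[bit / BITS_PER_LONG] &= ~mask;
	}
}

/* Read nbits from map, starting at bit index start. */
static unsigned long sketch_bitmap_read(const unsigned long *map,
					size_t start, size_t nbits)
{
	unsigned long value = 0;

	for (size_t i = 0; i < nbits; i++) {
		size_t bit = start + i;

		if ((map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL)
			value |= 1UL << i;
	}
	return value;
}
```

Writing the 8-bit value 0xa5 at bit BITS_PER_LONG - 4 places the low
nibble (0x5) in the top four bits of map[0] and the high nibble (0xa) in
the bottom four bits of map[1]; reading the same range back returns 0xa5.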
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>