mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-09-04 20:19:47 +08:00
dab55604ae
104 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f8ffbc365f |
struct fd layout change (and conversion to accessor helpers)
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ 63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d 592+iDgLsema/H/0/CqfqlaNtDNY8Q0= =HUl5 -----END PGP SIGNATURE----- Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it. |
||
|
|
1da91ea87a |
introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
||
|
|
2124d84db2 |
module: make waiting for a concurrent module loader interruptible
The recursive aes-arm-bs module load situation reported by Russell King is getting fixed in the crypto layer, but this in the meantime fixes the "recursive load hangs forever" by just making the waiting for the first module load be interruptible. This should now match the old behavior before commit |
||
|
|
cb5b81bc9a |
module: warn about excessively long module waits
Russell King reported that the arm cbc(aes) crypto module hangs when loaded, and Herbert Xu bisected it to commit |
||
|
|
d4e48e3dd4 |
module, bpf: Store BTF base pointer in struct module
...as this will allow split BTF modules with a base BTF representation (rather than the full vmlinux BTF at time of BTF encoding) to resolve their references to kernel types in a way that is more resilient to small changes in kernel types. This will allow modules that are not built every time the kernel is to provide more resilient BTF, rather than have it invalidated every time BTF ids for core kernel types change. Fields are ordered to avoid holes in struct module. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240620091733.1967885-3-alan.maguire@oracle.com |
||
|
|
61307b7be4 |
The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
||
|
|
223b5e57d0 |
mm/execmem, arch: convert remaining overrides of module_alloc to execmem
Extend execmem parameters to accommodate more complex overrides of module_alloc() by architectures. This includes specification of a fallback range required by arm, arm64 and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for allocation of KASAN shadow required by s390 and x86 and support for late initialization of execmem required by arm64. The core implementation of execmem_alloc() takes care of suppressing warnings when the initial allocation fails but there is a fallback range defined. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Song Liu <song@kernel.org> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
12af2b83d0 |
mm: introduce execmem_alloc() and execmem_free()
module_alloc() is used everywhere as a mean to allocate memory for code. Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code. Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation. Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs. Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs. Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem. No functional changes. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
bc6b94d3ea |
module: make module_memory_{alloc,free} more self-contained
Move the logic related to the memory allocation and freeing into module_memory_alloc() and module_memory_free(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
47a92dfbe0 |
lib: prevent module unloading if memory is not freed
Skip freeing module's data section if there are non-zero allocation tags because otherwise, once these allocations are freed, the access to their code tag would cause UAF. Link: https://lkml.kernel.org/r/20240321163705.3067592-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
a473573964 |
lib: code tagging module support
Add support for code tagging from dynamically loaded modules. Link: https://lkml.kernel.org/r/20240321163705.3067592-12-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
902861e34c |
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is hotplugged
as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving policy
wherein we allocate memory across nodes in a weighted fashion rather
than uniformly. This is beneficial in heterogeneous memory environments
appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the process
has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown situations.
The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's series
"Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page faults.
He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction test",
Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
our handling of DAX on archiecctures which have virtually aliasing data
caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
improvements in worst-case mmap_lock hold times during certain
userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability improvements
in his series "Mitigate a vmap lock contention". It realizes a 12x
improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging of
large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages() to
an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which are
configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
=TG55
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
||
|
|
8f8cd6c0a4 |
modules: wait do_free_init correctly
The synchronization here is to ensure the ordering of freeing of a module init so that it happens before W+X checking. It is worth noting it is not that the freeing was not happening, it is just that our sanity checkers raced against the permission checkers which assume init memory is already gone. Commit |
||
|
|
d1909c0221 |
module: Don't ignore errors from set_memory_XX()
set_memory_ro(), set_memory_nx(), set_memory_x() and other helpers can fail and return an error. In that case the memory might not be protected as expected and the module loading has to be aborted to avoid security issues. Check return value of all calls to set_memory_XX() and handle error if any. Add a check to not call set_memory_XX() on NULL pointers as some architectures may not like it allthough numpages is always 0 in that case. This also avoid a useless call to set_vm_flush_reset_perms(). Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
3559ad395b |
module: Change module_enable_{nx/x/ro}() to more explicit names
It's a bit puzzling to see a call to module_enable_nx() followed by a call to module_enable_x(). This is because one applies on text while the other applies on data. Change name to make that more clear. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
ac88ee7d2b |
module: Use set_memory_rox()
A couple of architectures seem concerned about calling set_memory_ro() and set_memory_x() too frequently and have implemented a version of set_memory_rox(), see commit |
||
|
|
d81f0d7b8b |
kunit: add KUNIT_INIT_TABLE to init linker section
Add KUNIT_INIT_TABLE to the INIT_DATA linker section. Alter the KUnit macros to create init tests: kunit_test_init_section_suites Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and KUNIT_INIT_TABLE. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> |
||
|
|
2abcc4b5a6 |
module: Expose module_init_layout_section()
module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes:
|
||
|
|
9011e49d54 |
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
It has recently come to my attention that nvidia is circumventing the protection added in |
||
|
|
f196220715 |
module: fix init_module_from_file() error handling
Vegard Nossum pointed out two different problems with the error handling
in init_module_from_file():
(a) the idempotent loading code didn't clean up properly in some error
cases, leaving the on-stack 'struct idempotent' element still in
the hash table
(b) failure to read the module file would nonsensically update the
'invalid_kread_bytes' stat counter with the error value
The first error is quite nasty, in that it can then cause subsequent
idempotent loads of that same file to access stale stack contents of the
previous failure. The case may not happen in any normal situation
(explaining all the "Tested-by's on the original change), and requires
admin privileges, but syzkaller triggers random bad behavior as a
result:
BUG: soft lockup in sys_finit_module
BUG: unable to handle kernel paging request in init_module_from_file
general protection fault in init_module_from_file
INFO: task hung in init_module_from_file
KASAN: out-of-bounds Read in init_module_from_file
KASAN: slab-out-of-bounds Read in init_module_from_file
...
The second error is fairly benign and just leads to nonsensical stats
(and has been around since the debug stats were added).
Vegard also provided a patch for the idempotent loading issue, but I'd
rather re-organize the code and make it more legible using another level
of helper functions than add the usual "goto out" error handling.
Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
Fixes:
|
||
|
|
4e3c09e954 |
v6.5-rc1-modules-next
The changes queued up for v6.5-rc1 for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmScfl0SHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoin8oMP/0DQK+r3BZimknzz6rF0EBStNZ/dIK2W 1Q/r/ER4VKQKYxklc1M74K+7IX8ZDCYxqlaDS9lvAkRDWNC+t69aNZEib2odJleC p6WB30P0JIwfZZC0DS/ct3vrWZTyUhw7aOtvABRmjBfiJ3lFlU092Glvk1w1aFbD UrNRomPu4CujzfmnGj3VGc+HVSOEK0F1/GLm9ClrsR8SzKEpQmH4ALI/ON69B0ea PmL+d1Wyt6WEoH0hlV1TOXNdHUb3ZO1riSSfDYQ7TiG2AM5w1t4n26YRusc16hYU 6Bx4OGt52ZJYR3btsRQlcylF4R5DUo+boDkM0NqEDU/3ciGMg6DgKdHnYCBN1w+X ZO8aXK1MIgF7W6CqSz+8HCsu5CuCos55FgM22dPbpZr3OEFCWemqnV+cYCu1DA+M Gbnn883ZLtt+R+qikD3135s+LxYIvxSuQrj+B3ZoQeIKEtAlyxuhrUJbU0tOns0j 05PrkI8J1FtIysdlNZeIFg752IPtjp/0QNB4R46m40mT16L0TSjEP7c+zcPDryMb 84SdLqh1gis0QZRkoH6JbMBDeT2dtuxqtQ5dTPka4s1mtg3SvRYr53sCJg+gQ8e2 CBW6jgrIf3F4RIMMiSfXpSf4yVVxXxJAEFnGLRXhQ2HkUnk3mdGEfsZc7ucrsnlK f/KwaEzmLD9c =gjKD -----END PGP SIGNATURE----- Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The changes queued up for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes" [ There was a mis-communication about the concurrent module load changes that I had expected to come through Luis despite me authoring the patch. So some of the module updates were left hanging in the email ether, and I just committed them separately. It's my bad - I should have made it more clear that I expected my own patches to come through the module tree too. Now they missed linux-next, but hopefully that won't cause any issues - Linus ] * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: make kallsyms_show_value() as generic function kallsyms: move kallsyms_show_value() out of kallsyms.c kallsyms: remove unsed API lookup_symbol_attrs kallsyms: remove unused arch_get_kallsym() helper module: Remove preempt_disable() from module reference counting. |
||
|
|
9b9879fc03 |
modules: catch concurrent module loads, treat them as idempotent
This is the new-and-improved attempt at avoiding huge memory load spikes
when the user space boot sequence tries to load hundreds (or even
thousands) of redundant duplicate modules in parallel.
See commit
|
||
|
|
054a73009c |
module: split up 'finit_module()' into init_module_from_file() helper
This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
db3e33dd8b |
module: fix module load for ia64
Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161] which bisect to module_memory change [1]. Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). [1] commit |
||
|
|
cb0b50b813 |
module: Remove preempt_disable() from module reference counting.
The preempt_disable() section in module_put() was added in commit |
||
|
|
064f4536d1 |
module: avoid allocation if module is already present and ready
The finit_module() system call can create unnecessary virtual memory
pressure for duplicate modules. This is because load_module() can in
the worse case allocate more than twice the size of a module in virtual
memory. This saves at least a full size of the module in wasted vmalloc
space memory by trying to avoid duplicates as soon as we can validate
the module name in the read module structure.
This can only be an issue if a system is getting hammered with userspace
loading modules. There are two ways to load modules typically on systems,
one is the kernel moduile auto-loading (*request_module*() calls in-kernel)
and the other is things like udev. The auto-loading is in-kernel, but that
pings back to userspace to just call modprobe. We already have a way to
restrict the amount of concurrent kernel auto-loads in a given time, however
that still allows multiple requests for the same module to go through
and force two threads in userspace racing to call modprobe for the same
exact module. Even though libkmod which both modprobe and udev does check
if a module is already loaded prior calling finit_module() races are
still possible and this is clearly evident today when you have multiple
CPUs.
To avoid memory pressure for such stupid cases put a stop gap for them.
The *earliest* we can detect duplicates from the modules side of things
is once we have blessed the module name, sadly after the first vmalloc
allocation. We can check for the module being present *before* a secondary
vmalloc() allocation.
There is a linear relationship between wasted virtual memory bytes and
the number of CPU counts. The reason is that udev ends up racing to call
tons of the same modules for each of the CPUs.
We can see the different linear relationships between wasted virtual
memory and CPU count during after boot in the following graph:
+----------------------------------------------------------------------------+
14GB |-+ + + + + *+ +-|
| **** |
| *** |
| ** |
12GB |-+ ** +-|
| ** |
| ** |
| ** |
| ** |
10GB |-+ ** +-|
| ** |
| ** |
| ** |
8GB |-+ ** +-|
waste | ** ### |
| ** #### |
| ** ####### |
6GB |-+ **** #### +-|
| * #### |
| * #### |
| ***** #### |
4GB |-+ ** #### +-|
| ** #### |
| ** #### |
| ** #### |
2GB |-+ ** ##### +-|
| * #### |
| * #### Before ******* |
| **## + + + + After ####### |
+----------------------------------------------------------------------------+
0 50 100 150 200 250 300
CPUs count
On the y-axis we can see gigabytes of wasted virtual memory during boot
due to duplicate module requests which just end up failing. Trying to
infer the slope this ends up being about ~463 MiB per CPU lost prior
to this patch. After this patch we only loose about ~230 MiB per CPU, for
a total savings of about ~233 MiB per CPU. This is all *just on bootup*!
On a 8vcpu 8 GiB RAM system using kdevops and testing against selftests
kmod.sh -t 0008 I see a saving in the *highest* side of memory
consumption of up to ~ 84 MiB with the Linux kernel selftests kmod
test 0008. With the new stress-ng module test I see a 145 MiB difference
in max memory consumption with 100 ops. The stress-ng module ops tests can be
pretty pathalogical -- it is not realistic, however it was used to
finally successfully reproduce issues which are only reported to happen on
system with over 400 CPUs [0] by just usign 100 ops on a 8vcpu 8 GiB RAM
system. Running out of virtual memory space is no surprise given the
above graph, since at least on x86_64 we're capped at 128 MiB, eventually
we'd hit a series of errors and once can use the above graph to
guestimate when. This of course will vary depending on the features
you have enabled. So for instance, enabling KASAN seems to make this
much worse.
The results with kmod and stress-ng can be observed and visualized below.
The time it takes to run the test is also not affected.
The kmod tests 0008:
The gnuplot is set to a range from 400000 KiB (390 Mib) - 580000 (566 Mib)
given the tests peak around that range.
cat kmod.plot
set term dumb
set output fileout
set yrange [400000:580000]
plot filein with linespoints title "Memory usage (KiB)"
Before:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C
root@kmod ~ # sort -n -r log-0008-before.txt | head -1
528732
So ~516.33 MiB
After:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C
root@kmod ~ # sort -n -r log-0008-after.txt | head -1
442516
So ~432.14 MiB
That's about 84 ~MiB in savings in the worst case. The graphs:
root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
root@kmod ~ # gnuplot -e "filein='log-0008-after.txt'; fileout='graph-0008-after.txt'" kmod.plot
root@kmod ~ # cat graph-0008-before.txt
580000 +-----------------------------------------------------------------+
| + + + + + + + |
560000 |-+ Memory usage (KiB) ***A***-|
| |
540000 |-+ +-|
| |
| *A *AA*AA*A*AA *A*AA A*A*A *AA*A*AA*A A |
520000 |-+A*A*AA *AA*A *A*AA*A*AA *A*A A *A+-|
|*A |
500000 |-+ +-|
| |
480000 |-+ +-|
| |
460000 |-+ +-|
| |
| |
440000 |-+ +-|
| |
420000 |-+ +-|
| + + + + + + + |
400000 +-----------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
root@kmod ~ # cat graph-0008-after.txt
580000 +-----------------------------------------------------------------+
| + + + + + + + |
560000 |-+ Memory usage (KiB) ***A***-|
| |
540000 |-+ +-|
| |
| |
520000 |-+ +-|
| |
500000 |-+ +-|
| |
480000 |-+ +-|
| |
460000 |-+ +-|
| |
| *A *A*A |
440000 |-+A*A*AA*A A A*A*AA A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-|
|*A *A*AA*A |
420000 |-+ +-|
| + + + + + + + |
400000 +-----------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
The stress-ng module tests:
This is used to run the test to try to reproduce the vmap issues
reported by David:
echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs
Prior to this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt
root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1
5046456
After this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt
root@kmod ~ # sort -n -r after-stress-ng.txt | head -1
4896972
5046456 - 4896972
149484
149484/1024
145.98046875000000000000
So this commit using stress-ng reveals saving about 145 MiB in memory
using 100 ops from stress-ng which reproduced the vmap issue reported.
cat kmod.plot
set term dumb
set output fileout
set yrange [4700000:5070000]
plot filein with linespoints title "Memory usage (KiB)"
root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'" kmod-simple-stress-ng.plot
root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'" kmod-simple-stress-ng.plot
root@kmod ~ # cat graph-stress-ng-before.txt
+---------------------------------------------------------------+
5.05e+06 |-+ + A + + + + + + +-|
| * Memory usage (KiB) ***A*** |
| * A |
5e+06 |-+ ** ** +-|
| ** * * A |
4.95e+06 |-+ * * A * A* +-|
| * * A A * * * * A |
| * * * * * * *A * * * A * |
4.9e+06 |-+ * * * A*A * A*AA*A A *A **A **A*A *+-|
| A A*A A * A * * A A * A * ** |
| * ** ** * * * * * * * |
4.85e+06 |-+ A A A ** * * ** *-|
| * * * * ** * |
| * A * * * * |
4.8e+06 |-+ * * * A A-|
| * * * |
4.75e+06 |-+ * * * +-|
| * ** |
| * + + + + + + ** + |
4.7e+06 +---------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
root@kmod ~ # cat graph-stress-ng-after.txt
+---------------------------------------------------------------+
5.05e+06 |-+ + + + + + + + +-|
| Memory usage (KiB) ***A*** |
| |
5e+06 |-+ +-|
| |
4.95e+06 |-+ +-|
| |
| |
4.9e+06 |-+ *AA +-|
| A*AA*A*A A A*AA*AA*A*AA*A A A A*A *AA*A*A A A*AA*AA |
| * * ** * * * ** * *** * |
4.85e+06 |-+* *** * * * * *** A * * +-|
| * A * * ** * * A * * |
| * * * * ** * * |
4.8e+06 |-+* * * A * * * +-|
| * * * A * * |
4.75e+06 |-* * * * * +-|
| * * * * * |
| * + * *+ + + + + * *+ |
4.7e+06 +---------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
[0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
||
|
|
df3e764d8e |
module: add debug stats to help identify memory pressure
Loading modules with finit_module() can end up using vmalloc(), vmap()
and vmalloc() again, for a total of up to 3 separate allocations in the
worst case for a single module. We always kernel_read*() the module,
that's a vmalloc(). Then vmap() is used for the module decompression,
and if so the last read buffer is freed as we use the now decompressed
module buffer to stuff data into our copy module. The last allocation is
specific to each architectures but pretty much that's generally a series
of vmalloc() calls or a variation of vmalloc to handle ELF sections with
special permissions.
Evaluation with new stress-ng module support [1] with just 100 ops
is proving that you can end up using GiBs of data easily even with all
care we have in the kernel and userspace today in trying to not load modules
which are already loaded. 100 ops seems to resemble the sort of pressure a
system with about 400 CPUs can create on module loading. Although issues
relating to duplicate module requests due to each CPU inucurring a new
module reuest is silly and some of these are being fixed, we currently lack
proper tooling to help diagnose easily what happened, when it happened
and who likely is to blame -- userspace or kernel module autoloading.
Provide an initial set of stats which use debugfs to let us easily scrape
post-boot information about failed loads. This sort of information can
be used on production worklaods to try to optimize *avoiding* redundant
memory pressure using finit_module().
There's a few examples that can be provided:
A 255 vCPU system without the next patch in this series applied:
Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s
graphical.target reached after 6.988s in userspace
And 13.58 GiB of virtual memory space lost due to failed module loading:
root@big ~ # cat /sys/kernel/debug/modules/stats
Mods ever loaded 67
Mods failed on kread 0
Mods failed on decompress 0
Mods failed on becoming 0
Mods failed on load 1411
Total module size 11464704
Total mod text size 4194304
Failed kread bytes 0
Failed decompress bytes 0
Failed becoming bytes 0
Failed kmod bytes 14588526272
Virtual mem wasted bytes 14588526272
Average mod size 171115
Average mod text size 62602
Average fail load bytes 10339140
Duplicate failed modules:
module-name How-many-times Reason
kvm_intel 249 Load
kvm 249 Load
irqbypass 8 Load
crct10dif_pclmul 128 Load
ghash_clmulni_intel 27 Load
sha512_ssse3 50 Load
sha512_generic 200 Load
aesni_intel 249 Load
crypto_simd 41 Load
cryptd 131 Load
evdev 2 Load
serio_raw 1 Load
virtio_pci 3 Load
nvme 3 Load
nvme_core 3 Load
virtio_pci_legacy_dev 3 Load
virtio_pci_modern_dev 3 Load
t10_pi 3 Load
virtio 3 Load
crc32_pclmul 6 Load
crc64_rocksoft 3 Load
crc32c_intel 40 Load
virtio_ring 3 Load
crc64 3 Load
The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the
next patch in this series applied, shows 226.53 MiB are wasted in virtual
memory allocations which due to duplicate module requests during boot.
It also shows an average module memory size of 167.10 KiB and an an
average module .text + .init.text size of 61.13 KiB. The end shows all
modules which were detected as duplicate requests and whether or not
they failed early after just the first kernel_read*() call or late after
we've already allocated the private space for the module in
layout_and_allocate(). A system with module decompression would reveal
more wasted virtual memory space.
We should put effort now into identifying the source of these duplicate
module requests and trimming these down as much possible. Larger systems
will obviously show much more wasted virtual memory allocations.
root@kmod ~ # cat /sys/kernel/debug/modules/stats
Mods ever loaded 67
Mods failed on kread 0
Mods failed on decompress 0
Mods failed on becoming 83
Mods failed on load 16
Total module size 11464704
Total mod text size 4194304
Failed kread bytes 0
Failed decompress bytes 0
Failed becoming bytes 228959096
Failed kmod bytes 8578080
Virtual mem wasted bytes 237537176
Average mod size 171115
Average mod text size 62602
Avg fail becoming bytes 2758544
Average fail load bytes 536130
Duplicate failed modules:
module-name How-many-times Reason
kvm_intel 7 Becoming
kvm 7 Becoming
irqbypass 6 Becoming & Load
crct10dif_pclmul 7 Becoming & Load
ghash_clmulni_intel 7 Becoming & Load
sha512_ssse3 6 Becoming & Load
sha512_generic 7 Becoming & Load
aesni_intel 7 Becoming
crypto_simd 7 Becoming & Load
cryptd 3 Becoming & Load
evdev 1 Becoming
serio_raw 1 Becoming
nvme 3 Becoming
nvme_core 3 Becoming
t10_pi 3 Becoming
virtio_pci 3 Becoming
crc32_pclmul 6 Becoming & Load
crc64_rocksoft 3 Becoming
crc32c_intel 3 Becoming
virtio_pci_modern_dev 2 Becoming
virtio_pci_legacy_dev 1 Becoming
crc64 2 Becoming
virtio 2 Becoming
virtio_ring 2 Becoming
[0] https://github.com/ColinIanKing/stress-ng.git
[1] echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
||
|
|
f71afa6a42 |
module: extract patient module check into helper
The patient module check inside add_unformed_module() is large enough as we need it. It is a bit hard to read too, so just move it to a helper and do the inverse checks first to help shift the code and make it easier to read. The new helper then is module_patient_check_exists(). To make this work we need to mvoe the finished_loading() up, we do that without making any functional changes to that routine. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
430bb0d1c3 |
module: fix kmemleak annotations for non init ELF sections
Commit |
||
|
|
33c951f629 |
module: already_uses() - reduce pr_debug output volume
already_uses() is unnecessarily chatty. `modprobe i915` yields 491 messages like: [ 64.108744] i915 uses drm! This is a normal situation, and isn't worth all the log entries. NOTE: I've preserved the "does not use %s" messages, which happens less often, but does happen. Its not clear to me what it tells a reader, or what info might improve the pr_debug's utility. [ 6847.584999] main:already_uses:569: amdgpu does not use ttm! [ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585014] main:already_uses:569: amdgpu does not use drm! [ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper! [ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper! [ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy! [ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit! [ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched! [ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585314] main:already_uses:569: amdgpu does not use video! [ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2! [ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper! [ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6848.762268] dyndbg: add-module: amdgpu.2533 sites no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
66a2301edf |
module: add section-size to move_module pr_debug
move_module() pr_debug's "Final section addresses for $modname". Add section addresses to the message, for anyone looking at these. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
b10addf37b |
module: add symbol-name to pr_debug Absolute symbol
The pr_debug("Absolute symbol" ..) reports value, (which is usually
0), but not the name, which is more informative. So add it.
no functional changes
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
||
|
|
6ed81802d4 |
module: in layout_sections, move_module: add the modname
layout_sections() and move_module() each issue ~50 messages for each module loaded. Add mod-name into their 2 header lines, to help the reader find his module. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
3d40bb903e |
module: merge remnants of setup_load_info() to elf validation
The setup_load_info() was actually had ELF validation checks of its own. To later cache useful variables as an secondary step just means looping again over the ELF sections we just validated. We can simply keep tabs of the key sections of interest as we validate the module ELF section in one swoop, so do that and merge the two routines together. Expand a bit on the documentation / intent / goals. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
1bb49db991 |
module: move more elf validity checks to elf_validity_check()
The symbol and strings section validation currently happen in setup_load_info() but since they are also doing validity checks move this to elf_validity_check(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
c7ee8aebf6 |
module: add stop-grap sanity check on module memcpy()
The integrity of the struct module we load is important, and although our ELF validator already checks that the module section must match struct module, add a stop-gap check before we memcpy() the final minted module. This also makes those inspecting the code what the goal is. While at it, clarify the goal behind updating the sh_addr address. The current comment is pretty misleading. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
46752820f9 |
module: add sanity check for ELF module section
The ELF ".gnu.linkonce.this_module" section is special, it is what we use to construct the struct module __this_module, which THIS_MODULE points to. When userspace loads a module we always deal first with a copy of the userspace buffer, and twiddle with the userspace copy's version of the struct module. Eventually we allocate memory to do a memcpy() of that struct module, under the assumption that the module size is right. But we have no validity checks against the size or the requirements for the section. Add some validity checks for the special module section early and while at it, cache the module section index early, so we don't have to do that later. While at it, just move over the assigment of the info->mod to make the code clearer. The validity checker also adds an explicit size check to ensure the module section size matches the kernel's run time size for sizeof(struct module). This should prevent sloppy loads of modules which are built today *without* actually increasing the size of the struct module. A developer today can for example expand the size of struct module, rebuild a directoroy 'make fs/xfs/' for example and then try to insmode the driver there. That module would in effect have an incorrect size. This new size check would put a stop gap against such mistakes. This also makes the entire goal of ".gnu.linkonce.this_module" pretty clear. Before this patch verification of the goal / intent required some Indian Jones whips, torches and cleaning up big old spider webs. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
419e1a20f7 |
module: rename check_module_license_and_versions() to check_export_symbol_versions()
This makes the routine easier to understand what the check its checking for. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
72f08b3cc6 |
module: converge taint work together
Converge on a compromise: so long as we have a module hit our linked list of modules we taint. That is, the module was about to become live. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
c3bbf62ebf |
module: move signature taint to module_augment_kernel_taints()
Just move the signature taint into the helper: module_augment_kernel_taints() Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
a12b94511c |
module: move tainting until after a module hits our linked list
It is silly to have taints spread out all over, we can just compromise and add them if the module ever hit our linked list. Our sanity checkers should just prevent crappy drivers / bogus ELF modules / etc and kconfig options should be enough to let you *not* load things you don't want. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
437c1f9cc6 |
module: split taint adding with info checking
check_modinfo() actually does two things:
a) sanity checks, some of which are fatal, and so we
prevent the user from completing trying to load a module
b) taints the kernel
The taints are pretty heavy handed because we're tainting the kernel
*before* we ever even get to load the module into the modules linked
list. That is, it it can fail for other reasons later as we review the
module's structure.
But this commit makes no functional changes, it just makes the intent
clearer and splits the code up where needed to make that happen.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
||
|
|
ed52cabecb |
module: split taint work out of check_modinfo_livepatch()
The work to taint the kernel due to a module should be split up eventually. To aid with this, split up the tainting on check_modinfo_livepatch(). This let's us bring more early checks together which do return a value, and makes changes easier to read later where we stuff all the work to do the taints in one single routine. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
ad8d3a36e9 |
module: rename set_license() to module_license_taint_check()
The set_license() routine would seem to a reader to do some sort of setting, but it does not. It just adds a taint if the license is not set or proprietary. This makes what the code is doing clearer, so much we can remove the comment about it. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
02da2cbab4 |
module: move check_modinfo() early to early_mod_check()
This moves check_modinfo() to early_mod_check(). This doesn't make any functional changes either, as check_modinfo() was the first call on layout_and_allocate(), so we're just moving it back one routine and at the end. This let's us keep separate the checkers from the allocator. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
85e6f61c13 |
module: move early sanity checks into a helper
Move early sanity checkers for the module into a helper. This let's us make it clear when we are working with the local copy of the module prior to allocation. This produces no functional changes, it just makes subsequent changes easier to read. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
1e68417235 |
module: add a for_each_modinfo_entry()
Add a for_each_modinfo_entry() to make it easier to read and use. This produces no functional changes but makes this code easiert to read as we are used to with loops in the kernel and trims more lines of code. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
feb5b784a2 |
module: rename next_string() to module_next_tag_pair()
This makes it clearer what it is doing. While at it, make it available to other code other than main.c. This will be used in the subsequent patch and make the changes easier to read. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
b66973b82d |
module: move get_modinfo() helpers all above
Instead of forward declaring routines for get_modinfo() just move everything up. This makes no functional changes. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
|
|
7deabd6749 |
dyndbg: use the module notifier callbacks
Bring dynamic debug in line with other subsystems by using the module notifier callbacks. This results in a net decrease in core module code. Additionally, Jim Cromie has a new dynamic debug classmap feature, which requires that jump labels be initialized prior to dynamic debug. Specifically, the new feature toggles a jump label from the existing dynamic_debug_setup() function. However, this does not currently work properly, because jump labels are initialized via the 'module_notify_list' notifier chain, which is invoked after the current call to dynamic_debug_setup(). Thus, this patch ensures that jump labels are initialized prior to dynamic debug by setting the dynamic debug notifier priority to 0, while jump labels have the higher priority of 1. Tested by Jim using his new test case, and I've verfied the correct printing via: # modprobe test_dynamic_debug dyndbg. Link: https://lore.kernel.org/lkml/20230113193016.749791-21-jim.cromie@gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302190427.9iIK2NfJ-lkp@intel.com/ Tested-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> CC: Jim Cromie <jim.cromie@gmail.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |