mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-09-04 20:19:47 +08:00 
			
		
		
		
	
			
				
					
						
					
					182c7bbfbf
				
			
			
		
	
	
		
			136 Commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  Andrea Righi | a419beac4a | module/decompress: use vmalloc() for zstd decompression workspace Using kmalloc() to allocate the decompression workspace for zstd may
trigger the following warning when large modules are loaded (i.e., xfs):
[    2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350
...
[    2.989033] Call Trace:
[    2.989841]  <TASK>
[    2.990614]  ? show_regs+0x6d/0x80
[    2.991573]  ? __warn+0x89/0x160
[    2.992485]  ? __alloc_pages+0x2c3/0x350
[    2.993520]  ? report_bug+0x17e/0x1b0
[    2.994506]  ? handle_bug+0x51/0xa0
[    2.995474]  ? exc_invalid_op+0x18/0x80
[    2.996469]  ? asm_exc_invalid_op+0x1b/0x20
[    2.997530]  ? module_zstd_decompress+0xdc/0x2a0
[    2.998665]  ? __alloc_pages+0x2c3/0x350
[    2.999695]  ? module_zstd_decompress+0xdc/0x2a0
[    3.000821]  __kmalloc_large_node+0x7a/0x150
[    3.001920]  __kmalloc+0xdb/0x170
[    3.002824]  module_zstd_decompress+0xdc/0x2a0
[    3.003857]  module_decompress+0x37/0xc0
[    3.004688]  init_module_from_file+0xd0/0x100
[    3.005668]  idempotent_init_module+0x11c/0x2b0
[    3.006632]  __x64_sys_finit_module+0x64/0xd0
[    3.007568]  do_syscall_64+0x59/0x90
[    3.008373]  ? ksys_read+0x73/0x100
[    3.009395]  ? exit_to_user_mode_prepare+0x30/0xb0
[    3.010531]  ? syscall_exit_to_user_mode+0x37/0x60
[    3.011662]  ? do_syscall_64+0x68/0x90
[    3.012511]  ? do_syscall_64+0x68/0x90
[    3.013364]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
However, continuous physical memory does not seem to be required in
module_zstd_decompress(), so use vmalloc() instead, to prevent the
warning and avoid potential failures at loading compressed modules.
Fixes:  | ||
|  James Morse | 2abcc4b5a6 | module: Expose module_init_layout_section() module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
|  module_emit_plt_entry+0x184/0x1cc
|  apply_relocate_add+0x2bc/0x8e4
|  load_module+0xe34/0x1bd4
|  init_module_from_file+0x84/0xc0
|  __arm64_sys_finit_module+0x1b8/0x27c
|  invoke_syscall.constprop.0+0x5c/0x104
|  do_el0_svc+0x58/0x160
|  el0_svc+0x38/0x110
|  el0t_64_sync_handler+0xc0/0xc4
|  el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes:  | ||
|  Christoph Hellwig | 9011e49d54 | modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules It has recently come to my attention that nvidia is circumventing the protection added in | ||
|  Palmer Dabbelt | ff09f6fd29 | modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols Trying to restrict the '$'-prefix change to RISC-V caused some fallout,
so let's just treat all those symbols as special.
Fixes:  | ||
|  Palmer Dabbelt | c05780ef3c | module: Ignore RISC-V mapping symbols too RISC-V has an extended form of mapping symbols that we use to encode the ISA when it changes in the middle of an ELF. This trips up modpost as a build failure, I haven't yet verified it yet but I believe the kallsyms difference should result in stacks looking sane again. Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/9d9e2902-5489-4bf0-d9cb-556c8e5d71c2@infradead.org/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Linus Torvalds | f196220715 | module: fix init_module_from_file() error handling Vegard Nossum pointed out two different problems with the error handling
in init_module_from_file():
 (a) the idempotent loading code didn't clean up properly in some error
     cases, leaving the on-stack 'struct idempotent' element still in
     the hash table
 (b) failure to read the module file would nonsensically update the
     'invalid_kread_bytes' stat counter with the error value
The first error is quite nasty, in that it can then cause subsequent
idempotent loads of that same file to access stale stack contents of the
previous failure.  The case may not happen in any normal situation
(explaining all the "Tested-by's on the original change), and requires
admin privileges, but syzkaller triggers random bad behavior as a
result:
    BUG: soft lockup in sys_finit_module
    BUG: unable to handle kernel paging request in init_module_from_file
    general protection fault in init_module_from_file
    INFO: task hung in init_module_from_file
    KASAN: out-of-bounds Read in init_module_from_file
    KASAN: slab-out-of-bounds Read in init_module_from_file
    ...
The second error is fairly benign and just leads to nonsensical stats
(and has been around since the debug stats were added).
Vegard also provided a patch for the idempotent loading issue, but I'd
rather re-organize the code and make it more legible using another level
of helper functions than add the usual "goto out" error handling.
Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
Fixes:  | ||
|  Linus Torvalds | ad2885979e | Kbuild updates for v6.5 - Remove the deprecated rule to build *.dtbo from *.dts
 
  - Refactor section mismatch detection in modpost
 
  - Fix bogus ARM section mismatch detections
 
  - Fix error of 'make gtags' with O= option
 
  - Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error with
    the latest LLVM version
 
  - Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed
 
  - Ignore more compiler-generated symbols for kallsyms
 
  - Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles
 
  - Enable more kernel-doc warnings with W=2
 
  - Refactor <linux/export.h> by generating KSYMTAB data by modpost
 
  - Deprecate <asm/export.h> and <asm-generic/export.h>
 
  - Remove the EXPORT_DATA_SYMBOL macro
 
  - Move the check for static EXPORT_SYMBOL back to modpost, which makes
    the build faster
 
  - Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm
 
  - Warn missing MODULE_DESCRIPTION when building modules with W=1
 
  - Make 'make clean' robust against too long argument error
 
  - Exclude more objects from GCOV to fix CFI failures with GCOV
 
  - Allow 'make modules_install' to install modules.builtin and
    modules.builtin.modinfo even when CONFIG_MODULES is disabled
 
  - Include modules.builtin and modules.builtin.modinfo in the linux-image
    Debian package even when CONFIG_MODULES is disabled
 
  - Revive "Entering directory" logging for the latest Make version
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmSf6B0VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGS2wP/1izzNJ/64XmQoyBDhZCbuOl7ODF
 n4wgVJnsJmRnD/RxXR/AZ0JZwQHhzpGISWQM61rVIf/RVFOB7Apx1HpmomKUUjrL
 Yc53wLfhTEizGgwttP6tusLM3RO6jkuMKhjC4rllc0tDLJ3zCcwAjSyiOQQ9PBcH
 txwAb8r4/TZUzDDCJ0d98WdhIsNDca/ISeRXKHMiIkfvHe+6yizDKu25Y4B6BL5g
 0VPJ9nVJZ+XVwRqdVR+UQoPYGZzZ/O2NqAtU7n4PpBKvFfLACILJW+aBDAz9SqN7
 RSxn1ahxwq0vrhlB9bSrQRj3N0g8zsi7/xShEZSnGLCbyxYilr5Gq8C59+QxOIJf
 5lGBwZlEgn5aWH+D9abwjEI/QOQbTI9kX09sVzweulGCN9iJlJqyIGsB0Ri0/S2R
 c/n7c8nLwnWnGF/+LXYvkrak8L9YRKori//YYf9zdvh4h1c2/0SS0nDoC29DhDru
 Am7YmhBAkJXXX3NUB2gLvtdp94GSumqefHeSJ5Sp9v/+f2Ft7ruY2ouJC81xDa4p
 nNpvolAq2txlZ9t5OU7x7DQiuCWYSws0W7PJ9FBhyHJchf21UHbcm97/HfDoU8rN
 ioLQGm+h+g6oZt8pArk45wccjkR3ydpEFDWenYbTEr2o3zLfeKigZps5uhCK3DW2
 gnVk50VNagkzrzvA
 =Rc1z
 -----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
 - Remove the deprecated rule to build *.dtbo from *.dts
 - Refactor section mismatch detection in modpost
 - Fix bogus ARM section mismatch detections
 - Fix error of 'make gtags' with O= option
 - Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error
   with the latest LLVM version
 - Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed
 - Ignore more compiler-generated symbols for kallsyms
 - Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles
 - Enable more kernel-doc warnings with W=2
 - Refactor <linux/export.h> by generating KSYMTAB data by modpost
 - Deprecate <asm/export.h> and <asm-generic/export.h>
 - Remove the EXPORT_DATA_SYMBOL macro
 - Move the check for static EXPORT_SYMBOL back to modpost, which makes
   the build faster
 - Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm
 - Warn missing MODULE_DESCRIPTION when building modules with W=1
 - Make 'make clean' robust against too long argument error
 - Exclude more objects from GCOV to fix CFI failures with GCOV
 - Allow 'make modules_install' to install modules.builtin and
   modules.builtin.modinfo even when CONFIG_MODULES is disabled
 - Include modules.builtin and modules.builtin.modinfo in the
   linux-image Debian package even when CONFIG_MODULES is disabled
 - Revive "Entering directory" logging for the latest Make version
* tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits)
  modpost: define more R_ARM_* for old distributions
  kbuild: revive "Entering directory" for Make >= 4.4.1
  kbuild: set correct abs_srctree and abs_objtree for package builds
  scripts/mksysmap: Ignore prefixed KCFI symbols
  kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb
  kbuild: builddeb: always make modules_install, to install modules.builtin*
  modpost: continue even with unknown relocation type
  modpost: factor out Elf_Sym pointer calculation to section_rel()
  modpost: factor out inst location calculation to section_rel()
  kbuild: Disable GCOV for *.mod.o
  kbuild: Fix CFI failures with GCOV
  kbuild: make clean rule robust against too long argument error
  script: modpost: emit a warning when the description is missing
  kbuild: make modules_install copy modules.builtin(.modinfo)
  linux/export.h: rename 'sec' argument to 'license'
  modpost: show offset from symbol for section mismatch warnings
  modpost: merge two similar section mismatch warnings
  kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion
  modpost: use null string instead of NULL pointer for default namespace
  modpost: squash sym_update_namespace() into sym_add_exported()
  ... | ||
|  Linus Torvalds | 4e3c09e954 | v6.5-rc1-modules-next The changes queued up for v6.5-rc1 for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmScfl0SHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoin8oMP/0DQK+r3BZimknzz6rF0EBStNZ/dIK2W 1Q/r/ER4VKQKYxklc1M74K+7IX8ZDCYxqlaDS9lvAkRDWNC+t69aNZEib2odJleC p6WB30P0JIwfZZC0DS/ct3vrWZTyUhw7aOtvABRmjBfiJ3lFlU092Glvk1w1aFbD UrNRomPu4CujzfmnGj3VGc+HVSOEK0F1/GLm9ClrsR8SzKEpQmH4ALI/ON69B0ea PmL+d1Wyt6WEoH0hlV1TOXNdHUb3ZO1riSSfDYQ7TiG2AM5w1t4n26YRusc16hYU 6Bx4OGt52ZJYR3btsRQlcylF4R5DUo+boDkM0NqEDU/3ciGMg6DgKdHnYCBN1w+X ZO8aXK1MIgF7W6CqSz+8HCsu5CuCos55FgM22dPbpZr3OEFCWemqnV+cYCu1DA+M Gbnn883ZLtt+R+qikD3135s+LxYIvxSuQrj+B3ZoQeIKEtAlyxuhrUJbU0tOns0j 05PrkI8J1FtIysdlNZeIFg752IPtjp/0QNB4R46m40mT16L0TSjEP7c+zcPDryMb 84SdLqh1gis0QZRkoH6JbMBDeT2dtuxqtQ5dTPka4s1mtg3SvRYr53sCJg+gQ8e2 CBW6jgrIf3F4RIMMiSfXpSf4yVVxXxJAEFnGLRXhQ2HkUnk3mdGEfsZc7ucrsnlK f/KwaEzmLD9c =gjKD -----END PGP SIGNATURE----- Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The changes queued up for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes" [ There was a mis-communication about the concurrent module load changes that I had expected to come through Luis despite me authoring the patch. So some of the module updates were left hanging in the email ether, and I just committed them separately. It's my bad - I should have made it more clear that I expected my own patches to come through the module tree too. Now they missed linux-next, but hopefully that won't cause any issues - Linus ] * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: make kallsyms_show_value() as generic function kallsyms: move kallsyms_show_value() out of kallsyms.c kallsyms: remove unsed API lookup_symbol_attrs kallsyms: remove unused arch_get_kallsym() helper module: Remove preempt_disable() from module reference counting. | ||
|  Linus Torvalds | 9b9879fc03 | modules: catch concurrent module loads, treat them as idempotent This is the new-and-improved attempt at avoiding huge memory load spikes
when the user space boot sequence tries to load hundreds (or even
thousands) of redundant duplicate modules in parallel.
See commit  | ||
|  Linus Torvalds | 054a73009c | module: split up 'finit_module()' into init_module_from_file() helper This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | ||
|  Masahiro Yamada | ddb5cdbafa | kbuild: generate KSYMTAB entries by modpost Commit | ||
|  Lucas De Marchi | fadb74f9f2 | module/decompress: Fix error checking on zstd decompression While implementing support for in-kernel decompression in kmod,
finit_module() was returning a very suspicious value:
	finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296
It turns out the check for module_get_next_page() failing is wrong,
and hence the decompression was not really taking place. Invert
the condition to fix it.
Fixes:  | ||
|  Song Liu | db3e33dd8b | module: fix module load for ia64 Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161] which bisect to module_memory change [1]. Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). [1] commit | ||
|  Maninder Singh | 4f521bab5b | kallsyms: remove unsed API lookup_symbol_attrs with commit '7878c231dae0 ("slab: remove /proc/slab_allocators")'
lookup_symbol_attrs usage is removed.
Thus removing redundant API.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Sebastian Andrzej Siewior | cb0b50b813 | module: Remove preempt_disable() from module reference counting. The preempt_disable() section in module_put() was added in commit | ||
|  Harshit Mogalapalli | d36f6efbe0 | module: Fix use-after-free bug in read_file_mod_stats() Smatch warns:
	kernel/module/stats.c:394 read_file_mod_stats()
	warn: passing freed memory 'buf'
We are passing 'buf' to simple_read_from_buffer() after freeing it.
Fix this by changing the order of 'simple_read_from_buffer' and 'kfree'.
Fixes:  | ||
|  Arnd Bergmann | 0b891c83d8 | module: include internal.h in module/dups.c Two newly introduced functions are declared in a header that is not
included before the definition, causing a warning with sparse or
'make W=1':
kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes]
  118 | bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes]
  220 | void kmod_dup_request_announce(char *module_name, int ret)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~
Add an explicit include to ensure the prototypes match.
Fixes:  | ||
|  Linus Torvalds | b6a7828502 | modules-6.4-rc1 The summary of the changes for this pull requests is:
 
  * Song Liu's new struct module_memory replacement
  * Nick Alcock's MODULE_LICENSE() removal for non-modules
  * My cleanups and enhancements to reduce the areas where we vmalloc
    module memory for duplicates, and the respective debug code which
    proves the remaining vmalloc pressure comes from userspace.
 
 Most of the changes have been in linux-next for quite some time except
 the minor fixes I made to check if a module was already loaded
 prior to allocating the final module memory with vmalloc and the
 respective debug code it introduces to help clarify the issue. Although
 the functional change is small it is rather safe as it can only *help*
 reduce vmalloc space for duplicates and is confirmed to fix a bootup
 issue with over 400 CPUs with KASAN enabled. I don't expect stable
 kernels to pick up that fix as the cleanups would have also had to have
 been picked up. Folks on larger CPU systems with modules will want to
 just upgrade if vmalloc space has been an issue on bootup.
 
 Given the size of this request, here's some more elaborate details
 on this pull request.
 
 The functional change change in this pull request is the very first
 patch from Song Liu which replaces the struct module_layout with a new
 struct module memory. The old data structure tried to put together all
 types of supported module memory types in one data structure, the new
 one abstracts the differences in memory types in a module to allow each
 one to provide their own set of details. This paves the way in the
 future so we can deal with them in a cleaner way. If you look at changes
 they also provide a nice cleanup of how we handle these different memory
 areas in a module. This change has been in linux-next since before the
 merge window opened for v6.3 so to provide more than a full kernel cycle
 of testing. It's a good thing as quite a bit of fixes have been found
 for it.
 
 Jason Baron then made dynamic debug a first class citizen module user by
 using module notifier callbacks to allocate / remove module specific
 dynamic debug information.
 
 Nick Alcock has done quite a bit of work cross-tree to remove module
 license tags from things which cannot possibly be module at my request
 so to:
 
   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area
      is active with no clear solution in sight.
 
   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags
 
 In so far as a) is concerned, although module license tags are a no-op
 for non-modules, tools which would want create a mapping of possible
 modules can only rely on the module license tag after the commit
  | ||
|  Linus Torvalds | 6e98b09da9 | Networking changes for 6.4. Core
 ----
 
  - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
    default value allows for better BIG TCP performances.
 
  - Reduce compound page head access for zero-copy data transfers.
 
  - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible.
 
  - Threaded NAPI improvements, adding defer skb free support and unneeded
    softirq avoidance.
 
  - Address dst_entry reference count scalability issues, via false
    sharing avoidance and optimize refcount tracking.
 
  - Add lockless accesses annotation to sk_err[_soft].
 
  - Optimize again the skb struct layout.
 
  - Extends the skb drop reasons to make it usable by multiple
    subsystems.
 
  - Better const qualifier awareness for socket casts.
 
 BPF
 ---
 
  - Add skb and XDP typed dynptrs which allow BPF programs for more
    ergonomic and less brittle iteration through data and variable-sized
    accesses.
 
  - Add a new BPF netfilter program type and minimal support to hook
    BPF programs to netfilter hooks such as prerouting or forward.
 
  - Add more precise memory usage reporting for all BPF map types.
 
  - Adds support for using {FOU,GUE} encap with an ipip device operating
    in collect_md mode and add a set of BPF kfuncs for controlling encap
    params.
 
  - Allow BPF programs to detect at load time whether a particular kfunc
    exists or not, and also add support for this in light skeleton.
 
  - Bigger batch of BPF verifier improvements to prepare for upcoming BPF
    open-coded iterators allowing for less restrictive looping capabilities.
 
  - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
    programs to NULL-check before passing such pointers into kfunc.
 
  - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
    local storage maps.
 
  - Enable RCU semantics for task BPF kptrs and allow referenced kptr
    tasks to be stored in BPF maps.
 
  - Add support for refcounted local kptrs to the verifier for allowing
    shared ownership, useful for adding a node to both the BPF list and
    rbtree.
 
  - Add BPF verifier support for ST instructions in convert_ctx_access()
    which will help new -mcpu=v4 clang flag to start emitting them.
 
  - Add ARM32 USDT support to libbpf.
 
  - Improve bpftool's visual program dump which produces the control
    flow graph in a DOT format by adding C source inline annotations.
 
 Protocols
 ---------
 
  - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
    indicates the provenance of the IP address.
 
  - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition.
 
  - Add the handshake upcall mechanism, allowing the user-space
    to implement generic TLS handshake on kernel's behalf.
 
  - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
    resilience to nodes failures.
 
  - SCTP: add support for Fair Capacity and Weighted Fair Queueing
    schedulers.
 
  - MPTCP: delay first subflow allocation up to its first usage. This
    will allow for later better LSM interaction.
 
  - xfrm: Remove inner/outer modes from input/output path. These are
    not needed anymore.
 
  - WiFi:
    - reduced neighbor report (RNR) handling for AP mode
    - HW timestamping support
    - support for randomized auth/deauth TA for PASN privacy
    - per-link debugfs for multi-link
    - TC offload support for mac80211 drivers
    - mac80211 mesh fast-xmit and fast-rx support
    - enable Wi-Fi 7 (EHT) mesh support
 
 Netfilter
 ---------
 
  - Add nf_tables 'brouting' support, to force a packet to be routed
    instead of being bridged.
 
  - Update bridge netfilter and ovs conntrack helpers to handle
    IPv6 Jumbo packets properly, i.e. fetch the packet length
    from hop-by-hop extension header. This is needed for BIT TCP
    support.
 
  - The iptables 32bit compat interface isn't compiled in by default
    anymore.
 
  - Move ip(6)tables builtin icmp matches to the udptcp one.
    This has the advantage that icmp/icmpv6 match doesn't load the
    iptables/ip6tables modules anymore when iptables-nft is used.
 
  - Extended netlink error report for netdevice in flowtables and
    netdev/chains. Allow for incrementally add/delete devices to netdev
    basechain. Allow to create netdev chain without device.
 
 Driver API
 ----------
 
  - Remove redundant Device Control Error Reporting Enable, as PCI core
    has already error reporting enabled at enumeration time.
 
  - Move Multicast DB netlink handlers to core, allowing devices other
    then bridge to use them.
 
  - Allow the page_pool to directly recycle the pages from safely
    localized NAPI.
 
  - Implement lockless TX queue stop/wake combo macros, allowing for
    further code de-duplication and sanitization.
 
  - Add YNL support for user headers and struct attrs.
 
  - Add partial YNL specification for devlink.
 
  - Add partial YNL specification for ethtool.
 
  - Add tc-mqprio and tc-taprio support for preemptible traffic classes.
 
  - Add tx push buf len param to ethtool, specifies the maximum number
    of bytes of a transmitted packet a driver can push directly to the
    underlying device.
 
  - Add basic LED support for switch/phy.
 
  - Add NAPI documentation, stop relaying on external links.
 
  - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory
    work to make the hardware timestamping layer selectable by user
    space.
 
  - Add transceiver support and improve the error messages for CAN-FD
    controllers.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - AMD/Pensando core device support
    - MediaTek MT7981 SoC
    - MediaTek MT7988 SoC
    - Broadcom BCM53134 embedded switch
    - Texas Instruments CPSW9G ethernet switch
    - Qualcomm EMAC3 DWMAC ethernet
    - StarFive JH7110 SoC
    - NXP CBTX ethernet PHY
 
  - WiFi:
    - Apple M1 Pro/Max devices
    - RealTek rtl8710bu/rtl8188gu
    - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
 
  - Bluetooth:
    - Realtek RTL8821CS, RTL8851B, RTL8852BS
    - Mediatek MT7663, MT7922
    - NXP w8997
    - Actions Semi ATS2851
    - QTI WCN6855
    - Marvell 88W8997
 
  - Can:
    - STMicroelectronics bxcan stm32f429
 
 Drivers
 -------
  - Ethernet NICs:
    - Intel (1G, icg):
      - add tracking and reporting of QBV config errors.
      - add support for configuring max SDU for each Tx queue.
    - Intel (100G, ice):
      - refactor mailbox overflow detection to support Scalable IOV
      - GNSS interface optimization
    - Intel (i40e):
      - support XDP multi-buffer
    - nVidia/Mellanox:
      - add the support for linux bridge multicast offload
      - enable TC offload for egress and engress MACVLAN over bond
      - add support for VxLAN GBP encap/decap flows offload
      - extend packet offload to fully support libreswan
      - support tunnel mode in mlx5 IPsec packet offload
      - extend XDP multi-buffer support
      - support MACsec VLAN offload
      - add support for dynamic msix vectors allocation
      - drop RX page_cache and fully use page_pool
      - implement thermal zone to report NIC temperature
    - Netronome/Corigine:
      - add support for multi-zone conntrack offload
    - Solarflare/Xilinx:
      - support offloading TC VLAN push/pop actions to the MAE
      - support TC decap rules
      - support unicast PTP
 
  - Other NICs:
    - Broadcom (bnxt): enforce software based freq adjustments only
 		on shared PHC NIC
    - RealTek (r8169): refactor to addess ASPM issues during NAPI poll.
    - Micrel (lan8841): add support for PTP_PF_PEROUT
    - Cadence (macb): enable PTP unicast
    - Engleder (tsnep): add XDP socket zero-copy support
    - virtio-net: implement exact header length guest feature
    - veth: add page_pool support for page recycling
    - vxlan: add MDB data path support
    - gve: add XDP support for GQI-QPL format
    - geneve: accept every ethertype
    - macvlan: allow some packets to bypass broadcast queue
    - mana: add support for jumbo frame
 
  - Ethernet high-speed switches:
    - Microchip (sparx5): Add support for TC flower templates.
 
  - Ethernet embedded switches:
    - Broadcom (b54):
      - configure 6318 and 63268 RGMII ports
    - Marvell (mv88e6xxx):
      - faster C45 bus scan
    - Microchip:
      - lan966x:
        - add support for IS1 VCAP
        - better TX/RX from/to CPU performances
      - ksz9477: add ETS Qdisc support
      - ksz8: enhance static MAC table operations and error handling
      - sama7g5: add PTP capability
    - NXP (ocelot):
      - add support for external ports
      - add support for preemptible traffic classes
    - Texas Instruments:
      - add CPSWxG SGMII support for J7200 and J721E
 
  - Intel WiFi (iwlwifi):
    - preparation for Wi-Fi 7 EHT and multi-link support
    - EHT (Wi-Fi 7) sniffer support
    - hardware timestamping support for some devices/firwmares
    - TX beacon protection on newer hardware
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - MU-MIMO parameters support
    - ack signal support for management packets
 
  - RealTek WiFi (rtw88):
    - SDIO bus support
    - better support for some SDIO devices
      (e.g. MAC address from efuse)
 
  - RealTek WiFi (rtw89):
    - HW scan support for 8852b
    - better support for 6 GHz scanning
    - support for various newer firmware APIs
    - framework firmware backwards compatibility
 
  - MediaTek WiFi (mt76):
    - P2P support
    - mesh A-MSDU support
    - EHT (Wi-Fi 7) support
    - coredump support
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRI/mUSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkgO0QAJGxpuN67YgYV0BIM+/atWKEEexJYG7B
 9MMpU4jMO3EW/pUS5t7VRsBLUybLYVPmqCZoHodObDfnu59jiPOegb6SikJv/ZwJ
 Zw62PVk5MvDnQjlu4e6kDcGwkplteN08TlgI+a49BUTedpdFitrxHAYGW8f2fRO6
 cK2XSld+ZucMoym5vRwf8yWS1BwdxnslPMxDJ+/8ZbWBZv44qAnG2vMB/kIx7ObC
 Vel/4m6MzTwVsLYBsRvcwMVbNNlZ9GuhztlTzEbfGA4ZhTadIAMgb5VTWXB84Ws7
 Aic5wTdli+q+x6/2cxhbyeoVuB9HHObYmLBAciGg4GNljP5rnQBY3X3+KVZ/x9TI
 HQB7CmhxmAZVrO9pLARFV+ECrMTH2/dy3NyrZ7uYQ3WPOXJi8hJZjOTO/eeEGL7C
 eTjdz0dZBWIBK2gON/6s4nExXVQUTEF2ZsPi52jTTClKjfe5pz/ddeFQIWaY1DTm
 pInEiWPAvd28JyiFmhFNHsuIBCjX/Zqe2JuMfMBeBibDAC09o/OGdKJYUI15AiRf
 F46Pdb7use/puqfrYW44kSAfaPYoBiE+hj1RdeQfen35xD9HVE4vdnLNeuhRlFF9
 aQfyIRHYQofkumRDr5f8JEY66cl9NiKQ4IVW1xxQfYDNdC6wQqREPG1md7rJVMrJ
 vP7ugFnttneg
 =ITVa
 -----END PGP SIGNATURE-----
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
 "Core:
   - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
     default value allows for better BIG TCP performances
   - Reduce compound page head access for zero-copy data transfers
   - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
     possible
   - Threaded NAPI improvements, adding defer skb free support and
     unneeded softirq avoidance
   - Address dst_entry reference count scalability issues, via false
     sharing avoidance and optimize refcount tracking
   - Add lockless accesses annotation to sk_err[_soft]
   - Optimize again the skb struct layout
   - Extends the skb drop reasons to make it usable by multiple
     subsystems
   - Better const qualifier awareness for socket casts
  BPF:
   - Add skb and XDP typed dynptrs which allow BPF programs for more
     ergonomic and less brittle iteration through data and
     variable-sized accesses
   - Add a new BPF netfilter program type and minimal support to hook
     BPF programs to netfilter hooks such as prerouting or forward
   - Add more precise memory usage reporting for all BPF map types
   - Adds support for using {FOU,GUE} encap with an ipip device
     operating in collect_md mode and add a set of BPF kfuncs for
     controlling encap params
   - Allow BPF programs to detect at load time whether a particular
     kfunc exists or not, and also add support for this in light
     skeleton
   - Bigger batch of BPF verifier improvements to prepare for upcoming
     BPF open-coded iterators allowing for less restrictive looping
     capabilities
   - Rework RCU enforcement in the verifier, add kptr_rcu and enforce
     BPF programs to NULL-check before passing such pointers into kfunc
   - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
     in local storage maps
   - Enable RCU semantics for task BPF kptrs and allow referenced kptr
     tasks to be stored in BPF maps
   - Add support for refcounted local kptrs to the verifier for allowing
     shared ownership, useful for adding a node to both the BPF list and
     rbtree
   - Add BPF verifier support for ST instructions in
     convert_ctx_access() which will help new -mcpu=v4 clang flag to
     start emitting them
   - Add ARM32 USDT support to libbpf
   - Improve bpftool's visual program dump which produces the control
     flow graph in a DOT format by adding C source inline annotations
  Protocols:
   - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
     indicates the provenance of the IP address
   - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
   - Add the handshake upcall mechanism, allowing the user-space to
     implement generic TLS handshake on kernel's behalf
   - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
     resilience to nodes failures
   - SCTP: add support for Fair Capacity and Weighted Fair Queueing
     schedulers
   - MPTCP: delay first subflow allocation up to its first usage. This
     will allow for later better LSM interaction
   - xfrm: Remove inner/outer modes from input/output path. These are
     not needed anymore
   - WiFi:
      - reduced neighbor report (RNR) handling for AP mode
      - HW timestamping support
      - support for randomized auth/deauth TA for PASN privacy
      - per-link debugfs for multi-link
      - TC offload support for mac80211 drivers
      - mac80211 mesh fast-xmit and fast-rx support
      - enable Wi-Fi 7 (EHT) mesh support
  Netfilter:
   - Add nf_tables 'brouting' support, to force a packet to be routed
     instead of being bridged
   - Update bridge netfilter and ovs conntrack helpers to handle IPv6
     Jumbo packets properly, i.e. fetch the packet length from
     hop-by-hop extension header. This is needed for BIT TCP support
   - The iptables 32bit compat interface isn't compiled in by default
     anymore
   - Move ip(6)tables builtin icmp matches to the udptcp one. This has
     the advantage that icmp/icmpv6 match doesn't load the
     iptables/ip6tables modules anymore when iptables-nft is used
   - Extended netlink error report for netdevice in flowtables and
     netdev/chains. Allow for incrementally add/delete devices to netdev
     basechain. Allow to create netdev chain without device
  Driver API:
   - Remove redundant Device Control Error Reporting Enable, as PCI core
     has already error reporting enabled at enumeration time
   - Move Multicast DB netlink handlers to core, allowing devices other
     then bridge to use them
   - Allow the page_pool to directly recycle the pages from safely
     localized NAPI
   - Implement lockless TX queue stop/wake combo macros, allowing for
     further code de-duplication and sanitization
   - Add YNL support for user headers and struct attrs
   - Add partial YNL specification for devlink
   - Add partial YNL specification for ethtool
   - Add tc-mqprio and tc-taprio support for preemptible traffic classes
   - Add tx push buf len param to ethtool, specifies the maximum number
     of bytes of a transmitted packet a driver can push directly to the
     underlying device
   - Add basic LED support for switch/phy
   - Add NAPI documentation, stop relaying on external links
   - Convert dsa_master_ioctl() to netdev notifier. This is a
     preparatory work to make the hardware timestamping layer selectable
     by user space
   - Add transceiver support and improve the error messages for CAN-FD
     controllers
  New hardware / drivers:
   - Ethernet:
      - AMD/Pensando core device support
      - MediaTek MT7981 SoC
      - MediaTek MT7988 SoC
      - Broadcom BCM53134 embedded switch
      - Texas Instruments CPSW9G ethernet switch
      - Qualcomm EMAC3 DWMAC ethernet
      - StarFive JH7110 SoC
      - NXP CBTX ethernet PHY
   - WiFi:
      - Apple M1 Pro/Max devices
      - RealTek rtl8710bu/rtl8188gu
      - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
   - Bluetooth:
      - Realtek RTL8821CS, RTL8851B, RTL8852BS
      - Mediatek MT7663, MT7922
      - NXP w8997
      - Actions Semi ATS2851
      - QTI WCN6855
      - Marvell 88W8997
   - Can:
      - STMicroelectronics bxcan stm32f429
  Drivers:
   - Ethernet NICs:
      - Intel (1G, icg):
         - add tracking and reporting of QBV config errors
         - add support for configuring max SDU for each Tx queue
      - Intel (100G, ice):
         - refactor mailbox overflow detection to support Scalable IOV
         - GNSS interface optimization
      - Intel (i40e):
         - support XDP multi-buffer
      - nVidia/Mellanox:
         - add the support for linux bridge multicast offload
         - enable TC offload for egress and engress MACVLAN over bond
         - add support for VxLAN GBP encap/decap flows offload
         - extend packet offload to fully support libreswan
         - support tunnel mode in mlx5 IPsec packet offload
         - extend XDP multi-buffer support
         - support MACsec VLAN offload
         - add support for dynamic msix vectors allocation
         - drop RX page_cache and fully use page_pool
         - implement thermal zone to report NIC temperature
      - Netronome/Corigine:
         - add support for multi-zone conntrack offload
      - Solarflare/Xilinx:
         - support offloading TC VLAN push/pop actions to the MAE
         - support TC decap rules
         - support unicast PTP
   - Other NICs:
      - Broadcom (bnxt): enforce software based freq adjustments only on
        shared PHC NIC
      - RealTek (r8169): refactor to addess ASPM issues during NAPI poll
      - Micrel (lan8841): add support for PTP_PF_PEROUT
      - Cadence (macb): enable PTP unicast
      - Engleder (tsnep): add XDP socket zero-copy support
      - virtio-net: implement exact header length guest feature
      - veth: add page_pool support for page recycling
      - vxlan: add MDB data path support
      - gve: add XDP support for GQI-QPL format
      - geneve: accept every ethertype
      - macvlan: allow some packets to bypass broadcast queue
      - mana: add support for jumbo frame
   - Ethernet high-speed switches:
      - Microchip (sparx5): Add support for TC flower templates
   - Ethernet embedded switches:
      - Broadcom (b54):
         - configure 6318 and 63268 RGMII ports
      - Marvell (mv88e6xxx):
         - faster C45 bus scan
      - Microchip:
         - lan966x:
            - add support for IS1 VCAP
            - better TX/RX from/to CPU performances
         - ksz9477: add ETS Qdisc support
         - ksz8: enhance static MAC table operations and error handling
         - sama7g5: add PTP capability
      - NXP (ocelot):
         - add support for external ports
         - add support for preemptible traffic classes
      - Texas Instruments:
         - add CPSWxG SGMII support for J7200 and J721E
   - Intel WiFi (iwlwifi):
      - preparation for Wi-Fi 7 EHT and multi-link support
      - EHT (Wi-Fi 7) sniffer support
      - hardware timestamping support for some devices/firwmares
      - TX beacon protection on newer hardware
   - Qualcomm 802.11ax WiFi (ath11k):
      - MU-MIMO parameters support
      - ack signal support for management packets
   - RealTek WiFi (rtw88):
      - SDIO bus support
      - better support for some SDIO devices (e.g. MAC address from
        efuse)
   - RealTek WiFi (rtw89):
      - HW scan support for 8852b
      - better support for 6 GHz scanning
      - support for various newer firmware APIs
      - framework firmware backwards compatibility
   - MediaTek WiFi (mt76):
      - P2P support
      - mesh A-MSDU support
      - EHT (Wi-Fi 7) support
      - coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
  net: phy: hide the PHYLIB_LEDS knob
  net: phy: marvell-88x2222: remove unnecessary (void*) conversions
  tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
  net: amd: Fix link leak when verifying config failed
  net: phy: marvell: Fix inconsistent indenting in led_blink_set
  lan966x: Don't use xdp_frame when action is XDP_TX
  tsnep: Add XDP socket zero-copy TX support
  tsnep: Add XDP socket zero-copy RX support
  tsnep: Move skb receive action to separate function
  tsnep: Add functions for queue enable/disable
  tsnep: Rework TX/RX queue initialization
  tsnep: Replace modulo operation with mask
  net: phy: dp83867: Add led_brightness_set support
  net: phy: Fix reading LED reg property
  drivers: nfc: nfcsim: remove return value check of `dev_dir`
  net: phy: dp83867: Remove unnecessary (void*) conversions
  net: ethtool: coalesce: try to make user settings stick twice
  net: mana: Check if netdev/napi_alloc_frag returns single page
  net: mana: Rename mana_refill_rxoob and remove some empty lines
  net: veth: add page_pool stats
  ... | ||
|  Luis Chamberlain | 8660484ed1 | module: add debugging auto-load duplicate module support The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module() system calls are non fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with larger number of CPUs. To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, ie, the kernel's *request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module() there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and this not seeing the module yet loaded. This adds debugging support to the kernel module auto-loader (*request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicates requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests. Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their wores. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. Current evidence seems to point to only a few duplicates for module auto-loading. And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should becoming from another place (likely udev). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Arnd Bergmann | a81b1fc8ea | module: stats: fix invalid_mod_bytes typo This was caught by randconfig builds but does not show up in
build testing without CONFIG_MODULE_DECOMPRESS:
kernel/module/stats.c: In function 'mod_stat_bump_invalid':
kernel/module/stats.c:229:42: error: 'invalid_mod_byte' undeclared (first use in this function); did you mean 'invalid_mod_bytes'?
  229 |   atomic_long_add(info->compressed_len, &invalid_mod_byte);
      |                                          ^~~~~~~~~~~~~~~~
      |                                          invalid_mod_bytes
Fixes:  | ||
|  Tom Rix | 9f5cab173e | module: remove use of uninitialized variable len clang build reports
kernel/module/stats.c:307:34: error: variable
  'len' is uninitialized when used here [-Werror,-Wuninitialized]
        len = scnprintf(buf + 0, size - len,
                                        ^~~
At the start of this sequence, neither the '+ 0', nor the '- len' are needed.
So remove them and fix using 'len' uninitalized.
Fixes:  | ||
|  Arnd Bergmann | 719ccd803e | module: fix building stats for 32-bit targets The new module statistics code mixes 64-bit types and wordsized 'long'
variables, which leads to build failures on 32-bit architectures:
kernel/module/stats.c: In function 'read_file_mod_stats':
kernel/module/stats.c:291:29: error: passing argument 1 of 'atomic64_read' from incompatible pointer type [-Werror=incompatible-pointer-types]
  291 |  total_size = atomic64_read(&total_mod_size);
x86_64-linux-ld: kernel/module/stats.o: in function `read_file_mod_stats':
stats.c:(.text+0x2b2): undefined reference to `__udivdi3'
To fix this, the code has to use one of the two types consistently.
Change them all to word-size types here.
Fixes:  | ||
|  Arnd Bergmann | 635dc38314 | module: stats: include uapi/linux/module.h MODULE_INIT_COMPRESSED_FILE is defined in the uapi header, which
is not included indirectly from the normal linux/module.h, but
has to be pulled in explicitly:
kernel/module/stats.c: In function 'mod_stat_bump_invalid':
kernel/module/stats.c:227:14: error: 'MODULE_INIT_COMPRESSED_FILE' undeclared (first use in this function)
  227 |  if (flags & MODULE_INIT_COMPRESSED_FILE)
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 064f4536d1 | module: avoid allocation if module is already present and ready The finit_module() system call can create unnecessary virtual memory
pressure for duplicate modules. This is because load_module() can in
the worse case allocate more than twice the size of a module in virtual
memory. This saves at least a full size of the module in wasted vmalloc
space memory by trying to avoid duplicates as soon as we can validate
the module name in the read module structure.
This can only be an issue if a system is getting hammered with userspace
loading modules. There are two ways to load modules typically on systems,
one is the kernel moduile auto-loading (*request_module*() calls in-kernel)
and the other is things like udev. The auto-loading is in-kernel, but that
pings back to userspace to just call modprobe. We already have a way to
restrict the amount of concurrent kernel auto-loads in a given time, however
that still allows multiple requests for the same module to go through
and force two threads in userspace racing to call modprobe for the same
exact module. Even though libkmod which both modprobe and udev does check
if a module is already loaded prior calling finit_module() races are
still possible and this is clearly evident today when you have multiple
CPUs.
To avoid memory pressure for such stupid cases put a stop gap for them.
The *earliest* we can detect duplicates from the modules side of things
is once we have blessed the module name, sadly after the first vmalloc
allocation. We can check for the module being present *before* a secondary
vmalloc() allocation.
There is a linear relationship between wasted virtual memory bytes and
the number of CPU counts. The reason is that udev ends up racing to call
tons of the same modules for each of the CPUs.
We can see the different linear relationships between wasted virtual
memory and CPU count during after boot in the following graph:
         +----------------------------------------------------------------------------+
    14GB |-+          +            +            +           +           *+          +-|
         |                                                          ****              |
         |                                                       ***                  |
         |                                                     **                     |
    12GB |-+                                                 **                     +-|
         |                                                 **                         |
         |                                               **                           |
         |                                             **                             |
         |                                           **                               |
    10GB |-+                                       **                               +-|
         |                                       **                                   |
         |                                     **                                     |
         |                                   **                                       |
     8GB |-+                               **                                       +-|
waste    |                               **                             ###           |
         |                             **                           ####              |
         |                           **                      #######                  |
     6GB |-+                     ****                    ####                       +-|
         |                      *                    ####                             |
         |                     *                 ####                                 |
         |                *****              ####                                     |
     4GB |-+            **               ####                                       +-|
         |            **             ####                                             |
         |          **           ####                                                 |
         |        **         ####                                                     |
     2GB |-+    **      #####                                                       +-|
         |     *    ####                                                              |
         |    * ####                                                   Before ******* |
         |  **##      +            +            +           +           After ####### |
         +----------------------------------------------------------------------------+
         0            50          100          150         200          250          300
                                          CPUs count
On the y-axis we can see gigabytes of wasted virtual memory during boot
due to duplicate module requests which just end up failing. Trying to
infer the slope this ends up being about ~463 MiB per CPU lost prior
to this patch. After this patch we only loose about ~230 MiB per CPU, for
a total savings of about ~233 MiB per CPU. This is all *just on bootup*!
On a 8vcpu 8 GiB RAM system using kdevops and testing against selftests
kmod.sh -t 0008 I see a saving in the *highest* side of memory
consumption of up to ~ 84 MiB with the Linux kernel selftests kmod
test 0008. With the new stress-ng module test I see a 145 MiB difference
in max memory consumption with 100 ops. The stress-ng module ops tests can be
pretty pathalogical -- it is not realistic, however it was used to
finally successfully reproduce issues which are only reported to happen on
system with over 400 CPUs [0] by just usign 100 ops on a 8vcpu 8 GiB RAM
system. Running out of virtual memory space is no surprise given the
above graph, since at least on x86_64 we're capped at 128 MiB, eventually
we'd hit a series of errors and once can use the above graph to
guestimate when. This of course will vary depending on the features
you have enabled. So for instance, enabling KASAN seems to make this
much worse.
The results with kmod and stress-ng can be observed and visualized below.
The time it takes to run the test is also not affected.
The kmod tests 0008:
The gnuplot is set to a range from 400000 KiB (390 Mib) - 580000 (566 Mib)
given the tests peak around that range.
cat kmod.plot
set term dumb
set output fileout
set yrange [400000:580000]
plot filein with linespoints title "Memory usage (KiB)"
Before:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C
root@kmod ~ # sort -n -r log-0008-before.txt | head -1
528732
So ~516.33 MiB
After:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C
root@kmod ~ # sort -n -r log-0008-after.txt | head -1
442516
So ~432.14 MiB
That's about 84 ~MiB in savings in the worst case. The graphs:
root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
root@kmod ~ # gnuplot -e "filein='log-0008-after.txt';  fileout='graph-0008-after.txt'"  kmod.plot
root@kmod ~ # cat graph-0008-before.txt
  580000 +-----------------------------------------------------------------+
         |       +        +       +       +       +        +       +       |
  560000 |-+                                    Memory usage (KiB) ***A***-|
         |                                                                 |
  540000 |-+                                                             +-|
         |                                                                 |
         |        *A     *AA*AA*A*AA          *A*AA    A*A*A *AA*A*AA*A  A |
  520000 |-+A*A*AA  *AA*A           *A*AA*A*AA     *A*A     A          *A+-|
         |*A                                                               |
  500000 |-+                                                             +-|
         |                                                                 |
  480000 |-+                                                             +-|
         |                                                                 |
  460000 |-+                                                             +-|
         |                                                                 |
         |                                                                 |
  440000 |-+                                                             +-|
         |                                                                 |
  420000 |-+                                                             +-|
         |       +        +       +       +       +        +       +       |
  400000 +-----------------------------------------------------------------+
         0       5        10      15      20      25       30      35      40
root@kmod ~ # cat graph-0008-after.txt
  580000 +-----------------------------------------------------------------+
         |       +        +       +       +       +        +       +       |
  560000 |-+                                    Memory usage (KiB) ***A***-|
         |                                                                 |
  540000 |-+                                                             +-|
         |                                                                 |
         |                                                                 |
  520000 |-+                                                             +-|
         |                                                                 |
  500000 |-+                                                             +-|
         |                                                                 |
  480000 |-+                                                             +-|
         |                                                                 |
  460000 |-+                                                             +-|
         |                                                                 |
         |          *A              *A*A                                   |
  440000 |-+A*A*AA*A  A       A*A*AA    A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-|
         |*A           *A*AA*A                                             |
  420000 |-+                                                             +-|
         |       +        +       +       +       +        +       +       |
  400000 +-----------------------------------------------------------------+
         0       5        10      15      20      25       30      35      40
The stress-ng module tests:
This is used to run the test to try to reproduce the vmap issues
reported by David:
  echo 0 > /proc/sys/vm/oom_dump_tasks
  ./stress-ng --module 100 --module-name xfs
Prior to this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt
root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1
5046456
After this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt
root@kmod ~ # sort -n -r after-stress-ng.txt | head -1
4896972
5046456 - 4896972
149484
149484/1024
145.98046875000000000000
So this commit using stress-ng reveals saving about 145 MiB in memory
using 100 ops from stress-ng which reproduced the vmap issue reported.
cat kmod.plot
set term dumb
set output fileout
set yrange [4700000:5070000]
plot filein with linespoints title "Memory usage (KiB)"
root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'"  kmod-simple-stress-ng.plot
root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'"  kmod-simple-stress-ng.plot
root@kmod ~ # cat graph-stress-ng-before.txt
           +---------------------------------------------------------------+
  5.05e+06 |-+     + A     +       +       +       +       +       +     +-|
           |         *                          Memory usage (KiB) ***A*** |
           |         *                             A                       |
     5e+06 |-+      **                            **                     +-|
           |        **                            * *    A                 |
  4.95e+06 |-+      * *                          A  *   A*               +-|
           |        * *      A       A           *  *  *  *             A  |
           |       *  *     * *     * *        *A   *  *  *      A      *  |
   4.9e+06 |-+     *  *     * A*A   * A*AA*A  A      *A    **A   **A*A  *+-|
           |       A  A*A  A    *  A       *  *      A     A *  A    * **  |
           |      *      **      **         * *              *  *    * * * |
  4.85e+06 |-+   A       A       A          **               *  *     ** *-|
           |     *                           *               * *      ** * |
           |     *                           A               * *      *  * |
   4.8e+06 |-+   *                                           * *      A  A-|
           |     *                                           * *           |
  4.75e+06 |-+  *                                            * *         +-|
           |    *                                            **            |
           |    *  +       +       +       +       +       + **    +       |
   4.7e+06 +---------------------------------------------------------------+
           0       5       10      15      20      25      30      35      40
root@kmod ~ # cat graph-stress-ng-after.txt
           +---------------------------------------------------------------+
  5.05e+06 |-+     +       +       +       +       +       +       +     +-|
           |                                    Memory usage (KiB) ***A*** |
           |                                                               |
     5e+06 |-+                                                           +-|
           |                                                               |
  4.95e+06 |-+                                                           +-|
           |                                                               |
           |                                                               |
   4.9e+06 |-+                                      *AA                  +-|
           |  A*AA*A*A  A  A*AA*AA*A*AA*A  A  A  A*A   *AA*A*A  A  A*AA*AA |
           |  *      * **  *            *  *  ** *            ***  *       |
  4.85e+06 |-+*       ***  *            * * * ***             A *  *     +-|
           |  *       A *  *             ** * * A               *  *       |
           |  *         *  *             *  **                  *  *       |
   4.8e+06 |-+*         *  *             A   *                  *  *     +-|
           | *          * *                  A                  * *        |
  4.75e+06 |-*          * *                                     * *      +-|
           | *          * *                                     * *        |
           | *     +    * *+       +       +       +       +    * *+       |
   4.7e+06 +---------------------------------------------------------------+
           0       5       10      15      20      25      30      35      40
[0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | df3e764d8e | module: add debug stats to help identify memory pressure Loading modules with finit_module() can end up using vmalloc(), vmap()
and vmalloc() again, for a total of up to 3 separate allocations in the
worst case for a single module. We always kernel_read*() the module,
that's a vmalloc(). Then vmap() is used for the module decompression,
and if so the last read buffer is freed as we use the now decompressed
module buffer to stuff data into our copy module. The last allocation is
specific to each architectures but pretty much that's generally a series
of vmalloc() calls or a variation of vmalloc to handle ELF sections with
special permissions.
Evaluation with new stress-ng module support [1] with just 100 ops
is proving that you can end up using GiBs of data easily even with all
care we have in the kernel and userspace today in trying to not load modules
which are already loaded. 100 ops seems to resemble the sort of pressure a
system with about 400 CPUs can create on module loading. Although issues
relating to duplicate module requests due to each CPU inucurring a new
module reuest is silly and some of these are being fixed, we currently lack
proper tooling to help diagnose easily what happened, when it happened
and who likely is to blame -- userspace or kernel module autoloading.
Provide an initial set of stats which use debugfs to let us easily scrape
post-boot information about failed loads. This sort of information can
be used on production worklaods to try to optimize *avoiding* redundant
memory pressure using finit_module().
There's a few examples that can be provided:
A 255 vCPU system without the next patch in this series applied:
Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s
graphical.target reached after 6.988s in userspace
And 13.58 GiB of virtual memory space lost due to failed module loading:
root@big ~ # cat /sys/kernel/debug/modules/stats
         Mods ever loaded       67
     Mods failed on kread       0
Mods failed on decompress       0
  Mods failed on becoming       0
      Mods failed on load       1411
        Total module size       11464704
      Total mod text size       4194304
       Failed kread bytes       0
  Failed decompress bytes       0
    Failed becoming bytes       0
        Failed kmod bytes       14588526272
 Virtual mem wasted bytes       14588526272
         Average mod size       171115
    Average mod text size       62602
  Average fail load bytes       10339140
Duplicate failed modules:
              module-name        How-many-times                    Reason
                kvm_intel                   249                      Load
                      kvm                   249                      Load
                irqbypass                     8                      Load
         crct10dif_pclmul                   128                      Load
      ghash_clmulni_intel                    27                      Load
             sha512_ssse3                    50                      Load
           sha512_generic                   200                      Load
              aesni_intel                   249                      Load
              crypto_simd                    41                      Load
                   cryptd                   131                      Load
                    evdev                     2                      Load
                serio_raw                     1                      Load
               virtio_pci                     3                      Load
                     nvme                     3                      Load
                nvme_core                     3                      Load
    virtio_pci_legacy_dev                     3                      Load
    virtio_pci_modern_dev                     3                      Load
                   t10_pi                     3                      Load
                   virtio                     3                      Load
             crc32_pclmul                     6                      Load
           crc64_rocksoft                     3                      Load
             crc32c_intel                    40                      Load
              virtio_ring                     3                      Load
                    crc64                     3                      Load
The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the
next patch in this series applied, shows 226.53 MiB are wasted in virtual
memory allocations which due to duplicate module requests during boot.
It also shows an average module memory size of 167.10 KiB and an an
average module .text + .init.text size of 61.13 KiB. The end shows all
modules which were detected as duplicate requests and whether or not
they failed early after just the first kernel_read*() call or late after
we've already allocated the private space for the module in
layout_and_allocate(). A system with module decompression would reveal
more wasted virtual memory space.
We should put effort now into identifying the source of these duplicate
module requests and trimming these down as much possible. Larger systems
will obviously show much more wasted virtual memory allocations.
root@kmod ~ # cat /sys/kernel/debug/modules/stats
         Mods ever loaded       67
     Mods failed on kread       0
Mods failed on decompress       0
  Mods failed on becoming       83
      Mods failed on load       16
        Total module size       11464704
      Total mod text size       4194304
       Failed kread bytes       0
  Failed decompress bytes       0
    Failed becoming bytes       228959096
        Failed kmod bytes       8578080
 Virtual mem wasted bytes       237537176
         Average mod size       171115
    Average mod text size       62602
  Avg fail becoming bytes       2758544
  Average fail load bytes       536130
Duplicate failed modules:
              module-name        How-many-times                    Reason
                kvm_intel                     7                  Becoming
                      kvm                     7                  Becoming
                irqbypass                     6           Becoming & Load
         crct10dif_pclmul                     7           Becoming & Load
      ghash_clmulni_intel                     7           Becoming & Load
             sha512_ssse3                     6           Becoming & Load
           sha512_generic                     7           Becoming & Load
              aesni_intel                     7                  Becoming
              crypto_simd                     7           Becoming & Load
                   cryptd                     3           Becoming & Load
                    evdev                     1                  Becoming
                serio_raw                     1                  Becoming
                     nvme                     3                  Becoming
                nvme_core                     3                  Becoming
                   t10_pi                     3                  Becoming
               virtio_pci                     3                  Becoming
             crc32_pclmul                     6           Becoming & Load
           crc64_rocksoft                     3                  Becoming
             crc32c_intel                     3                  Becoming
    virtio_pci_modern_dev                     2                  Becoming
    virtio_pci_legacy_dev                     1                  Becoming
                    crc64                     2                  Becoming
                   virtio                     2                  Becoming
              virtio_ring                     2                  Becoming
[0] https://github.com/ColinIanKing/stress-ng.git
[1] echo 0 > /proc/sys/vm/oom_dump_tasks
    ./stress-ng --module 100 --module-name xfs
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | f71afa6a42 | module: extract patient module check into helper The patient module check inside add_unformed_module() is large enough as we need it. It is a bit hard to read too, so just move it to a helper and do the inverse checks first to help shift the code and make it easier to read. The new helper then is module_patient_check_exists(). To make this work we need to mvoe the finished_loading() up, we do that without making any functional changes to that routine. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 25a1b5b518 | modules/kmod: replace implementation with a semaphore Simplify the concurrency delimiter we use for kmod with the semaphore. I had used the kmod strategy to try to implement a similar concurrency delimiter for the kernel_read*() calls from the finit_module() path so to reduce vmalloc() memory pressure. That effort didn't provide yet conclusive results, but one thing that became clear is we can use the suggested alternative solution with semaphores which Linus hinted at instead of using the atomic / wait strategy. I've stress tested this with kmod test 0008: time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 And I get only a *slight* delay. That delay however is small, a few seconds for a full test loop run that runs 150 times, for about ~30-40 seconds. The small delay is worth the simplfication IMHO. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 430bb0d1c3 | module: fix kmemleak annotations for non init ELF sections Commit | ||
|  Tiezhu Yang | 0a3bf86092 | module: Ignore L0 and rename is_arm_mapping_symbol() The L0 symbol is generated when build module on LoongArch, ignore it in
modpost and when looking at module symbols, otherwise we can not see the
expected call trace.
Now is_arm_mapping_symbol() is not only for ARM, in order to reflect the
reality, rename is_arm_mapping_symbol() to is_mapping_symbol().
This is related with commit  | ||
|  Tiezhu Yang | 987d2e0aaa | module: Move is_arm_mapping_symbol() to module_symbol.h In order to avoid duplicated code, move is_arm_mapping_symbol() to include/linux/module_symbol.h, then remove is_arm_mapping_symbol() in the other places. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Tiezhu Yang | 87e5b1e8f2 | module: Sync code of is_arm_mapping_symbol() After commit | ||
|  Jiri Olsa | d099f594ad | kallsyms: Disable preemption for find_kallsyms_symbol_value Artem reported suspicious RCU usage [1]. The reason is that verifier
calls find_kallsyms_symbol_value with preemption enabled which will
trigger suspicious RCU usage warning in rcu_dereference_sched call.
Disabling preemption in find_kallsyms_symbol_value and adding
__find_kallsyms_symbol_value function.
Fixes:  | ||
|  Jim Cromie | 33c951f629 | module: already_uses() - reduce pr_debug output volume already_uses() is unnecessarily chatty. `modprobe i915` yields 491 messages like: [ 64.108744] i915 uses drm! This is a normal situation, and isn't worth all the log entries. NOTE: I've preserved the "does not use %s" messages, which happens less often, but does happen. Its not clear to me what it tells a reader, or what info might improve the pr_debug's utility. [ 6847.584999] main:already_uses:569: amdgpu does not use ttm! [ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585014] main:already_uses:569: amdgpu does not use drm! [ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper! [ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper! [ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy! [ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit! [ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched! [ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585314] main:already_uses:569: amdgpu does not use video! [ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2! [ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper! [ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6848.762268] dyndbg: add-module: amdgpu.2533 sites no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Jim Cromie | 66a2301edf | module: add section-size to move_module pr_debug move_module() pr_debug's "Final section addresses for $modname". Add section addresses to the message, for anyone looking at these. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Jim Cromie | b10addf37b | module: add symbol-name to pr_debug Absolute symbol The pr_debug("Absolute symbol" ..) reports value, (which is usually
0), but not the name, which is more informative.  So add it.
no functional changes
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Jim Cromie | 6ed81802d4 | module: in layout_sections, move_module: add the modname layout_sections() and move_module() each issue ~50 messages for each module loaded. Add mod-name into their 2 header lines, to help the reader find his module. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 25be451aa4 | module: fold usermode helper kmod into modules directory The kernel/kmod.c is already only built if we enabled modules, so just stuff it under kernel/module/kmod.c and unify the MAINTAINERS file for it. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 3d40bb903e | module: merge remnants of setup_load_info() to elf validation The setup_load_info() was actually had ELF validation checks of its own. To later cache useful variables as an secondary step just means looping again over the ELF sections we just validated. We can simply keep tabs of the key sections of interest as we validate the module ELF section in one swoop, so do that and merge the two routines together. Expand a bit on the documentation / intent / goals. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 1bb49db991 | module: move more elf validity checks to elf_validity_check() The symbol and strings section validation currently happen in setup_load_info() but since they are also doing validity checks move this to elf_validity_check(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | c7ee8aebf6 | module: add stop-grap sanity check on module memcpy() The integrity of the struct module we load is important, and although our ELF validator already checks that the module section must match struct module, add a stop-gap check before we memcpy() the final minted module. This also makes those inspecting the code what the goal is. While at it, clarify the goal behind updating the sh_addr address. The current comment is pretty misleading. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 46752820f9 | module: add sanity check for ELF module section The ELF ".gnu.linkonce.this_module" section is special, it is what we use to construct the struct module __this_module, which THIS_MODULE points to. When userspace loads a module we always deal first with a copy of the userspace buffer, and twiddle with the userspace copy's version of the struct module. Eventually we allocate memory to do a memcpy() of that struct module, under the assumption that the module size is right. But we have no validity checks against the size or the requirements for the section. Add some validity checks for the special module section early and while at it, cache the module section index early, so we don't have to do that later. While at it, just move over the assigment of the info->mod to make the code clearer. The validity checker also adds an explicit size check to ensure the module section size matches the kernel's run time size for sizeof(struct module). This should prevent sloppy loads of modules which are built today *without* actually increasing the size of the struct module. A developer today can for example expand the size of struct module, rebuild a directoroy 'make fs/xfs/' for example and then try to insmode the driver there. That module would in effect have an incorrect size. This new size check would put a stop gap against such mistakes. This also makes the entire goal of ".gnu.linkonce.this_module" pretty clear. Before this patch verification of the goal / intent required some Indian Jones whips, torches and cleaning up big old spider webs. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 419e1a20f7 | module: rename check_module_license_and_versions() to check_export_symbol_versions() This makes the routine easier to understand what the check its checking for. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 72f08b3cc6 | module: converge taint work together Converge on a compromise: so long as we have a module hit our linked list of modules we taint. That is, the module was about to become live. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | c3bbf62ebf | module: move signature taint to module_augment_kernel_taints() Just move the signature taint into the helper: module_augment_kernel_taints() Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | a12b94511c | module: move tainting until after a module hits our linked list It is silly to have taints spread out all over, we can just compromise and add them if the module ever hit our linked list. Our sanity checkers should just prevent crappy drivers / bogus ELF modules / etc and kconfig options should be enough to let you *not* load things you don't want. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 437c1f9cc6 | module: split taint adding with info checking check_modinfo() actually does two things:
 a) sanity checks, some of which are fatal, and so we
    prevent the user from completing trying to load a module
 b) taints the kernel
The taints are pretty heavy handed because we're tainting the kernel
*before* we ever even get to load the module into the modules linked
list. That is, it it can fail for other reasons later as we review the
module's structure.
But this commit makes no functional changes, it just makes the intent
clearer and splits the code up where needed to make that happen.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | ed52cabecb | module: split taint work out of check_modinfo_livepatch() The work to taint the kernel due to a module should be split up eventually. To aid with this, split up the tainting on check_modinfo_livepatch(). This let's us bring more early checks together which do return a value, and makes changes easier to read later where we stuff all the work to do the taints in one single routine. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | ad8d3a36e9 | module: rename set_license() to module_license_taint_check() The set_license() routine would seem to a reader to do some sort of setting, but it does not. It just adds a taint if the license is not set or proprietary. This makes what the code is doing clearer, so much we can remove the comment about it. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | ||
|  Luis Chamberlain | 02da2cbab4 | module: move check_modinfo() early to early_mod_check() This moves check_modinfo() to early_mod_check(). This doesn't make any functional changes either, as check_modinfo() was the first call on layout_and_allocate(), so we're just moving it back one routine and at the end. This let's us keep separate the checkers from the allocator. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |