mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-09-04 20:19:47 +08:00
d8f26717c9
3771 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
538c429a4b |
tracing fixes:
- Fix regression of waiting a long time on updating trace event filters When the faultable trace points were added, it needed task trace RCU synchronization. This was added to the tracepoint_synchronize_unregister() function. The filter logic always called this function whenever it updated the trace event filters before freeing the old filters. This increased the time of "trace-cmd record" from taking 13 seconds to running over 2 minutes to complete. Move the freeing of the filters to call_rcu*() logic, which brings the time back down to 13 seconds. - Fix ring_buffer_subbuf_order_set() error path lock protection The error path of the ring_buffer_subbuf_order_set() released the mutex too early and allowed subsequent accesses to setting the subbuffer size to corrupt the data and cause a bug. By moving the mutex locking to the end of the error path, it prevents the reentrant access to the critical data and also allows the function to convert the taking of the mutex over to the guard() logic. - Remove unused power management clock events The clock events were added in 2010 for power management. In 2011 arm used them. In 2013 the code they were used in was removed. These events have been wasting memory since then. - Fix sparse warnings There was a few places that sparse warned about trace_events_filter.c where file->filter was referenced directly, but it is annotated with an __rcu tag. Use the helper functions and fix them up to use rcu_dereference() properly. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaEST0xQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgdSAPoD7L17oeiP5KQkM0wPuPBz0tmJF7XE 2VmHp1lBu5rYwgEAyHTD7SqWvInMMp9sGt5tzkByXpOsYC65/RprkbFpXwA= =s4wK -----END PGP SIGNATURE----- Merge tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing fixes from Steven Rostedt: - Fix regression of waiting a long time on updating trace event filters When the faultable trace points were added, it needed task trace RCU synchronization. This was added to the tracepoint_synchronize_unregister() function. The filter logic always called this function whenever it updated the trace event filters before freeing the old filters. This increased the time of "trace-cmd record" from taking 13 seconds to running over 2 minutes to complete. Move the freeing of the filters to call_rcu*() logic, which brings the time back down to 13 seconds. - Fix ring_buffer_subbuf_order_set() error path lock protection The error path of the ring_buffer_subbuf_order_set() released the mutex too early and allowed subsequent accesses to setting the subbuffer size to corrupt the data and cause a bug. By moving the mutex locking to the end of the error path, it prevents the reentrant access to the critical data and also allows the function to convert the taking of the mutex over to the guard() logic. - Remove unused power management clock events The clock events were added in 2010 for power management. In 2011 arm used them. In 2013 the code they were used in was removed. These events have been wasting memory since then. - Fix sparse warnings There was a few places that sparse warned about trace_events_filter.c where file->filter was referenced directly, but it is annotated with an __rcu tag. Use the helper functions and fix them up to use rcu_dereference() properly. * tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Add rcu annotation around file->filter accesses tracing: PM: Remove unused clock events ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set() tracing: Fix regression of filter waiting a long time on RCU synchronization |
||
![]() |
de6fdc076d |
tracing: PM: Remove unused clock events
The events clock_enable, clock_disable, and clock_set_rate were added back
in 2010. In 2011 they were used by the arm architecture but removed in
2013. These events add around 7K of memory which was wasted for the last 12
years.
Remove them.
Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kajetan Puchalski <kajetan.puchalski@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/20250605162106.1a459dad@gandalf.local.home
Fixes:
|
||
![]() |
5abc7438f1 |
NFS Clent Updates for Linux 6.16
New Features: * Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache * Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2 * Add a localio sysfs attribute Stable Fixes: * Fix double-unlock bug in nfs_return_empty_folio() * Don't check for OPEN feature support in v4.1 * Always probe for LOCALIO support asynchronously * Prevent hang on NFS mounts with xprtsec=[m]tls Other Bugfixes: * xattr handlers should check for absent nfs filehandles * Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are delegated * Fix listxattr to return selinux security labels * Connect to NFSv3 DS using TLS if MDS connection uses TLS * Clear SB_RDONLY before getting a superblock, and ignore when remounting * Fix incorrect handling of NFS error codes in nfs4_do_mkdir() * Various nfs_localio fixes from Neil Brown that include fixing an rcu compilation error found by older gcc versions. * Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY Cleanups: * Add a refcount tracker for struct net in the nfs_client * Allow FREE_STATEID to clean up delegations * Always set NLINK even if the server doesn't support it * Cleanups to the NFS folio writeback code * Remove dead code from xs_tcp_tls_setup_socket() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmg/YkAACgkQ18tUv7Cl QOuGpQ/+OuG/xkVX6j7FerUcdbVhcZ+5jDUKC0cNe6EeFeFRjgqsdFB0uqH+AgJh DlxEJuXTMq+9mcptl0rjrOn0tj7dlTpgZowp3kWdK3bX1zSI2jBEJjnz3xVzjBQx 3lbmF/UAIaHv5bPVc9aF8mioaj5DSRKWTBLTg7iOM1ol1DqgHK/M0q2D7d2n1yB4 WYGI7LlAWSBGV4PvEkhHW6PwVPDSqECPBvIxd1obq8TSNl+YZlmVxCoJ99+zVqWf dvaDOwfs5x+YEQH/+N/XWdc38QiCGfu7H79qGHShWB8t/KT4axxmjVs2fT7xtUsv yN3fb77rlFOCJaPLRF549/4EJqHYMWmFDKIMUZ7YC1vEBCG4B1kQUqarA5eCbsAi s/rxBs1VNKeev/RecDaViAeH3XZoVU1rNyIBJjOuWgNlC5wnbF+An3zE0m8MAXxO Vh7wQSH3GZEY+VCR6ljwLhIv6+tvSVQxEZKUUjfVQXp5UuNwN3wKa+sW6li+FBl6 uV6lJcmdUffrurNhvSghIiSQGDkerHUVhSltgtj5FnmRp/AM95Z850t5a7qqc7Cv duks9siLLaeC4K5W+AOcKLWXho1dJMIPWUej3ErCiHWnA20QiNXsQN4QoimkDKqf 9SYdcl6UECqV5MzIa/L7cW96S3K0acrq+8ofJCjN3A8M0pcTGgU= =5DFQ -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS clent updates from Anna Schumaker: "New Features: - Implement the Sunrpc rfc2203 rpcsec_gss sequence number cache - Add support for FALLOC_FL_ZERO_RANGE on NFS v4.2 - Add a localio sysfs attribute Stable Fixes: - Fix double-unlock bug in nfs_return_empty_folio() - Don't check for OPEN feature support in v4.1 - Always probe for LOCALIO support asynchronously - Prevent hang on NFS mounts with xprtsec=[m]tls Other Bugfixes: - xattr handlers should check for absent nfs filehandles - Fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are delegated - Fix listxattr to return selinux security labels - Connect to NFSv3 DS using TLS if MDS connection uses TLS - Clear SB_RDONLY before getting a superblock, and ignore when remounting - Fix incorrect handling of NFS error codes in nfs4_do_mkdir() - Various nfs_localio fixes from Neil Brown that include fixing an rcu compilation error found by older gcc versions. - Update stats on flexfiles pNFS DSes when receiving NFS4ERR_DELAY Cleanups: - Add a refcount tracker for struct net in the nfs_client - Allow FREE_STATEID to clean up delegations - Always set NLINK even if the server doesn't support it - Cleanups to the NFS folio writeback code - Remove dead code from xs_tcp_tls_setup_socket()" * tag 'nfs-for-6.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (30 commits) flexfiles/pNFS: update stats on NFS4ERR_DELAY for v4.1 DSes nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh() nfs_localio: duplicate nfs_close_local_fh() nfs_localio: simplify interface to nfsd for getting nfsd_file nfs_localio: always hold nfsd net ref with nfsd_file ref nfs_localio: use cmpxchg() to install new nfs_file_localio SUNRPC: Remove dead code from xs_tcp_tls_setup_socket() SUNRPC: Prevent hang on NFS mount with xprtsec=[m]tls nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir() nfs: ignore SB_RDONLY when remounting nfs nfs: clear SB_RDONLY before getting superblock NFS: always probe for LOCALIO support asynchronously pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS NFS: add localio to sysfs nfs: use writeback_iter directly nfs: refactor nfs_do_writepage nfs: don't return AOP_WRITEPAGE_ACTIVATE from nfs_do_writepage nfs: fold nfs_page_async_flush into nfs_do_writepage NFSv4: Always set NLINK even if the server doesn't support it ... |
||
![]() |
70087d2200 |
tracing fixes:
- Fix UAF in module unload in ftrace when there's a bug in the module If a module is buggy and triggers ftrace_disable which is set when an anomaly is detected, when it gets unloaded it doesn't free the hooks into kallsyms, and when a kallsyms lookup is performed it may access the mod->modname field and crash via UAF. Fix this by still freeing the mod_maps that are attached to kallsyms on module unload regardless if ftrace_disable is set or not. - Do not bother allocating mod_maps for kallsyms if ftrace_disable is set - Remove unused trace events When a trace event or tracepoint is created but not used, it still creates the code and data structures needed for that trace event. This just wastes memory. A patch is being worked on to warn when a trace event is created but not used: https://lore.kernel.org/linux-trace-kernel/20250529130138.544ffec4@gandalf.local.home/ Remove the trace events that are created but not used. This does not remove trace events that are created but are not used due configs not being set. That will be handled later. This only removes events that have no user under any config. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaD9LohQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qvrRAP4xRH01dQ3HkNF3mtKXuHEh8NbTlCEE 8wYyiI8ttjVdGAEAzq5sx2BQN2Of4RLOwYtxJSigZgmJjYYGmobeHISPjwc= =d2Cp -----END PGP SIGNATURE----- Merge tag 'trace-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix UAF in module unload in ftrace when there's a bug in the module If a module is buggy and triggers ftrace_disable which is set when an anomaly is detected, when it gets unloaded it doesn't free the hooks into kallsyms, and when a kallsyms lookup is performed it may access the mod->modname field and crash via UAF. Fix this by still freeing the mod_maps that are attached to kallsyms on module unload regardless if ftrace_disable is set or not. - Do not bother allocating mod_maps for kallsyms if ftrace_disable is set - Remove unused trace events When a trace event or tracepoint is created but not used, it still creates the code and data structures needed for that trace event. This just wastes memory. Remove the trace events that are created but not used. This does not remove trace events that are created but are not used due configs not being set. That will be handled later. This only removes events that have no user under any config. * tag 'trace-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fsdax: Remove unused trace events for dax insert mapping genirq/matrix: Remove unused irq_matrix_alloc_reserved tracepoint xdp: Remove unused mem_return_failed event ftrace: Don't allocate ftrace module map if ftrace is disabled ftrace: Fix UAF when lookup kallsym after ftrace disabled |
||
![]() |
a4a45a9a72 |
fsdax: Remove unused trace events for dax insert mapping
When the dax_fault_actor() helper was factored out, it removed the calls
to the dax_pmd_insert_mapping and dax_insert_mapping events but never
removed the events themselves. As each event created takes up memory
(roughly 5K each), this is a waste as it is never used.
Remove the unused dax_pmd_insert_mapping and dax_insert_mapping trace
events.
Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home/
Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250529152211.688800c9@gandalf.local.home
Fixes:
|
||
![]() |
0fb34422b5 |
vfs-6.16-rc1.netfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPUAAKCRCRxhvAZXjc ouMEAQCrviYPG/WMtPTH7nBIbfVQTfNEXt/TvN7u7OjXb+RwRAEAwe9tLy4GrS/t GuvUPWAthbhs77LTvxj6m3Gf49BOVgQ= =6FqN -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: - The main API document has been extensively updated/rewritten - Fix an oops in write-retry due to mis-resetting the I/O iterator - Fix the recording of transferred bytes for short DIO reads - Fix a request's work item to not require a reference, thereby avoiding the need to get rid of it in BH/IRQ context - Fix waiting and waking to be consistent about the waitqueue used - Remove NETFS_SREQ_SEEK_DATA_READ, NETFS_INVALID_WRITE, NETFS_ICTX_WRITETHROUGH, NETFS_READ_HOLE_CLEAR, NETFS_RREQ_DONT_UNLOCK_FOLIOS, and NETFS_RREQ_BLOCKED - Reorder structs to eliminate holes - Remove netfs_io_request::ractl - Only provide proc_link field if CONFIG_PROC_FS=y - Remove folio_queue::marks3 - Fix undifferentiation of DIO reads from unbuffered reads * tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix undifferentiation of DIO reads from unbuffered reads netfs: Fix wait/wake to be consistent about the waitqueue used netfs: Fix the request's work item to not require a ref netfs: Fix setting of transferred bytes with short DIO reads netfs: Fix oops in write-retry from mis-resetting the subreq iterator fs/netfs: remove unused flag NETFS_RREQ_BLOCKED fs/netfs: remove unused flag NETFS_RREQ_DONT_UNLOCK_FOLIOS folio_queue: remove unused field `marks3` fs/netfs: declare field `proc_link` only if CONFIG_PROC_FS=y fs/netfs: remove `netfs_io_request.ractl` fs/netfs: reorder struct fields to eliminate holes fs/netfs: remove unused enum choice NETFS_READ_HOLE_CLEAR fs/netfs: remove unused flag NETFS_ICTX_WRITETHROUGH fs/netfs: remove unused source NETFS_INVALID_WRITE fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ |
||
![]() |
167d7ede00 |
genirq/matrix: Remove unused irq_matrix_alloc_reserved tracepoint
The tracepoint irq_matrix_alloc_reserved was added but never used.
Remove it.
Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home/
Cc: Juergen Gross <jgross@suse.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250529135739.26e5c075@gandalf.local.home
Fixes:
|
||
![]() |
e27e43a5cb |
xdp: Remove unused mem_return_failed event
The change to allow page_pool to handle its own page destruction instead
of relying on XDP removed the trace_mem_return_failed() tracepoint caller,
but did not remove the mem_return_failed trace event. As trace events take
up memory when they are created regardless of if they are used or not,
having this unused event around wastes around 5K of memory.
Remove the unused event.
Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home/
Cc: netdev <netdev@vger.kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250529160550.1f888b15@gandalf.local.home
Fixes:
|
||
![]() |
00c010e130 |
- The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox
simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - The 3 patch series "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - The 2 patch series "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - The 8 patch series "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - The 2 patch series "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - The 12 patch series "Always call constructor for kernel page tables" from Kevin Brodsky "ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables". This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - The 9 patch series "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - The 3 patch series "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - The 6 patch series "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - The 3 patch series ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - The 2 patch series "vmscan: enforce mems_effective during demotion" from Gregory Price "changes reclaim to respect cpuset.mems_effective during demotion when possible". because "presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated." "This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently." - The 2 patch series ""Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - The 4 patch series "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - The 2 patch series "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - The 4 patch series "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - The 6 patch series "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - The 4 patch series "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - The 6 patch series "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is "yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents." - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - The 4 patch series "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x Qqb/UGMEpilyre1PayQqOnct3aSL9Ao= =tYYm -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ... |
||
![]() |
d8441523f2 |
f2fs-for-6.16-rc1
In this round, Matthew converted most of page operations to using folio. Beyond the work, we've applied some performance tunings such as GC and linear lookup, in addition to enhancing fault injection and sanity checks. Enhancement: - large number of folio conversions - add a control to turn on/off the linear lookup for performance - tune GC logics for zoned block device - improve fault injection and sanity checks Bug fix: - handle error cases of memory donation - fix to correct check conditions in f2fs_cross_rename - fix to skip f2fs_balance_fs() if checkpoint is disabled - don't over-report free space or inodes in statvfs - prevent the current section from being selected as a victim during GC - fix to calculate first_zoned_segno correctly - fix to avoid inconsistence in between SIT and SSA for zoned block device As usual, there are several debugging patches and clean-ups as well. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmg3PdcACgkQQBSofoJI UNL/mQ/9Hkru4XSCokhxt8+/HoFRnTliAlzfD45Vzkkhz1YP7J8VdvWOzJV/WEai D3Ib50Q6/y2ptxu7cwOpmToR3fI3RzAlgQsYooFAiZOBnyUkBOLA1oaVuT4s/EYg u85xxLx0SW/IMX5CKKbYzhbXnocGAvRUkp/k30kjKJxpCeQ7pw/mLhw/2XeNIb9h FxJbECWPpf4PA6ot22YUNvQn0plF/s9873PPhv50vpGyXTHIlTbDCSMeEC1r1E5v xWsPcWmTkyPIyBhNFEONWJw1l3wcVIVKNBfBqwMEDr+Tgqi5UDEREeTDV9q5C6y+ vw3KnsOqX7RTdLExGfefTOnBsTqqMwSZQSH2HL5/Poayg5obXf3D/fUqAQajJpt/ FbAtfKaXElJcC7l3DJQU3Trh+WpdEPbuMiJo43OzX0YGvMfkA/sYrAHTYm5Q4nsC wrRLaWiBgG6nQDKNXz+amD9kL1SMxp+Vsf6ybtChH3gvMqDAJsR7DY1F/Cxe3ry8 8JoJiGRYq70lw5xNACfJNQwWwRbtySy63nIwMA7FGR9zaXBQJx+cSPhEeLsS+0hI zgijgtgRjbfuojlh7qvfFArHEIL4A67Um3RhjHbLWSFhREPaTB0665ElUNTGPe+y hVdYtkb0X2ngsYdV/Xdmp/OThpSxI8x1ZCXVsrElawVIMpjP+nA= =G8sl -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, Matthew converted most of page operations to using folio. Beyond the work, we've applied some performance tunings such as GC and linear lookup, in addition to enhancing fault injection and sanity checks. Enhancements: - large number of folio conversions - add a control to turn on/off the linear lookup for performance - tune GC logics for zoned block device - improve fault injection and sanity checks Bug fixes: - handle error cases of memory donation - fix to correct check conditions in f2fs_cross_rename - fix to skip f2fs_balance_fs() if checkpoint is disabled - don't over-report free space or inodes in statvfs - prevent the current section from being selected as a victim during GC - fix to calculate first_zoned_segno correctly - fix to avoid inconsistence between SIT and SSA for zoned block device As usual, there are several debugging patches and clean-ups as well" * tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (195 commits) f2fs: fix to correct check conditions in f2fs_cross_rename f2fs: use d_inode(dentry) cleanup dentry->d_inode f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabled f2fs: clean up to check bi_status w/ BLK_STS_OK f2fs: introduce is_{meta,node}_folio f2fs: add ckpt_valid_blocks to the section entry f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode. f2fs: introduce FAULT_VMALLOC f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctx f2fs: add f2fs_bug_on() in f2fs_quota_read() f2fs: add f2fs_bug_on() to detect potential bug f2fs: remove unused sbi argument from checksum functions f2fs: fix 32-bits hexademical number in fault injection doc f2fs: don't over-report free space or inodes in statvfs f2fs: return bool from __write_node_folio f2fs: simplify return value handling in f2fs_fsync_node_pages f2fs: always unlock the page in f2fs_write_single_data_page f2fs: remove wbc->for_reclaim handling f2fs: return bool from __f2fs_write_meta_folio f2fs: fix to return correct error number in f2fs_sync_node_pages() ... |
||
![]() |
ae5ec8adb8 |
tsm for 6.16
- Add a general sysfs scheme for publishing "Measurement" values provided by the architecture's TEE Security Manager. Use it to publish TDX "Runtime Measurement Registers" ("RTMRs") that either maintain a hash of stored values (similar to a TPM PCR) or provide statically provisioned data. These measurements are validated by a relying party. - Reorganize the drivers/virt/coco/ directory for "host" and "guest" shared infrastructure. - Fix a configfs-tsm-report unregister bug - With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in anticipation of more shared "TSM" infrastructure arriving, rename the maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE". -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCaDj38gAKCRDfioYZHlFs Z3EKAQC2K7RgoufBlLv4C79W8IGiUirKKQvtY9aiC7s/W8R4UwEApwV5gXQx2ImN cEIIkAkVI2h9wJ9LHxyr3R5XfZPBGgA= =2fTp -----END PGP SIGNATURE----- Merge tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm Pull trusted security manager (TSM) updates from Dan Williams: - Add a general sysfs scheme for publishing "Measurement" values provided by the architecture's TEE Security Manager. Use it to publish TDX "Runtime Measurement Registers" ("RTMRs") that either maintain a hash of stored values (similar to a TPM PCR) or provide statically provisioned data. These measurements are validated by a relying party. - Reorganize the drivers/virt/coco/ directory for "host" and "guest" shared infrastructure. - Fix a configfs-tsm-report unregister bug - With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in anticipation of more shared "TSM" infrastructure arriving, rename the maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE". * tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase sample/tsm-mr: Fix missing static for sample_report virt: tdx-guest: Transition to scoped_cond_guard for mutex operations virt: tdx-guest: Refactor and streamline TDREPORT generation virt: tdx-guest: Expose TDX MRs as sysfs attributes x86/tdx: tdx_mcall_get_report0: Return -EBUSY on TDCALL_OPERAND_BUSY error x86/tdx: Add tdx_mcall_extend_rtmr() interface tsm-mr: Add tsm-mr sample code tsm-mr: Add TVM Measurement Register support configfs-tsm-report: Fix NULL dereference of tsm_ops coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/ configfs-tsm: Namespace TSM report symbols |
||
![]() |
b78f1293f9 |
tracing updates for v6.16:
- Have module addresses get updated in the persistent ring buffer The addresses of the modules from the previous boot are saved in the persistent ring buffer. If the same modules are loaded and an address is in the old buffer points to an address that was both saved in the persistent ring buffer and is loaded in memory, shift the address to point to the address that is loaded in memory in the trace event. - Print function names for irqs off and preempt off callsites When ignoring the print fmt of a trace event and just printing the fields directly, have the fields for preempt off and irqs off events still show the function name (via kallsyms) instead of just showing the raw address. - Clean ups of the histogram code The histogram functions saved over 800 bytes on the stack to process events as they come in. Instead, create per-cpu buffers that can hold this information and have a separate location for each context level (thread, softirq, IRQ and NMI). Also add some more comments to the code. - Add "common_comm" field for histograms Add "common_comm" that uses the current->comm as a field in an event histogram and acts like any of the other fields of the event. - Show "subops" in the enabled_functions file When the function graph infrastructure is used, a subsystem has a "subops" that it attaches its callback function to. Instead of the enabled_functions just showing a function calling the function that calls the subops functions, also show the subops functions that will get called for that function too. - Add "copy_trace_marker" option to instances There are cases where an instance is created for tooling to write into, but the old tooling has the top level instance hardcoded into the application. New tools want to consume the data from an instance and not the top level buffer. By adding a copy_trace_marker option, whenever the top instance trace_marker is written into, a copy of it is also written into the instance with this option set. This allows new tools to read what old tools are writing into the top buffer. If this option is cleared by the top instance, then what is written into the trace_marker is not written into the top instance. This is a way to redirect the trace_marker writes into another instance. - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp() If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(), then it will not be exposed via tracefs. Currently there's no way to differentiate in the kernel the tracepoint functions between those that are exposed via tracefs or not. A calling convention has been made manually to append a "_tp" prefix for events created by DECLARE_TRACE(). Instead of doing this manually, force it so that all DECLARE_TRACE() events have this notation. - Use __string() for task->comm in some sched events Instead of hardcoding the comm to be TASK_COMM_LEN in some of the scheduler events use __string() which makes it dynamic. Note, if these events are parsed by user space it they may break, and the event may have to be converted back to the hardcoded size. - Have function graph "depth" be unsigned to the user Internally to the kernel, the "depth" field of the function graph event is signed due to -1 being used for end of boundary. What actually gets recorded in the event itself is zero or positive. Reflect this to user space by showing "depth" as unsigned int and be consistent across all events. - Allow an arbitrary long CPU string to osnoise_cpus_write() The filtering of which CPUs to write to can exceed 256 bytes. If a machine has 256 CPUs, and the filter is to filter every other CPU, the write would take a string larger than 256 bytes. Instead of using a fixed size buffer on the stack that is 256 bytes, allocate it to handle what is passed in. - Stop having ftrace check the per-cpu data "disabled" flag The "disabled" flag in the data structure passed to most ftrace functions is checked to know if tracing has been disabled or not. This flag was added back in 2008 before the ring buffer had its own way to disable tracing. The "disable" flag is now not always set when needed, and the ring buffer flag should be used in all locations where the disabled is needed. Since the "disable" flag is redundant and incorrect, stop using it. Fix up some locations that use the "disable" flag to use the ring buffer info. - Use a new tracer_tracing_disable/enable() instead of data->disable flag There's a few cases that set the data->disable flag to stop tracing, but this flag is not consistently used. It is also an on/off switch where if a function set it and calls another function that sets it, the called function may incorrectly enable it. Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a counter and can be nested. These use the ring buffer flags which are always checked making the disabling more consistent. - Save the trace clock in the persistent ring buffer Save what clock was used for tracing in the persistent ring buffer and set it back to that clock after a reboot. - Remove unused reference to a per CPU data pointer in mmiotrace functions - Remove unused buffer_page field from trace_array_cpu structure - Remove more strncpy() instances - Other minor clean ups and fixes -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDhiqRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qkheAQDpyRHoXF1AIoEqyahDax8f3vpZQeCH B/mn+YJmU1wuVgEA7AFALov5SHKv4IzoARz68GXtR0jGhP5D8uebUhUqDAQ= =WmFG -----END PGP SIGNATURE----- Merge tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Have module addresses get updated in the persistent ring buffer The addresses of the modules from the previous boot are saved in the persistent ring buffer. If the same modules are loaded and an address is in the old buffer points to an address that was both saved in the persistent ring buffer and is loaded in memory, shift the address to point to the address that is loaded in memory in the trace event. - Print function names for irqs off and preempt off callsites When ignoring the print fmt of a trace event and just printing the fields directly, have the fields for preempt off and irqs off events still show the function name (via kallsyms) instead of just showing the raw address. - Clean ups of the histogram code The histogram functions saved over 800 bytes on the stack to process events as they come in. Instead, create per-cpu buffers that can hold this information and have a separate location for each context level (thread, softirq, IRQ and NMI). Also add some more comments to the code. - Add "common_comm" field for histograms Add "common_comm" that uses the current->comm as a field in an event histogram and acts like any of the other fields of the event. - Show "subops" in the enabled_functions file When the function graph infrastructure is used, a subsystem has a "subops" that it attaches its callback function to. Instead of the enabled_functions just showing a function calling the function that calls the subops functions, also show the subops functions that will get called for that function too. - Add "copy_trace_marker" option to instances There are cases where an instance is created for tooling to write into, but the old tooling has the top level instance hardcoded into the application. New tools want to consume the data from an instance and not the top level buffer. By adding a copy_trace_marker option, whenever the top instance trace_marker is written into, a copy of it is also written into the instance with this option set. This allows new tools to read what old tools are writing into the top buffer. If this option is cleared by the top instance, then what is written into the trace_marker is not written into the top instance. This is a way to redirect the trace_marker writes into another instance. - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp() If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(), then it will not be exposed via tracefs. Currently there's no way to differentiate in the kernel the tracepoint functions between those that are exposed via tracefs or not. A calling convention has been made manually to append a "_tp" prefix for events created by DECLARE_TRACE(). Instead of doing this manually, force it so that all DECLARE_TRACE() events have this notation. - Use __string() for task->comm in some sched events Instead of hardcoding the comm to be TASK_COMM_LEN in some of the scheduler events use __string() which makes it dynamic. Note, if these events are parsed by user space it they may break, and the event may have to be converted back to the hardcoded size. - Have function graph "depth" be unsigned to the user Internally to the kernel, the "depth" field of the function graph event is signed due to -1 being used for end of boundary. What actually gets recorded in the event itself is zero or positive. Reflect this to user space by showing "depth" as unsigned int and be consistent across all events. - Allow an arbitrary long CPU string to osnoise_cpus_write() The filtering of which CPUs to write to can exceed 256 bytes. If a machine has 256 CPUs, and the filter is to filter every other CPU, the write would take a string larger than 256 bytes. Instead of using a fixed size buffer on the stack that is 256 bytes, allocate it to handle what is passed in. - Stop having ftrace check the per-cpu data "disabled" flag The "disabled" flag in the data structure passed to most ftrace functions is checked to know if tracing has been disabled or not. This flag was added back in 2008 before the ring buffer had its own way to disable tracing. The "disable" flag is now not always set when needed, and the ring buffer flag should be used in all locations where the disabled is needed. Since the "disable" flag is redundant and incorrect, stop using it. Fix up some locations that use the "disable" flag to use the ring buffer info. - Use a new tracer_tracing_disable/enable() instead of data->disable flag There's a few cases that set the data->disable flag to stop tracing, but this flag is not consistently used. It is also an on/off switch where if a function set it and calls another function that sets it, the called function may incorrectly enable it. Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a counter and can be nested. These use the ring buffer flags which are always checked making the disabling more consistent. - Save the trace clock in the persistent ring buffer Save what clock was used for tracing in the persistent ring buffer and set it back to that clock after a reboot. - Remove unused reference to a per CPU data pointer in mmiotrace functions - Remove unused buffer_page field from trace_array_cpu structure - Remove more strncpy() instances - Other minor clean ups and fixes * tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits) tracing: Fix compilation warning on arm32 tracing: Record trace_clock and recover when reboot tracing/sched: Use __string() instead of fixed lengths for task->comm tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix tracing: Cleanup upper_empty() in pid_list tracing: Allow the top level trace_marker to write into another instances tracing: Add a helper function to handle the dereference arg in verifier tracing: Remove unnecessary "goto out" that simply returns ret is trigger code tracing: Fix error handling in event_trigger_parse() tracing: Rename event_trigger_alloc() to trigger_data_alloc() tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf tracing: Remove unused buffer_page field from trace_array_cpu structure tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer tracing: Convert the per CPU "disabled" counter to local from atomic tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field ring-buffer: Add ring_buffer_record_is_on_cpu() tracing: Do not use per CPU array_buffer.data->disabled for cpumask ftrace: Do not disabled function graph based on "disabled" field tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one() ... |
||
![]() |
1b98f357da |
Networking changes for 6.16.
Core ---- - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter --------- - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF --- - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols --------- - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API ---------- - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling ----------------- - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers ---------------------- - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers ------- - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the stearing table handling to reduce significantly the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmg3D64SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkcIsQAK2eEc+BxQer975wzvtMg6gF9eoex4a+ rZ7jxfDzDtNvTauoQsrpehDZp0FnySaVGCU36lHGB2OvDnhCpPc5hXzKDWQpOuqQ SHrGG3/6FTbdTG/HfHUcbNyrUzIf53SADSObiQ3qg4gyEQ3sCpcOKtVtMcU8rvsY /HqMnsJWFaROUMjMtCcnUSgjmeY9kBvha3sTXUqgeRugEOCvZD7z4rpqFIcQqHw7 e2Fi8dwIXEYNxqPp6MRq2qdyUTewCRruE8ZIMAFuhtfYeMElUZMPlqlMENX3AzTQ cr0EgwcFOUxRA7oZRxhoBNBsVXavtSpQr4ZDoWplxP4aQ37n5tc1E9Q72axpB/Og FbJRl6GvWYnCd8071BczgmfHlKaTAigPvt2Z4r6JjM5I/Bij/IZ3k+On1OTuOAj/ EqfFkdZ0a5cfKrwUMP+oSGtSAywkMVUtnIKJlZeRbjSj2432sCfe2jVAlS8ELM43 3LUgXYrAKtA87g171LlsRu5EEpI5QmqPb+i5LpPlEXe2TJEgPisyfecJ3NafF/2+ j575lm+TFNm9NTNhGGjDPEvw0djI5wSGGMe9J4gC74eWi6s5t6C4cuUf84TKWdwR x+9H0IB7rfFncAwXHJuUUtzd+fPHaYzs5dDGbSgMQOXr1cr1wlubCK8mQ1r/Wt/a 3GjFIOQKW2Q5 =t/Tz -----END PGP SIGNATURE----- Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ... |
||
![]() |
2c26b68cd5 |
NFSD 6.16 Release Notes
The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmg1yGUACgkQM2qzM29m f5frMA//TJTbSWiM7qBX1GhVMNr1lxQcjU4BPKo0qZfEtwV06F2BB9mWgDU+BIQh AcGfMZUmNWAnhOTOYvwqyW6dnX+1yt8sBsCZ/1ctY30A4JH4AgG5sdZS7BUrlEEr bGDMUCaPnvQ3maeDjMlefe7Xv/rUhj9TVhXmDkt4vf/jCde2JODTB/z8n7WeAxYJ eOvmr/n5z6VI5Q67M7b5/xqofBEaEoq9P5UEgn61ThfeR0bMlrklm/avDCbbNIH8 6n7Z3tjzllK1CAjEmwHalq4LRbMX5FHWzNkyJw+wtviXS18J5vCAvRe+JDoykusu L2bgXT8bBUqy46eO4WKEOJtEqVQhIsRFx/8ku1iTLrpDWlwrR4mHVyObEDkkdlMX EyBQ4svg2OxCXSyy5O8oggzU0TWVJStIjbIEHbJYusWLU7HxxFveBwqwzYHXLtip WKm6N2ANqQi1du+Pc6xmgXo9svA5Vk+DQjljm1Y5up9dhi2K9cvCIHjwFsZ+E0VL XqXJ2YgIQb3oXK7FttzLOiDrpX1OX82sTIbgdcPcfT7lP+ej7uiHMBPmdPwgaZIU EbIp0ThoTkh8/VRMDcWIt+B6SEhmb5vY3Zgz9Lcf2J0PM1fuYJ67L7xGTviFX7Ci DpohiCgceb6PHYeIuarayF86tPJGF8Vb7XvQZej2Ybv8QdxLFg8= =FbeG -----END PGP SIGNATURE----- Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle" * tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits) xdrgen: Fix code generated for counted arrays SUNRPC: Bump the maximum payload size for the server NFSD: Add a "default" block size NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro NFSD: Remove NFSD_BUFSIZE sunrpc: Remove the RPCSVC_MAXPAGES macro svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages sunrpc: Adjust size of socket's receive page array dynamically SUNRPC: Remove svc_rqst :: rq_vec SUNRPC: Remove svc_fill_write_vector() NFSD: Use rqstp->rq_bvec in nfsd_iter_write() SUNRPC: Export xdr_buf_to_bvec() NFSD: De-duplicate the svc_fill_write_vector() call sites NFSD: Use rqstp->rq_bvec in nfsd_iter_read() sunrpc: Replace the rq_bvec array with dynamically-allocated memory sunrpc: Replace the rq_pages array with dynamically-allocated memory sunrpc: Remove backchannel check in svc_init_buffer() sunrpc: Add a helper to derive maxpages from sv_max_mesg svcrdma: Reduce the number of rdma_rw contexts per-QP ... |
||
![]() |
3b66e6b3c0 |
cgroup: Changes for v6.16
- cgroup rstat shared the tracking tree across all controlers with the rationale being that a cgroup which is using one resource is likely to be using other resources at the same time (ie. if something is allocating memory, it's probably consuming CPU cycles). However, this turned out to not scale very well especially with memcg using rstat for internal operations which made memcg stat read and flush patterns substantially different from other controllers. JP Kobryn split the rstat tree per controller. - cgroup BPF support was hooking into cgroup init/exit paths directly. Convert them to use a notifier chain instead so that other usages can be added easily. The two of the patches which implement this are mislabeled as belonging to sched_ext instead of cgroup. Sorry. - Relatively minor cpuset updates. - Documentation updates. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaDYUmA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGRhbAP90v8QwUkWEKGQSam8JY3by7PvrW6pV5ot+BGuM 4xu3BAEAjsJ9FdiwYLwKYqG7y59xhhBFOo6GpcP52kPp3znl+QQ= =6MIT -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup rstat shared the tracking tree across all controllers with the rationale being that a cgroup which is using one resource is likely to be using other resources at the same time (ie. if something is allocating memory, it's probably consuming CPU cycles). However, this turned out to not scale very well especially with memcg using rstat for internal operations which made memcg stat read and flush patterns substantially different from other controllers. JP Kobryn split the rstat tree per controller. - cgroup BPF support was hooking into cgroup init/exit paths directly. Convert them to use a notifier chain instead so that other usages can be added easily. The two of the patches which implement this are mislabeled as belonging to sched_ext instead of cgroup. Sorry. - Relatively minor cpuset updates - Documentation updates * tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits) sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier sched_ext: Introduce cgroup_lifetime_notifier cgroup: Minor reorganization of cgroup_create() cgroup, docs: cpu controller's interaction with various scheduling policies cgroup, docs: convert space indentation to tab indentation cgroup: avoid per-cpu allocation of size zero rstat cpu locks cgroup, docs: be specific about bandwidth control of rt processes cgroup: document the rstat per-cpu initialization cgroup: helper for checking rstat participation of css cgroup: use subsystem-specific rstat locks to avoid contention cgroup: use separate rstat trees for each subsystem cgroup: compare css to cgroup::self in helper for distingushing css cgroup: warn on rstat usage by early init subsystems cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask() cgroup/rstat: Improve cgroup_rstat_push_children() documentation cgroup: fix goto ordering in cgroup_init() cgroup: fix pointer check in css_rstat_init() cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() cgroup/cpuset: Always use cpu_active_mask ... |
||
![]() |
0aee061726 |
Move the x86 page fault tracepoints to generic code, because
other architectures would like to make use of them as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy+RARHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jTHA//eIBOFKJdxmhpJ95kzA0tRXue+FUSTAX+ j9rMZOJpR9hnVkr0pBxH8bU42lji4+6b2vujMHaT59n5i2kH5tPFHW1xfEnpbVNw thSRsFxrUKsNnKPBju0vK9WQs9e1cn2ZvVBbh2SHrATKQrcTCmJroEERZDX0cdnn VrPeGoc7UUAjxE23c3vnZOzAJDapIc9zPAdfVGRa7xHqlq5grryG+SfHFzT/fd08 5Qwu8TN37jo1HU5v2I4RYIh4Alc1lXtWTfJAc0bks0Cpryu+Et9+N2XANu/VatVw cve/Ubwdou9m0QxQtUTULttEbMSBB8Ylc7DJ1PdGkhULxNM8cCb+Yx9C8Gk0+8Rf SP8/ZSVK8EE+3ETP+J8r8VXoXrNgTPSjMeI1s4rZD/b9QpRKE4g/Khu+R9UA8JBV yuYdy2xkeRbfFVzoGDSVnZItk18MuAoq4hSNqgAxl9/S33HWG84KHQAnjzixCqb4 9Ai7n3/FBEe1edLJXKoqWK96mTa5P/vpGjMnL8wQ0rAnSYI+V2OSwPpZ9HHviw3g qYYMqsmiU6ChbfcUnuub/YwdJFdRieVSOa7wh3H6mfKAuakpS0At8fIyD5mBtFtA /qeSD9INII/guT1gdTgqGsirXeObbmNpC+HJjz8hRvsoP6hdoT2L/UZsUH89LcDl qd8MKeV1Kew= =xi0h -----END PGP SIGNATURE----- Merge tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug updates from Ingo Molnar: "Move the x86 page fault tracepoints to generic code, because other architectures would like to make use of them as well" * tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing, x86/mm: Move page fault tracepoints to generic x86/tracing, x86/mm: Remove redundant trace_pagefault_key |
||
![]() |
eaed94d1f6 |
Scheduler updates for v6.16:
Core & fair scheduler changes: - Tweak wait_task_inactive() to force dequeue sched_delayed tasks (John Stultz) - Adhere to place_entity() constraints (Peter Zijlstra) - Allow decaying util_est when util_avg > CPU capacity (Pierre Gondois) - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan) Energy management: - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak) - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change (K Prateek Nayak) - Align uclamp and util_est and call before freq update (Xuewen Yan) CPU isolation: - Make use of more than one housekeeping CPU (Phil Auld) RT scheduler: - Fix race in push_rt_task() (Harshit Agarwal) - Add kernel cmdline option for rt_group_sched (Michal Koutný) Scheduler topology support: - Improve topology_span_sane speed (Steve Wahl) Scheduler debugging: - Move and extend the sched_process_exit() tracepoint (Andrii Nakryiko) - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný) - Fix trace_sched_switch(.prev_state) (Peter Zijlstra) - Untangle cond_resched() and live-patching (Peter Zijlstra) Fixes and cleanups: - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter Zijlstra, Xuewen Yan) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy50ARHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jFQQ/+KXl2XDg1V/VVmMG8GmtDlR29V3M3ricy D7/2s0D1Y1ErHb+pRMBG31EubT9/bXjUshWIuuf51DciSLBmpELHxY5J+AevRa0L /pHFwSvP6H5pDakI/xZ01FlYt7PxZGs+1m1o2615Mbwq6J2bjZTan54CYzrdpLOy Nqb3OT4tSqU1+7SV7hVForBpZp9u3CvVBRt/wE6vcHltW/I486bM8OCOd2XrUlnb QoIRliGI9KHpqCpbAeKPRSKXpf9tZv/AijZ+0WUu2yY8iwSN4p3RbbbwdCipjVQj w5I5oqKI6cylFfl2dEFWXVO+tLBihs06w8KSQrhYmQ9DUu4RGBVM9ORINGDBPejL bvoQh1mAkqvIL+oodujdbMDIqLupvOEtVSvwzR7SJn8BJSB00js88ngCWLjo/CcU imLbWy9FSBLvOswLBzQthgAJEj+ejCkOIbcvM2lINWhX/zNsMFaaqYcO1wRunGGR SavTI1s+ZksCQY6vCwRkwPrOZjyg91TA/q4FK102fHL1IcthH6xubE4yi4lTIUYs L56HuGm8e7Shc8M2Y5rAYsVG3GoIHFLXnptOn2HnCRWaAAJYsBaLUlzoBy9MxCfw I2YVDCylkQxevosSi2XxXo3tbM6auISU9SelAT/dAz32V1rsjWQojRJXeGYKIbu7 KBuN/dLItW0= =s/ra -----END PGP SIGNATURE----- Merge tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Tweak wait_task_inactive() to force dequeue sched_delayed tasks (John Stultz) - Adhere to place_entity() constraints (Peter Zijlstra) - Allow decaying util_est when util_avg > CPU capacity (Pierre Gondois) - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan) Energy management: - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak) - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change (K Prateek Nayak) - Align uclamp and util_est and call before freq update (Xuewen Yan) CPU isolation: - Make use of more than one housekeeping CPU (Phil Auld) RT scheduler: - Fix race in push_rt_task() (Harshit Agarwal) - Add kernel cmdline option for rt_group_sched (Michal Koutný) Scheduler topology support: - Improve topology_span_sane speed (Steve Wahl) Scheduler debugging: - Move and extend the sched_process_exit() tracepoint (Andrii Nakryiko) - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný) - Fix trace_sched_switch(.prev_state) (Peter Zijlstra) - Untangle cond_resched() and live-patching (Peter Zijlstra) Fixes and cleanups: - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter Zijlstra, Xuewen Yan)" * tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) sched/uclamp: Align uclamp and util_est and call before freq update sched/util_est: Simplify condition for util_est_{en,de}queue() sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE sched,livepatch: Untangle cond_resched() and live-patching sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks sched/fair: Adhere to place_entity() constraints sched/debug: Print the local group's asym_prefer_cpu cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change sched/topology: Introduce sched_update_asym_prefer_cpu() sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu sched/isolation: Make use of more than one housekeeping cpu sched/rt: Fix race in push_rt_task sched: Add annotations to RT_GROUP_SCHED fields sched: Add RT_GROUP WARN checks for non-root task_groups sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED sched: Skip non-root task_groups with disabled RT_GROUP_SCHED sched: Add commadline option for RT_GROUP_SCHED toggling sched: Always initialize rt_rq's task_group sched: Remove unneeed macro wrap ... |
||
![]() |
79b98edf91 |
Changes since last update:
- Add a `fsoffset` mount option to specify the filesystem offset; - Support Intel QAT accelerators to boost up the DEFLATE algorithm; - Initialize per-CPU workers and CPU hotplug hooks lazily to avoid unnecessary overhead when EROFS is not mounted; - Fix file handle encoding for 64-bit NIDs; - Minor cleanups. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmgz6IQRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qrp8w//V8rQpo/jQwUXP2xZDWUGe/iS5APQU/w+ IQRn8LRt1RYLD1ssShW60y5mc40pa/PktxLlddIfcDDFfhAv4zYEK7Iosrd5FeGX vDawKcFvjzozpveqtjWR63QPO0Ff/ldSsnl9FsdQopffWNFw+X7D+/4fgUJah+CF p5jnyp6D7RvNMHdLIjQjiqvvmmAdllqb+nbyLy0jGQkzjIGR2RdJtqrM5gdsE/B1 zKQRzs6NwYaBQ2MO6XmLAd2P0603RBGplR9OyLEpfFmUHX877pUxuGLQW2o+NbRY TodevQdzSJPlvHNrO0T+ztistwRhKGkCmyrP7+Vl4ackgRmA5ozT23CUxFX2hwQM GhE24aXyqO/vIA/RCsy+Tb8vxVY3ysNd4fz001HtWq0tOqLVyFkVEhvaZwLGqi1A PAV6WHqtYo/gjc8nrvq88GMGTUH0orIwlJpS9YQHhStzexyePDjl3cgQlmS0Q8J3 JHtf8S+pnaModsvqKJJ9LQW0bHrbry9Bfo0M6yQ5sirehcrqGeDFZ0m+ny16Ki9N bv8Mx811KNtAVoeuwAidH2NqUxnz1/faiIs0yYE/2Vg2QfuEKjVXbpkDo2wfQj1i TVsQ9gPJB9mZpvnuaGYGdgzxN/lQAIo3JxWAHvHhMz/1suike97vqKms4W4lSoBY JPbJjs/4uUA= =+2IX -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, Intel QAT hardware accelerators are supported to improve DEFLATE decompression performance. I've tested it with the enwik9 dataset of 1 MiB pclusters on our Intel Sapphire Rapids bare-metal server and a PL0 ESSD, and the sequential read performance even surpasses LZ4 software decompression on this setup. In addition, a `fsoffset` mount option is introduced for file-backed mounts to specify the filesystem offset in order to adapt customized container formats. And other improvements and minor cleanups. Summary: - Add a `fsoffset` mount option to specify the filesystem offset - Support Intel QAT accelerators to boost up the DEFLATE algorithm - Initialize per-CPU workers and CPU hotplug hooks lazily to avoid unnecessary overhead when EROFS is not mounted - Fix file handle encoding for 64-bit NIDs - Minor cleanups" * tag 'erofs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support DEFLATE decompression by using Intel QAT erofs: clean up erofs_{init,exit}_sysfs() erofs: add 'fsoffset' mount option to specify filesystem offset erofs: lazily initialize per-CPU workers and CPU hotplug hooks erofs: refine readahead tracepoint erofs: avoid using multiple devices with different type erofs: fix file handle encoding for 64-bit NIDs |
||
![]() |
5e82ed5ca4 |
for-6.16-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmgtuJgACgkQxWXV+ddt WDt79g//YndozUasOP0raqNVvod4wYvmG/CX1yHOkFQpfRQSVG4av0KlTWnupXKG oEQvFbZ639tmXbBYlKlK8Ts8fy1dpj+2iG4ValukA4L7xkY8ML5DrGQfKYbPEm2i Ab9lp4qnZZutYVH2/5UGQqkEUA3/YIiOZ0hsZWir//zbkTCL9cuHwl2FUYbmFlHi Hxkd30QC0kZuxINdMxXGauF4JkFJFyiNnmI5dMjj07xMMWk1cv8vunoZ3LVjAlbW gX16+4rUmtJl33HbYqofee4Dcovvcuvt/fEM1LX0rGbKXOnKA2dQPoMQsjMAV82B mjhma5T709MgVHQiDdJduh86seaul4Cuv/E/OqoDj7Kfkoew/YquHEfU4TB4bvCX KmONEyJFd9QDq5CUyvfow7HENja6QbU31Fw6akrbfpsVcla0MKAUWPi+Vqpqf+pe qIWNcovorD2g/EVJV6y+w0K+kXTarPtXXmVnJnJPYtOkBWpARI3Y8wVxDCKX8Nfo 7Kpi/h9K87+d9opjjEajydNONDL9GQa4AY4u/oeiwcSuJHvCt/rsKKwHZRyycRiI q+nGwsNcmY/ih/EVUzLgYomGG08H9nOcKvZOQkfHpOTI1EgvILAeV9SpGMex7du1 PiPqVtv9Z60dKy6OValh7ttMpt7LszAK4Dk7XiyHrN1Q3sYDyrs= =bDOD -----END PGP SIGNATURE----- Merge tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Apart from numerous cleanups, there are some performance improvements and one minor mount option update. There's one more radix-tree conversion (one remaining), and continued work towards enabling large folios (almost finished). Performance: - extent buffer conversion to xarray gains throughput and runtime improvements on metadata heavy operations doing writeback (sample test shows +50% throughput, -33% runtime) - extent io tree cleanups lead to performance improvements by avoiding unnecessary searches or repeated searches - more efficient extent unpinning when committing transaction (estimated run time improvement 3-5%) User visible changes: - remove standalone mount option 'nologreplay', deprecated in 5.9, replacement is 'rescue=nologreplay' - in scrub, update reporting, add back device stats message after detected errors (accidentally removed during recent refactoring) Core: - convert extent buffer radix tree to xarray - in subpage mode, move block perfect compression out of experimental build - in zoned mode, introduce sub block groups to allow managing special block groups, like the one for relocation or tree-log, to handle some corner cases of ENOSPC - in scrub, simplify bitmaps for block tracking status - continued preparations for large folios: - remove assertions for folio order 0 - add support where missing: compression, buffered write, defrag, hole punching, subpage, send - fix fsync of files with no hard links not persisting deletion - reject tree blocks which are not nodesize aligned, a precaution from 4.9 times - move transaction abort calls closer to the error sites - remove usage of some struct bio_vec internals - simplifications in extent map - extent IO cleanups and optimizations - error handling improvements - enhanced ASSERT() macro with optional format strings - cleanups: - remove unused code - naming unifications, dropped __, added prefix - merge similar functions - use common helpers for various data structures" * tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (198 commits) btrfs: move misplaced comment of btrfs_path::keep_locks btrfs: remove standalone "nologreplay" mount option btrfs: use a single variable to track return value at btrfs_page_mkwrite() btrfs: don't return VM_FAULT_SIGBUS on failure to set delalloc for mmap write btrfs: simplify early error checking in btrfs_page_mkwrite() btrfs: pass true to btrfs_delalloc_release_space() at btrfs_page_mkwrite() btrfs: fix wrong start offset for delalloc space release during mmap write btrfs: fix harmless race getting delayed ref head count when running delayed refs btrfs: log error codes during failures when writing super blocks btrfs: simplify error return logic when getting folio at prepare_one_folio() btrfs: return real error from __filemap_get_folio() calls btrfs: remove superfluous return value check at btrfs_dio_iomap_begin() btrfs: fix invalid data space release when truncating block in NOCOW mode btrfs: update Kconfig option descriptions btrfs: update list of features built under experimental config btrfs: send: remove btrfs_debug() calls btrfs: use boolean for delalloc argument to btrfs_free_reserved_extent() btrfs: use boolean for delalloc argument to btrfs_free_reserved_bytes() btrfs: fold error checks when allocating ordered extent and update comments btrfs: check we grabbed inode reference when allocating an ordered extent ... |
||
![]() |
49fffac983 |
for-6.16/io_uring-20250523
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnDgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgHZEADA1ym0ihHRjU2kTlXXOdkLLOl+o1RCHUjr KNf6sELGgyDC5FL/hAWdsjonInY4MLbJW0eNHEuuK8iFcn3wSHuHPXhRJXx/4cOs GGVLTd+Jm8ih4UL/GeLrBe3ehW9UUOtz1TCYzho0bdXHQWjruCFTqB5OzPQFMGQW R/lwXVNfjgGno5JhBnsrwz3ZnAfAnJhxqmc0GFHaa/nVF1OREYW/HS75EPFNiFgp Aevilw5QyrA2gDlZ+zCUwaGKAEl32yZCI6LZpI4kMtPK1reEbgFTrzIaCZ/OZCYM DVdBVEeuOmcBYIKbitD/+fcLNXHMrSJSWvUSXR4GuRNVkCTIAcEMKM2bX8VY7gmJ 7ZQIo0EL2mSwmewHIYnvf9w/qrNYR0NyUt2v4U4rA2wj6e5w1EYMriP94wKdBGvD RNxja429N3fg3aBIkdQ6iYSVJRgE7DCo7dnKrEqglZPb32LOiNoOoou9shI5tb25 8X7u0HzbpwKY/XByXZ2IaX7PYK2iFqkJjFYlGehtF97W85LGEvkDFU6fcBdjBO8r umgeE5O+lR+cf68JTJ6P34A7bBg71AXO3ytIuWunG56/0yu/FHDCjhBWE5ZjEhGR u2YhAGPRDQsJlSlxx8TXoKyYWP55NqdeyxYrmku/fZLn5WNVXOFeRlUDAZsF7mU7 nuiOt9j4WA== =k8SF -----END PGP SIGNATURE----- Merge tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux Pull io_uring updates from Jens Axboe: - Avoid indirect function calls in io-wq for executing and freeing work. The design of io-wq is such that it can be a generic mechanism, but as it's just used by io_uring now, may as well avoid these indirect calls - Clean up registered buffers for networking - Add support for IORING_OP_PIPE. Pretty straight forward, allows creating pipes with io_uring, particularly useful for having these be instantiated as direct descriptors - Clean up the coalescing support fore registered buffers - Add support for multiple interface queues for zero-copy rx networking. As this feature was merged for 6.15 it supported just a single ifq per ring - Clean up the eventfd support - Add dma-buf support to zero-copy rx - Clean up and improving the request draining support - Clean up provided buffer support, most notably with an eye toward making the legacy support less intrusive - Minor fdinfo cleanups, dropping support for dumping what credentials are registered - Improve support for overflow CQE handling, getting rid of GFP_ATOMIC for allocating overflow entries where possible - Improve detection of cases where io-wq doesn't need to spawn a new worker unnecessarily - Various little cleanups * tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux: (59 commits) io_uring/cmd: warn on reg buf imports by ineligible cmds io_uring/io-wq: only create a new worker if it can make progress io_uring/io-wq: ignore non-busy worker going to sleep io_uring/io-wq: move hash helpers to the top trace/io_uring: fix io_uring_local_work_run ctx documentation io_uring: finish IOU_OK -> IOU_COMPLETE transition io_uring: add new helpers for posting overflows io_uring: pass in struct io_big_cqe to io_alloc_ocqe() io_uring: make io_alloc_ocqe() take a struct io_cqe pointer io_uring: split alloc and add of overflow io_uring: open code io_req_cqe_overflow() io_uring/fdinfo: get rid of dumping credentials io_uring/fdinfo: only compile if CONFIG_PROC_FS is set io_uring/kbuf: unify legacy buf provision and removal io_uring/kbuf: refactor __io_remove_buffers io_uring/kbuf: don't compute size twice on prep io_uring/kbuf: drop extra vars in io_register_pbuf_ring io_uring/kbuf: use mem_is_zero() io_uring/kbuf: account ring io_buffer_list memory io_uring: drain based on allocates reqs ... |
||
![]() |
6f59de9bc0 |
for-6.16/block-20250523
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW 6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH 3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8 75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb Y6NDJw5+kQ== =1PNK -----END PGP SIGNATURE----- Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ... |
||
![]() |
927244f6ef |
traceevent/block: Add REQ_ATOMIC flag to block trace events
Filesystems like XFS can implement atomic write I/O using either REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful if we have a flag in trace events to distinguish between the two. This patch adds char 'U' (Untorn writes) to rwbs field of the trace events if REQ_ATOMIC flag is set in the bio. <W/ REQ_ATOMIC> ================= xfs_io-4238 [009] ..... 4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [009] d.h1. 4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0] <W/O REQ_ATOMIC> =============== xfs_io-4237 [010] ..... 4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [010] d.H1. 4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
db26d62d79
|
netfs: Fix undifferentiation of DIO reads from unbuffered reads
On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from
"unbuffered reads" (specified by cache=none in the mount parameters). The
difference is flagged in the protocol and the server may behave
differently: Windows Server will, for example, mandate that DIO reads are
block aligned.
Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from
NETFS_DIO_READ, parallelling the write differentiation that already exists.
cifs will then do the right thing.
Fixes:
|
||
![]() |
cba4dbeb7b |
mm: remove VM_PAT
It's unused, so let's remove it. Link: https://lkml.kernel.org/r/20250512123424.637989-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
28be240c76 |
trace/io_uring: fix io_uring_local_work_run ctx documentation
The comment for the tracepoint io_uring_local_work_run refers to a field "tctx" and a type "io_uring_ctx", neither of which exist. "tctx" looks to mean "ctx" and "io_uring_ctx" should be "io_ring_ctx". Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250522150451.2385652-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
5fddfbc0cb
|
Merge patch series "netfs: Miscellaneous fixes"
David Howells <dhowells@redhat.com> says: Here are some miscellaneous fixes and changes for netfslib, if you could pull them: (1) Fix an oops in write-retry due to mis-resetting the I/O iterator. (2) Fix the recording of transferred bytes for short DIO reads. (3) Fix a request's work item to not require a reference, thereby avoiding the need to get rid of it in BH/IRQ context. (4) Fix waiting and waking to be consistent about the waitqueue used. * patches from https://lore.kernel.org/20250519090707.2848510-1-dhowells@redhat.com: netfs: Fix wait/wake to be consistent about the waitqueue used netfs: Fix the request's work item to not require a ref netfs: Fix setting of transferred bytes with short DIO reads netfs: Fix oops in write-retry from mis-resetting the subreq iterator Link: https://lore.kernel.org/20250519090707.2848510-1-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
20d72b00ca
|
netfs: Fix the request's work item to not require a ref
When the netfs_io_request struct's work item is queued, it must be supplied
with a ref to the work item struct to prevent it being deallocated whilst
on the queue or whilst it is being processed. This is tricky to manage as
we have to get a ref before we try and queue it and then we may find it's
already queued and is thus already holding a ref - in which case we have to
try and get rid of the ref again.
The problem comes if we're in BH or IRQ context and need to drop the ref:
if netfs_put_request() reduces the count to 0, we have to do the cleanup -
but the cleanup may need to wait.
Fix this by adding a new work item to the request, ->cleanup_work, and
dispatching that when the refcount hits zero. That can then synchronously
cancel any outstanding work on the main work item before doing the cleanup.
Adding a new work item also deals with another problem upstream where it's
sometimes changing the work func in the put function and requeuing it -
which has occasionally in the past caused the cleanup to happen
incorrectly.
As a bonus, this allows us to get rid of the 'was_async' parameter from a
bunch of functions. This indicated whether the put function might not be
permitted to sleep.
Fixes:
|
||
![]() |
9cd78ca04f
|
fs/netfs: remove unused source NETFS_INVALID_WRITE
This enum choice was added by commit |
||
![]() |
748922dcfa |
cgroup: use subsystem-specific rstat locks to avoid contention
It is possible to eliminate contention between subsystems when updating/flushing stats by using subsystem-specific locks. Let the existing rstat locks be dedicated to the cgroup base stats and rename them to reflect that. Add similar locks to the cgroup_subsys struct for use with individual subsystems. Lock initialization is done in the new function ss_rstat_init(ss) which replaces cgroup_rstat_boot(void). If NULL is passed to this function, the global base stat locks will be initialized. Otherwise, the subsystem locks will be initialized. Change the existing lock helper functions to accept a reference to a css. Then within these functions, conditionally select the appropriate locks based on the subsystem affiliation of the given css. Add helper functions for this selection routine to avoid repeated code. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
![]() |
08d6ee6d8a |
sunrpc: implement rfc2203 rpcsec_gss seqnum cache
This implements a sequence number cache of the last three (right now hardcoded) sent sequence numbers for a given XID, as suggested by the RFC. From RFC2203 5.3.3.1: "Note that the sequence number algorithm requires that the client increment the sequence number even if it is retrying a request with the same RPC transaction identifier. It is not infrequent for clients to get into a situation where they send two or more attempts and a slow server sends the reply for the first attempt. With RPCSEC_GSS, each request and reply will have a unique sequence number. If the client wishes to improve turn around time on the RPC call, it can cache the RPCSEC_GSS sequence number of each request it sends. Then when it receives a response with a matching RPC transaction identifier, it can compute the checksum of each sequence number in the cache to try to match the checksum in the reply's verifier." Signed-off-by: Nikhil Jha <njha@janestreet.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
![]() |
06aa9378df |
x86/tracing, x86/mm: Move page fault tracepoints to generic
Page fault tracepoints are interesting for other architectures as well. Move them to be generic. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/r/89c2f284adf9b4c933f0e65811c50cef900a5a95.1747046848.git.namcao@linutronix.de |
||
![]() |
4eb56b0761 |
erofs: refine readahead tracepoint
- trace_erofs_readpages => trace_erofs_readahead; - Rename a redundant statement `nrpages = readahead_count(rac);`; - Move the tracepoint to the beginning of z_erofs_readahead(). Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250514120820.2739288-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> |
||
![]() |
c1269d3d12 |
tcp: add tcp_rcvbuf_grow() tracepoint
Provide a new tracepoint to better understand
tcp_rcv_space_adjust() (currently broken) behavior.
Call it only when tcp_rcv_space_adjust() has a chance
to make a change.
I chose to leave trace_tcp_rcv_space_adjust() as is,
because commit
|
||
![]() |
cff6df108b |
btrfs: tracepoints: remove no longer used tracepoints for eb locking
There are several tracepoints for extent buffer locks that are not used
anymore:
* btrfs_tree_read_unlock_blocking
* btrfs_set_lock_blocking_read
* btrfs_set_lock_blocking_write
* btrfs_tree_read_lock_atomic
These stopped being used after we switched extent buffer locks from a
custom implementation to rw semaphores in commit
|
||
![]() |
81eb6ce8b5 |
btrfs: tracepoints: add btrfs prefix to names where it's missing
Most of our tracepoints have the 'btrfs_' prefix in their names but a few of them are missing, making it inconsistent. So add the prefix to the ones that are missing it, creating consistency, making it clear for users these are btrfs tracepoints and eventually avoid name collisions with other tracepoints defined by other kernel subsystems. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
![]() |
02c340c278 |
btrfs: rename the functions to get inode and fs_info from an extent io tree
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a 'btrfs_' prefix to their name to make it clear they are from btrfs. Also remove the 'const' suffix from extent_io_tree_to_inode_const() since there's no non-const variant anymore and makes the naming consistent with extent_io_tree_to_fs_info() (no 'const' suffix and returns a const pointer). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
![]() |
41708a4c23 |
btrfs: add btrfs prefix to trace events for extent state alloc and free
These trace events don't have the 'btrfs_' prefix in their name, unlike the other trace events from extent-io-tree.c. So add the prefix to make them consistent and follow coding style conventions too. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
![]() |
0f987c099d |
btrfs: tracepoints: use btrfs_root_id() to get the id of a root
Instead of open coding btrfs_root_id() to get the ID of a root, use the helper in the trace points, which also makes the code less verbose. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
![]() |
5c41f6010e |
btrfs: remove EXTENT_UPTODATE io tree flag
The EXTENT_UPTODATE io tree flag is now used only to mark ranges in the fs_info->excluded_extents as used by super blocks and not available for extent allocation (to prevent adding those ranges as free space in the in memory space caches). As we can use any flag for that purpose, and we are using EXTENT_DIRTY for the pinned extents io tree for example, remove the EXTENT_UPTODATE flag and use instead EXTENT_DIRTY for the excluded extents io tree. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
![]() |
155fd6c3e2 |
tracing/sched: Use __string() instead of fixed lengths for task->comm
The sched_switch and sched_waking events hardcoded the length of the comm it recorded because these events were created before the dynamic strings were implemented. Unfortunately, several other events copied this method. As the size of the comm may change in the future, make the string dynamic. The dynamic string requires a 4 byte meta data to hold the size and offset of the string. The amount stored in the ring buffer will then be the strlen(comm) + 5 (for the \n), and aligned to 4 bytes if there's no other strings. This means that a task comm can have up to 10 characters before it requires another 4 bytes in the ring buffer. Most tasks are usually less than that, so this should not be a problem, and it also allows the name to be extended over the TASK_COMM_LEN [1] Note, sched_switch and the sched_waking trace events still hardcode the length, as there is tooling that still requires that. An effort to update the tooling will be made to allow this to change in the future. [1] https://lore.kernel.org/all/20250507110444.963779-1-bhupesh@igalia.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Bhupesh <bhupesh@igalia.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250507133458.51bafd95@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
ac01fa73f5 |
tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
Most tracepoints in the kernel are created with TRACE_EVENT(). The TRACE_EVENT() macro (and DECLARE_EVENT_CLASS() and DEFINE_EVENT() where in reality, TRACE_EVENT() is just a helper macro that calls those other two macros), will create not only a tracepoint (the function trace_<event>() used in the kernel), it also exposes the tracepoint to user space along with defining what fields will be saved by that tracepoint. There are a few places that tracepoints are created in the kernel that are not exposed to userspace via tracefs. They can only be accessed from code within the kernel. These tracepoints are created with DEFINE_TRACE() Most of these tracepoints end with "_tp". This is useful as when the developer sees that, they know that the tracepoint is for in-kernel only (meaning it can only be accessed inside the kernel, either directly by the kernel or indirectly via modules and BPF programs) and is not exposed to user space. Instead of making this only a process to add "_tp", enforce it by making the DECLARE_TRACE() append the "_tp" suffix to the tracepoint. This requires adding DECLARE_TRACE_EVENT() macros for the TRACE_EVENT() macro to use that keeps the original name. Link: https://lore.kernel.org/all/20250418083351.20a60e64@gandalf.local.home/ Cc: netdev <netdev@vger.kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Ahern <dsahern@kernel.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Breno Leitao <leitao@debian.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20250510163730.092fad5b@gandalf.local.home Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
3fc567e4c0 |
sched/numa: add tracepoint that tracks the skipping of numa balancing due to cpuset memory pinning
Unlike sched_skip_vma_numa tracepoint which tracks skipped VMAs, this tracks the task subjected to cpuset.mems pinning and prints out its allowed memory node mask. Link: https://lkml.kernel.org/r/20250424024523.2298272-3-libo.chen@oracle.com Signed-off-by: Libo Chen <libo.chen@oracle.com> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Mel Gorman <mgorman <mgorman@suse.de> Cc: Michal Koutný <mkoutny@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Srikanth Aithal <sraithal@amd.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
50dbe53129 |
khugepaged: pass folio instead of head page to trace events
The trace functions trace_mm_collapse_huge_page_isolate() and trace_mm_khugepaged_scan_pmd() each have a single user, which always passes in the head page of a folio. Refactor both functions to take a folio directly. Link: https://lkml.kernel.org/r/20250425002425.533698-1-nifan.cxl@gmail.com Signed-off-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Nico Pache <npache@redhat.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Adam Manzanares <a.manzanares@samsung.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Mariano Pache <npache@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
18c64378ad |
sunrpc: add info about xprt queue times to svc_xprt_dequeue tracepoint
I've been looking at a problem where we see increased RPC timeouts in clients when the nfs_layout_flexfiles dataserver_timeo value is tuned very low (6s). This is necessary to ensure quick failover to a different mirror if a server goes down, but it causes a lot more major RPC timeouts. Ultimately, the problem is server-side however. It's sometimes doesn't respond to connection attempts. My theory is that the interrupt handler runs when a connection comes in, the xprt ends up being enqueued, but it takes a significant amount of time for the nfsd thread to pick it up. Currently, the svc_xprt_dequeue tracepoint displays "wakeup-us". This is the time between the wake_up() call, and the thread dequeueing the xprt. If no thread was woken, or the thread ended up picking up a different xprt than intended, then this value won't tell us how long the xprt was waiting. Add a new xpt_qtime field to struct svc_xprt and set it in svc_xprt_enqueue(). When the dequeue tracepoint fires, also store the time that the xprt sat on the queue in total. Display it as "qtime-us". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> |
||
![]() |
b9e22b35d4 |
tsm-mr: Add TVM Measurement Register support
Introduce new TSM Measurement helper library (tsm-mr) for TVM guest drivers to expose MRs (Measurement Registers) as sysfs attributes, with Crypto Agility support. Add the following new APIs (see include/linux/tsm-mr.h for details): - tsm_mr_create_attribute_group(): Take on input a `struct tsm_measurements` instance, which includes one `struct tsm_measurement_register` per MR with properties like `TSM_MR_F_READABLE` and `TSM_MR_F_WRITABLE`, to determine the supported operations and create the sysfs attributes accordingly. On success, return a `struct attribute_group` instance that will typically be included by the guest driver into `miscdevice.groups` before calling misc_register(). - tsm_mr_free_attribute_group(): Free the memory allocated to the attrubute group returned by tsm_mr_create_attribute_group(). tsm_mr_create_attribute_group() creates one attribute for each MR, with names following this pattern: MRNAME[:HASH] - MRNAME - Placeholder for the MR name, as specified by `tsm_measurement_register.mr_name`. - :HASH - Optional suffix indicating the hash algorithm associated with this MR, as specified by `tsm_measurement_register.mr_hash`. Support Crypto Agility by allowing multiple definitions of the same MR (i.e., with the same `mr_name`) with distinct HASH algorithms. NOTE: Crypto Agility, introduced in TPM 2.0, allows new hash algorithms to be introduced without breaking compatibility with applications using older algorithms. CC architectures may face the same challenge in the future, needing new hashes for security while retaining compatibility with older hashes, hence the need for Crypto Agility. Signed-off-by: Cedric Xing <cedric.xing@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com> [djbw: fixup bin_attr const conflict] Link: https://patch.msgid.link/20250509020739.882913-1-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
![]() |
6b02fd7799 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc6). No conflicts. Adjacent changes: net/core/dev.c: |
||
![]() |
402dd9f02c |
f2fs: remove wbc->for_reclaim handling
Since commits |
||
![]() |
0d8d44db29 |
for-6.15-rc5-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmgaECkACgkQxWXV+ddt WDsHeA//SCLb1tlI9LEiOuDP7Dk429caxrQwPU/AXPOoUwGT0rNSjmBDLXfIRFHT gRmI48huDvuVu00wL+wOY9Xs1M5oMkExsAW8nq08MHM2I+sNx+ppojjM5RgpwwCs QAASTEu4DOhtYrzJ9SPn0jmK8kDadi3fFSNNIJBd5IjpcLIhNiyryU6l7iXq9f7A pA3EEg7KL4jvciaOsnqE+/nvAd7oT0OtIRkrzPRKnsjJEg5zZEVo/4hUMhbNHVLC 7CuQB6MR79PoTOW8kZL/636FOQqv0XO+luHZEUf26sTuKiTEHgjq2jBymViDibCy XNNKCnqTmmYCcN4bqIkdDzM5cPZmOchih7eTUUTlpNH3qmtGn0HVx6pmOS+U6lHI DFRELbo+ry3LikZ8a7sGNcZQcooq7A7FgxggbI37Nbn0M6FxvmbiwfTDvvn6o04H +Q7+Sdbklb3MnNCa/ebIq+9XewYIoNXCAqnLJxMIj8OzrBtvPWoI5R3/CGe7MYsf jvEGHQuSLaw39tBJmrypImkoRocK/4hhHzYpGGQ5FNtbcgTEqHNIi+uIjHJlxQfi 9Tg95o2eK/glg+T3WrG/uviSnz5VbIKdj5Ksjw3evC0ihzX61NljMnPIlWEkAHAZ AIFnx5aQe1FhN9HQMiGenCYg+QuFsHXX3Qbh+2PW6QHbQ0os9Fg= =oczg -----END PGP SIGNATURE----- Merge tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - revert device path canonicalization, this does not work as intended with namespaces and is not reliable in all setups - fix crash in scrub when checksum tree is not valid, e.g. when mounted with rescue=ignoredatacsums - fix crash when tracepoint btrfs_prelim_ref_insert is enabled - other minor fixups: - open code folio_index(), meant to be used in MM code - use matching type for sizeof in compression allocation * tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: open code folio_index() in btree_clear_folio_dirty_tag() Revert "btrfs: canonicalize the device path before adding it" btrfs: avoid NULL pointer dereference if no valid csum tree btrfs: handle empty eb->folios in num_extent_folios() btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref btrfs: compression: adjust cb->compressed_folios allocation type |
||
![]() |
eeadd68e2a |
block: remove bounce buffering support
The block layer bounce buffering support is unused now, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
bc7e097509 |
btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref
btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |