mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

48406 Commits

Author SHA1 Message Date
Shivank Garg
a984f16fba mm: use folio_expected_ref_count() helper for reference counting
Replace open-coded folio reference count calculations with the
folio_expected_ref_count() helper.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250611052706.515408-2-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:08 -07:00
Linus Torvalds
772b78c2ab - Fix the calculation of the deadline server task's runtime as this mishap was
preventing realtime tasks from running
 
 - Avoid a race condition during migrate-swapping two tasks
 
 - Fix the string reported for the "none" dynamic preemption option
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhqM/8ACgkQEsHwGGHe
 VUqxng/+P/CQXrijxNOTSlN0NeDfuVPMtpmaijDONxa+m/BAxDjNKVuJefZY/tGa
 jV14hTUMIQkrjuSapIdN2Io02dK7p371ozsOxjNB+kJvDI6kKkOkOn1tWLOGyI+e
 oTIrpJvuxTkVmJOud+3Bl6OR/k+mrQ2R5ud5xJ/exgmBz+wRaRMxIYwQBlmCAZ7I
 uzrR94VL++sZdIuWrBt/5qFQMiwJ3xdrruhz/wdWoq6OQJovNECV1TGFZifKh2Rh
 4DXoMR46gPRXV0r5JoP8BSyw0V2PGwFnVoM3PsOCcN1guJgdiKszCGp89lzN5Z2x
 ySDegu6rnpYoaCmQLjBngGlzBnaEKWKUz9IYrXr/qGjVR8GIvoWjAhOQWvbXjyS2
 5CHRsUBlSJhwlTPJc5RGt8+O9ahWkBGPBCSsnImygTMGl2JIxsZUEEv8ELxaUq5K
 qTAZKYBwzOb2aA3FNe51Pwpz8SI3TKcDLWujHvcNeOSlbO23Bg/TTa3OCy1c3gGg
 HJ7dKw5lSi89VzKhpWwhqBKL1vu/fuVTZ52GCu0BiiwYfCVJwYD40vNNKgiiG1oq
 X2Sr4DUCtwzpFcIMfo9yJ9scqaT5gJywydnB4+oHlbg5OCLDOuWCs0EGGOCPd4LY
 Gi3ft9MBepwYeuCv7DELKKO62jIrlDeOU2FmW+9/RC7/z5egWI4=
 =ubM9
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Fix the calculation of the deadline server task's runtime as this
   mishap was preventing realtime tasks from running

 - Avoid a race condition during migrate-swapping two tasks

 - Fix the string reported for the "none" dynamic preemption option

* tag 'sched_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix dl_server runtime calculation formula
  sched/core: Fix migrate_swap() vs. hotplug
  sched: Fix preemption string of preempt_dynamic_none
2025-07-06 11:17:47 -07:00
Linus Torvalds
a1639ce5e5 - Revert uprobes to using CAP_SYS_ADMIN again as currently they can
destructively modify kernel code from an unprivileged process
 
 - Move a warning to where it belongs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhqMXoACgkQEsHwGGHe
 VUrSMRAAhwOkHt/Snwd2iT4G7xwbrOvW8q5Ed3WnlHSvy8ygkdbDEF0WPAuQasb6
 qc8iTuQv4i96UtHXWacIuk+q+P/oS/n1wdAyU8nGEavQZGQCaGaTw7gPxy1YcBJB
 mZGtP5HpIIlZpH74lvbBp8Q7T+BJFYSYt+KL2e7Qrc+AyuBUWSvaMnRkT7Ek520S
 aOY75BO104SI2QLY4lx9fTlumgowX44a1tJNEYjntAoMQd+MFMi71zP2FdObtYe6
 Cv4YAU6tSYZML0cOs+YJi50Qk0qg9EGOZKGopuFEs5jIXp4fXCTlCyowqQWFeC5M
 MNlHH1sg2mp+PdDYRatQiarO4gXDMhsT+G+K+TRtBtNwuL0WnbSUxJyNPOkCxfwQ
 nBup5knS9vzPXtuox3az8pYr/VS3H0efBVnDElwG/FhsYHGbOBfOLz54iv3j+Bbe
 CylXJPYPWfJ2UvIJeGRI9NJ3pGKHBLkvUzkwAsGBouCrrZcZIQZeXS1h2IggxDXf
 ooD66aAPcYMgIDKxlpVa8BSYlFzrB0+eq1CsuxoHbr/UcfSjaWSvK1qY+b6EoSaT
 R6L60vuSXyX5s9sHQ8QZoL3qkYIb4oCy8zTFlFjZry8vwHEL4XlzgHq+PTE5oXv4
 VT1uoiybzrx3/X7etz+AO7Vmd/yasyxZSpzd7b6+FVvLhUMXoKg=
 =dYTn
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Revert uprobes to using CAP_SYS_ADMIN again as currently they can
   destructively modify kernel code from an unprivileged process

 - Move a warning to where it belongs

* tag 'perf_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Revert to requiring CAP_SYS_ADMIN for uprobes
  perf/core: Fix the WARN_ON_ONCE is out of lock protected region
2025-07-06 10:49:27 -07:00
Rafael J. Wysocki
250d0579da Merge branch 'pm-sleep'
Merge fixes related to system sleep for 6.16-rc5:

 - Fix typo in the ABI documentation (Sumanth Gavini).

 - Allow swap to be used a bit longer during system suspend and
   hibernation to avoid suspend failures under memory pressure (Mario
   Limonciello).

* pm-sleep:
  PM: sleep: docs: Replace "diasble" with "disable"
  PM: Restrict swap use to later in the suspend sequence
2025-07-04 21:54:55 +02:00
kuyo chang
fc975cfb36 sched/deadline: Fix dl_server runtime calculation formula
In our testing with 6.12 based kernel on a big.LITTLE system, we were
seeing instances of RT tasks being blocked from running on the LITTLE
cpus for multiple seconds of time, apparently by the dl_server. This
far exceeds the default configured 50ms per second runtime.

This is due to the fair dl_server runtime calculation being scaled
for frequency & capacity of the cpu.

Consider the following case under a big.LITTLE architecture:
Assume the runtime is: 50,000,000 ns, and Frequency/capacity
scale-invariance defined as below:
Frequency scale-invariance: 100
Capacity scale-invariance: 50
First by Frequency scale-invariance,
the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
Then by capacity scale-invariance,
it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
So it will be scaled to 238,418 ns.

This smaller "accounted runtime" value is what ends up being
subtracted against the fair-server's runtime for the current period.
Thus after 50ms of real time, we've only accounted ~238us against the
fair server's runtime. This 209:1 ratio in this example means that on
the smaller cpu the fair server is allowed to continue running,
blocking RT tasks, for over 10 seconds before it exhausts its supposed
50ms of runtime.  And on other hardware configurations it can be even
worse.

For the fair deadline_server, to prevent realtime tasks from being
unexpectedly delayed, we really do want to use fixed time, and not
scaled time for smaller capacity/frequency cpus. So remove the scaling
from the fair server's accounting to fix this.

Fixes: a110a81c52 ("sched/deadline: Deferrable dl server")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: John Stultz <jstultz@google.com>
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: John Stultz <jstultz@google.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20250702021440.2594736-1-kuyo.chang@mediatek.com
2025-07-04 10:35:56 +02:00
Peter Zijlstra
ba677dbe77 perf: Revert to requiring CAP_SYS_ADMIN for uprobes
Jann reports that uprobes can be used destructively when used in the
middle of an instruction. The kernel only verifies there is a valid
instruction at the requested offset, but due to variable instruction
length cannot determine if this is an instruction as seen by the
intended execution stream.

Additionally, Mark Rutland notes that on architectures that mix data
in the text segment (like arm64), a similar thing can be done if the
data word is 'mistaken' for an instruction.

As such, require CAP_SYS_ADMIN for uprobes.

Fixes: c9e0924e5c ("perf/core: open access to probes for CAP_PERFMON privileged process")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAG48ez1n4520sq0XrWYDHKiKxE_+WCfAK+qt9qkY4ZiBGmL-5g@mail.gmail.com
2025-07-03 10:33:55 +02:00
Peter Zijlstra
009836b4fa sched/core: Fix migrate_swap() vs. hotplug
On Mon, Jun 02, 2025 at 03:22:13PM +0800, Kuyo Chang wrote:

> So, the potential race scenario is:
>
> 	CPU0							CPU1
> 	// doing migrate_swap(cpu0/cpu1)
> 	stop_two_cpus()
> 							  ...
> 							 // doing _cpu_down()
> 							      sched_cpu_deactivate()
> 								set_cpu_active(cpu, false);
> 								balance_push_set(cpu, true);
> 	cpu_stop_queue_two_works
> 	    __cpu_stop_queue_work(stopper1,...);
> 	    __cpu_stop_queue_work(stopper2,..);
> 	stop_cpus_in_progress -> true
> 		preempt_enable();
> 								...
> 							1st balance_push
> 							stop_one_cpu_nowait
> 							cpu_stop_queue_work
> 							__cpu_stop_queue_work
> 							list_add_tail  -> 1st add push_work
> 							wake_up_q(&wakeq);  -> "wakeq is empty.
> 										This implies that the stopper is at wakeq@migrate_swap."
> 	preempt_disable
> 	wake_up_q(&wakeq);
> 	        wake_up_process // wakeup migrate/0
> 		    try_to_wake_up
> 		        ttwu_queue
> 		            ttwu_queue_cond ->meet below case
> 		                if (cpu == smp_processor_id())
> 			         return false;
> 			ttwu_do_activate
> 			//migrate/0 wakeup done
> 		wake_up_process // wakeup migrate/1
> 	           try_to_wake_up
> 		    ttwu_queue
> 			ttwu_queue_cond
> 		        ttwu_queue_wakelist
> 			__ttwu_queue_wakelist
> 			__smp_call_single_queue
> 	preempt_enable();
>
> 							2nd balance_push
> 							stop_one_cpu_nowait
> 							cpu_stop_queue_work
> 							__cpu_stop_queue_work
> 							list_add_tail  -> 2nd add push_work, so the double list add is detected
> 							...
> 							...
> 							cpu1 get ipi, do sched_ttwu_pending, wakeup migrate/1
>

So this balance_push() is part of schedule(), and schedule() is supposed
to switch to stopper task, but because of this race condition, stopper
task is stuck in WAKING state and not actually visible to be picked.

Therefore CPU1 can do another schedule() and end up doing another
balance_push() even though the last one hasn't been done yet.

This is a confluence of fail, where both wake_q and ttwu_wakelist can
cause crucial wakeups to be delayed, resulting in the malfunction of
balance_push.

Since there is only a single stopper thread to be woken, the wake_q
doesn't really add anything here, and can be removed in favour of
direct wakeups of the stopper thread.

Then add a clause to ttwu_queue_cond() to ensure the stopper threads
are never queued / delayed.

Of all 3 moving parts, the last addition was the balance_push()
machinery, so pick that as the point the bug was introduced.

Fixes: 2558aacff8 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Link: https://lkml.kernel.org/r/20250605100009.GO39944@noisy.programming.kicks-ass.net
2025-07-01 15:02:03 +02:00
Thomas Weißschuh
3ebb1b6522 sched: Fix preemption string of preempt_dynamic_none
Zero is a valid value for "preempt_dynamic_mode", namely
"preempt_dynamic_none".

Fix the off-by-one in preempt_model_str(), so that "preempt_dynamic_none"
is correctly formatted as PREEMPT(none) instead of PREEMPT(undef).

Fixes: 8bdc5daaa0 ("sched: Add a generic function to return the preemption string")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250626-preempt-str-none-v2-1-526213b70a89@linutronix.de
2025-07-01 15:02:02 +02:00
Luo Gengkun
7b4c5a3754 perf/core: Fix the WARN_ON_ONCE is out of lock protected region
commit 3172fb9866 ("perf/core: Fix WARN in perf_cgroup_switch()") tried to
fix a concurrency problem between perf_cgroup_switch and
perf_cgroup_event_disable. But it did not move the WARN_ON_ONCE into the
lock-protected region, so the warning can still be triggered.

Fixes: 3172fb9866 ("perf/core: Fix WARN in perf_cgroup_switch()")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250626135403.2454105-1-luogengkun@huaweicloud.com
2025-06-30 09:32:49 +02:00
Linus Torvalds
2fc18d0b89 - Make sure an AUX perf event is really disabled when it overruns
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhg/fMACgkQEsHwGGHe
 VUpi5BAAwBTf3vpsGZvVQNhZhTM9uy9EG0ZmNzPihhJ+e2Ko4BMlWmnBfB0olYgN
 SUBypUQQwkneh5qnUnNe7MEsFof2NONRK4EBwr2l2GWcO8YhEKe6DH+ow+wT+fB0
 B5ifBiEGua1Cv+G276c54WJr35Tkc7XqyfRorvT5LdmynbawU7raS1JK7lQRmKFD
 TzBcTqb8OSTq3tJ+G3eXB5rA9XbYd/TeVCDWYXGOl+BhCt1hnHph+p1xEz/o5PAV
 orCbR8tgv0+tBCvsnSDGQ3TEfAqdPnGYOzIyXte5r9/FaXPhyL8K8x3ixVx1zjnE
 8i+HCUvK7aQs0jFuQ6rfIGnKwNURmM8qVjL65MsFglTJenfXwa7WBYti7dlKUai3
 riaW0FQaEmRt5UhadB3OZJFMzQXKw3ZsxUHjTeYKlx8csangdb03pzwVvMz2o0VO
 xAhJ1i0jgRXaMOFOORtzU7FOZFUuhV8pDKergSObMpimmMG69reNU3MAZPJToYaO
 0Dxx2R/yWsnZMUctVWkcQPL5Qb2e63ecTcYOBUsMfOBuj2WNNLSnh9z6VmHPcT22
 n5nmeAwcGFD33C7CqyT76ruY2687pQi6DxvWxF3ED8vNOkXnP/URkHjpMcRA9fr0
 rUvglIeAxZSXus79ScMy+9Yu985AMljn6ZuMKlGapMWw4+BQAVQ=
 =yQqt
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Make sure an AUX perf event is really disabled when it overruns

* tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/aux: Fix pending disable flow when the AUX ring buffer overruns
2025-06-29 08:16:02 -07:00
Linus Torvalds
ded779017a tracing fixes for v6.16:
- Fix possible UAF on error path in filter_free_subsystem_filters()
 
   When freeing a subsystem filter, the filter for the subsystem is passed in
   to be freed and all the events within the subsystem will have their filter
   freed too. In order to free without waiting for RCU synchronization, list
   items are allocated to hold what is going to be freed to free it via a
   call_rcu(). If the allocation of these items fails, it will call the
   synchronization directly and free after that (causing a bit of delay for
   the user).
 
   The subsystem filter is first added to this list and then the filters for
   all the events under the subsystem. The bug is that if one of the list-item
   allocations for the event filters fails, it jumps to the
   "free_now" label which will free the subsystem filter, then all the items
   on the allocated list, and then the event filters that were not added to
   the list yet. But because the subsystem filter was added first, it gets
   freed twice.
 
   The solution is to add the subsystem filter after the events, and then if
   any of the allocations fail it will not try to free any of them twice.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaF/yIRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpoNAP9AuI6SzS+E14UFbA7lEPVtQAgaj6rv
 xURhlmZdsGJ2AQEA3ZTv6Lf3DbnSHzPDOUnK9ItQZE7UHPh4Yed0QrriEAM=
 =hFZ1
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

 - Fix possible UAF on error path in filter_free_subsystem_filters()

   When freeing a subsystem filter, the filter for the subsystem is
   passed in to be freed and all the events within the subsystem will
   have their filter freed too. In order to free without waiting for RCU
   synchronization, list items are allocated to hold what is going to be
   freed to free it via a call_rcu(). If the allocation of these items
   fails, it will call the synchronization directly and free after that
   (causing a bit of delay for the user).

   The subsystem filter is first added to this list and then the filters
   for all the events under the subsystem. The bug is that if one of the
   list-item allocations for the event filters fails,
   it jumps to the "free_now" label which will free the subsystem
   filter, then all the items on the allocated list, and then the event
   filters that were not added to the list yet. But because the
   subsystem filter was added first, it gets freed twice.

   The solution is to add the subsystem filter after the events, and
   then if any of the allocations fail it will not try to free any of
   them twice.

* tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix filter logic error
2025-06-28 11:39:24 -07:00
Linus Torvalds
0fd39af24e 16 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels.  5 are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaF8vtQAKCRDdBJ7gKXxA
 jlK9AP9Syx5isoE7MAMKjr9iI/2z+NRaCCro/VM4oQk8m2cNFgD/ZsL9YMhjZlcL
 bMIVUZ9E+yf1w9dLeHLoDba+pnF7Wwc=
 =vdkO
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "16 hotfixes.

  6 are cc:stable and the remainder address post-6.15 issues or aren't
  considered necessary for -stable kernels. 5 are for MM"

* tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add Lorenzo as THP co-maintainer
  mailmap: update Duje Mihanović's email address
  selftests/mm: fix validate_addr() helper
  crashdump: add CONFIG_KEYS dependency
  mailmap: correct name for a historical account of Zijun Hu
  mailmap: add entries for Zijun Hu
  fuse: fix runtime warning on truncate_folio_batch_exceptionals()
  scripts/gdb: fix dentry_name() lookup
  mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write
  mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters
  lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly()
  mm/hugetlb: remove unnecessary holding of hugetlb_lock
  MAINTAINERS: add missing files to mm page alloc section
  MAINTAINERS: add tree entry to mm init block
  mm: add OOM killer maintainer structure
  fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio
2025-06-27 20:34:10 -07:00
Edward Adam Davis
6921d1e07c tracing: Fix filter logic error
If the processing of the tr->events loop fails, the filter that has been
added to filter_head will be released twice, in free_filter_list(&head->rcu)
and in __free_filter(filter).

Add the subsystem filter to filter_head only after the filters of
tr->events have been added, to avoid triggering a UAF.

Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com
Fixes: a9d0aab5eb ("tracing: Fix regression of filter waiting a long time on RCU synchronization")
Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894
Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-06-27 15:51:36 -04:00
Mario Limonciello
12ffc3b151 PM: Restrict swap use to later in the suspend sequence
Currently swap is restricted before drivers have had a chance to do
their prepare() PM callbacks. Restricting swap this early means that if
a driver needs to evict some content from memory into swap in its
prepare callback, it won't be able to.

On AMD dGPUs this can lead to failed suspends under memory pressure
situations as all VRAM must be evicted to system memory or swap.

Move the swap restriction to right after all devices have had a chance
to do the prepare() callback.  If there is any problem with the sequence,
restore swap in the appropriate dpm resume callbacks or error handling
paths.

Closes: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Nat Wittstock <nat@fardog.io>
Tested-by: Lucian Langa <lucilanga@7pot.org>
Link: https://patch.msgid.link/20250613214413.4127087-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-06-26 20:39:34 +02:00
Leo Yan
1476b21832 perf/aux: Fix pending disable flow when the AUX ring buffer overruns
If an AUX event overruns, the event core layer intends to disable the
event by setting the 'pending_disable' flag. Unfortunately, the event
is not actually disabled afterwards.

In commit:

  ca6c21327c ("perf: Fix missing SIGTRAPs")

the 'pending_disable' flag was changed to a boolean. However, the
AUX event code was not updated accordingly. The flag ends up holding a
CPU number. If this number is zero, the flag is taken as false and the
IRQ work is never triggered.

Later, with commit:

  2b84def990 ("perf: Split __perf_pending_irq() out of perf_pending_irq()")

a new IRQ work 'pending_disable_irq' was introduced to handle event
disabling. The AUX event path was not updated to kick off the work queue.

To fix this bug, when an AUX ring buffer overrun is detected, call
perf_event_disable_inatomic() to initiate the pending disable flow.

Also update the outdated comment for setting the flag, to reflect the
boolean values (0 or 1).

Fixes: 2b84def990 ("perf: Split __perf_pending_irq() out of perf_pending_irq()")
Fixes: ca6c21327c ("perf: Fix missing SIGTRAPs")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@linux.intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250625170737.2918295-1-leo.yan@arm.com
2025-06-26 10:50:37 +02:00
Linus Torvalds
ee88bddf7f bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmhcdnsACgkQ6rmadz2v
 bTqkRA//f024qEkYGrnnkRk1ZoOuKWk7DEUvw/J+us9dhPvJABmUHL3ZuMmDp1D/
 EgGWAMg1q8tsXvlnAR4mV25T1DLpfMmo6hzwZgVeGl3X9YqTCPbgBONRr6F1HXP4
 OXHnm9vHVcki8z0vPIUHsAbudp0PrXx9lSUssT3kCoZuV0xeQKTznvUS9HwGC8vP
 ex59XrkNaUeEyVozsa0YFHtT57NAH/77QSj1A5HC/x0u9SJroao18ct3b/5t7QdQ
 N4hcc/GH+xoGDyXPFYFlst9kXmYwCpz26w8bCpBY5x0Red+LhkvHwRv6KM1Czl3J
 f9da+S2qbetqeiGJwg8/lNLnHQcgqUifYu5lr35ijpxf7Qgyw0jbT+Cy2kd68GcC
 J0GCminZep+bsKARriq9+ZBcm282xBTfzBN4936HTxC6zh41J+jdbOC62Gw+pXju
 9EJwQmY59KPUyDKz5mUm48NmY4g7Zcvk2y7kCaiD5Np+WR1eFbWT7v6eAchA+JRi
 tRfTR5eqSS17GybfrPntto2aoydEC2rPublMTu2OT3bjJe2WPf4aFZaGmOoQZwX2
 97sa0hpMSbf4zS7h1mqHQ9y3p9qvXTwzWikm1fjFeukvb53GiRYxax5LutpePxEU
 OFHREy4InWHdCet0Irr8u44UbrAkxiNUYBD5KLQO/ZUlrMmsrBI=
 =Buaz
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix use-after-free in libbpf when map is resized (Adin Scannell)

 - Fix verifier assumptions about 2nd argument of bpf_sysctl_get_name
   (Jerome Marchand)

 - Fix verifier assumption of nullness of d_inode in dentry (Song Liu)

 - Fix global starvation of LRU map (Willem de Bruijn)

 - Fix potential NULL dereference in btf_dump__free (Yuan Chen)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: adapt one more case in test_lru_map to the new target_free
  libbpf: Fix possible use-after-free for externs
  selftests/bpf: Convert test_sysctl to prog_tests
  bpf: Specify access type of bpf_sysctl_get_name args
  libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
  bpf: Adjust free target to avoid global starvation of LRU map
  bpf: Mark dentry->d_inode as trusted_or_null
2025-06-25 21:09:02 -07:00
Linus Torvalds
c5c2a8b497 Several mount-related fixes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaFx0bQAKCRBZ7Krx/gZQ
 63yTAQC4NS7qopT8BQGn3aM+t8YjYo36BTeSRcSy4hVEAFrEJAD/WyW5Dcy1lWZR
 S8g8rqRimsCepwxqTinYJlS7H8S56ws=
 =CmGc
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mount fixes from Al Viro:
 "Several mount-related fixes"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  userns and mnt_idmap leak in open_tree_attr(2)
  attach_recursive_mnt(): do not lock the covering tree when sliding something under it
  replace collect_mounts()/drop_collected_mounts() with a safer variant
2025-06-25 20:48:48 -07:00
Arnd Bergmann
b6f5e74858 crashdump: add CONFIG_KEYS dependency
The dm_crypt code fails to build without CONFIG_KEYS:

kernel/crash_dump_dm_crypt.c: In function 'restore_dm_crypt_keys_to_thread_keyring':
kernel/crash_dump_dm_crypt.c:105:9: error: unknown type name 'key_ref_t'; did you mean 'key_ref_put'?

There is a mix of 'select KEYS' and 'depends on KEYS' in Kconfig,
so there is no single obvious solution here, but generally using 'depends on'
makes more sense and is less likely to cause dependency loops.

Link: https://lkml.kernel.org/r/20250620112140.3396316-1-arnd@kernel.org
Fixes: 62f17d9df6 ("crash_dump: retrieve dm crypt keys in kdump kernel")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Dave Vasilevsky <dave@vasilevsky.ca>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-25 15:55:04 -07:00
Jerome Marchand
2eb7648558 bpf: Specify access type of bpf_sysctl_get_name args
The second argument of bpf_sysctl_get_name() helper is a pointer to a
buffer that is being written to. However, that isn't specified in the
prototype.

Until commit 37cce22dbd ("bpf: verifier: Refactor helper access
type tracking"), all helper accesses were considered as a possible
write access by the verifier, so no big harm was done. However, since
then, the verifier might make wrong assumptions about the content of
that address which might lead it to make faulty optimizations (such as
removing code that was wrongly labeled dead). This is what happens to the
sysctl_get_name-related tests in the test_sysctl selftest.

Add the MEM_WRITE flag to the second argument of bpf_sysctl_get_name().

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250619140603.148942-2-jmarchan@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23 21:50:44 -07:00
Al Viro
7484e15dbb replace collect_mounts()/drop_collected_mounts() with a safer variant
collect_mounts() has several problems - one can't iterate over the results
directly, so it has to be done with callback passed to iterate_mounts();
it has an oopsable race with d_invalidate(); it creates temporary clones
of mounts invisibly for sync umount (IOW, you can have non-lazy umount
succeed leaving filesystem not mounted anywhere and yet still busy).

A saner approach is to give caller an array of struct path that would pin
every mount in a subtree, without cloning any mounts.

        * collect_mounts()/drop_collected_mounts()/iterate_mounts() is gone
        * collect_paths(where, preallocated, size) gives either ERR_PTR(-E...) or
a pointer to array of struct path, one for each chunk of tree visible under
'where' (i.e. the first element is a copy of where, followed by (mount,root)
for everything mounted under it - the same set collect_mounts() would give).
Unlike collect_mounts(), the mounts are *not* cloned - we just get pinning
references to the roots of subtrees in the caller's namespace.
        Array is terminated by {NULL, NULL} struct path.  If it fits into
preallocated array (on-stack, normally), that's where it goes; otherwise
it's allocated by kmalloc_array().  Passing 0 as size means that 'preallocated'
is ignored (and expected to be NULL).
        * drop_collected_paths(paths, preallocated) is given the array returned
by an earlier call of collect_paths() and the preallocated array passed to that
call.  All mount/dentry references are dropped and array is kfree'd if it's not
equal to 'preallocated'.
        * instead of iterate_mounts(), users should just iterate over array
of struct path - nothing exotic is needed for that.  Existing users (all in
audit_tree.c) are converted.

[folded a fix for braino reported by Venkat Rao Bagalkote <venkat88@linux.ibm.com>]

Fixes: 80b5dce8c5 ("vfs: Add a function to lazily unmount all mounts from any dentry")
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-23 14:01:49 -04:00
Linus Torvalds
c06944560a 20 hotfixes. 7 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels.  Only 4 are for MM.
 
 - The 3 patch series `Revert "bcache: update min_heap_callbacks to use
   default builtin swap"' from Kuan-Wei Chiu backs out the author's recent
   min_heap changes due to a performance regression.  A fix for this
   regression has been developed but we felt it best to go back to the
   known-good version to give the new code more bake time.
 
 - A lot of MAINTAINERS maintenance.  I like to get these changes
   upstreamed promptly because they can't break things and more
   accurate/complete MAINTAINERS info hopefully improves the speed and
   accuracy of our responses to submitters and reporters.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaFizWwAKCRDdBJ7gKXxA
 jhivAQDGQXgzgzPCu/5/fTQjjq+D/8M2QjGxNy4o1itKoK+fYAEAzQGTL/8ay9FY
 yhcipreU4A3lrxf94iOidiBCYkZaOgk=
 =kFFb
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-06-22-18-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "20 hotfixes. 7 are cc:stable and the remainder address post-6.15
  issues or aren't considered necessary for -stable kernels. Only 4 are
  for MM.

   - The series `Revert "bcache: update min_heap_callbacks to use
     default builtin swap"' from Kuan-Wei Chiu backs out the author's
     recent min_heap changes due to a performance regression.

     A fix for this regression has been developed but we felt it best to
     go back to the known-good version to give the new code more bake
     time.

   - A lot of MAINTAINERS maintenance.

     I like to get these changes upstreamed promptly because they can't
     break things and more accurate/complete MAINTAINERS info hopefully
     improves the speed and accuracy of our responses to submitters and
     reporters"

* tag 'mm-hotfixes-stable-2025-06-22-18-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add additional mmap-related files to mmap section
  MAINTAINERS: add memfd, shmem quota files to shmem section
  MAINTAINERS: add stray rmap file to mm rmap section
  MAINTAINERS: add hugetlb_cgroup.c to hugetlb section
  MAINTAINERS: add further init files to mm init block
  MAINTAINERS: update maintainers for HugeTLB
  maple_tree: fix MA_STATE_PREALLOC flag in mas_preallocate()
  MAINTAINERS: add missing test files to mm gup section
  MAINTAINERS: add missing mm/workingset.c file to mm reclaim section
  selftests/mm: skip uprobe vma merge test if uprobes are not enabled
  bcache: remove unnecessary select MIN_HEAP
  Revert "bcache: remove heap-related macros and switch to generic min_heap"
  Revert "bcache: update min_heap_callbacks to use default builtin swap"
  selftests/mm: add configs to fix testcase failure
  kho: initialize tail pages for higher order folios properly
  MAINTAINERS: add linux-mm@ list to Kexec Handover
  mm: userfaultfd: fix race of userfaultfd_move and swap cache
  mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked"
  selftests/mm: increase timeout from 180 to 900 seconds
  mm/shmem, swap: fix softlockup with mTHP swapin
2025-06-23 09:20:39 -07:00
Linus Torvalds
33efa7dbab - Fix missing prototypes warnings
 - Properly initialize work context when allocating it
 
 - Remove a method tracking when managed interrupts are suspended during
   hotplug, in favor of the code using a IRQ disable depth tracking now,
   and have interrupts get properly enabled again on restore
 
 - Make sure multiple CPUs getting hotplugged don't cause wrong tracking of the
   managed IRQ disable depth
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhXxGUACgkQEsHwGGHe
 VUoByw/+PGya16eguP068pvrd4XxvYs11HlN/HZwQKxOM9n1v7g1dP4xVJB0Cz2C
 wFKvWYAkJRqu9O+Z92YDsixEF6KEPQzGZApOQ4Ousb2gnX4+5nfjQFswcTArRyKp
 7cMJmmhQXRN3U6QcfX6GX3fHj/m6k7sBQYuMqNV/ac67iKBAa41EI5HLHW6Ojxri
 i02bDQGOQIbmCX/O5IQymJOAMJTYSB3INyeAjRg8Vz5oyJgfJGY3My0LldBFCNFO
 R7YZ26zZNf2UMLifF3W6FNTJGBsmfxaKoNXWnQ9zOjlxWCccGStxem44RXFzs5lk
 OkUS1KPmZ7wRvXp5n7/AMaj6XvSm31To8SJVzpgxzGGGAfgC9xA0+MW1TOKU5RUo
 RLEHqOufz4YY1oz70mb/1eZT225+rOfHDpvPzWb44HyezOzo1rLTvonysV59oXEz
 oYYHLrXkVeGU9TcMdVqPw8X0ZDqg2VK0BReqFBXKgBKsZPKB5kFCNPKrMi++rgCG
 6f+6jD/yhsnvnLitsk2ogqvBA/GExnc2wW0d0BM9xeMQUC1GfDrrs6ktlbOXKyWa
 +F/yaH2vxzvugNn8M4rxHBGnzsTuRjBgCctqi8uouuXnMBpcgCs0yDolCRMoaRiY
 slttArffkYCUV5/TRiPSxIIdOdUkQMSEFZA234ZIGtVW1d70Q64=
 =WBMv
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix missing prototypes warnings

 - Properly initialize work context when allocating it

 - Remove a method tracking when managed interrupts are suspended during
   hotplug, in favor of the code using a IRQ disable depth tracking now,
   and have interrupts get properly enabled again on restore

 - Make sure multiple CPUs getting hotplugged don't cause wrong tracking
   of the managed IRQ disable depth

* tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ath79-misc: Fix missing prototypes warnings
  genirq/irq_sim: Initialize work context pointers properly
  genirq/cpuhotplug: Restore affinity even for suspended IRQ
  genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug
2025-06-22 10:17:51 -07:00
Linus Torvalds
17ef32ae66 - Avoid a crash on a heterogeneous machine where not all cores support the
   same hw events features
 
 - Avoid a deadlock when throttling events
 
 - Document the perf event states more
 
 - Make sure a number of perf paths switching off or rescheduling events call
   perf_cgroup_event_disable()
 
 - Make sure perf does task sampling before its userspace mapping is torn down,
   and not after
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhXvYYACgkQEsHwGGHe
 VUotdg//TXchNnZ9xcGKSTFphDQMWVIy1cRWUffWC5ewUhjE9H7+FMZvCmvih8uc
 uvAsZ92GXE64fuzF0tU/5ybWEgca6HPbgI8aOhnk+vo9Yzxj9/0eO0SKK8qqSvzo
 ecn/p9yX4/jD86kIo6K279z7ZX8/0tSLselnicrGy1r4RGuaebEAvXDEzZm8p/c6
 0MjaTGC4TzkZkGyEeWXRt7jewiWvXO+91TqqwMyrhmIG3cs2TCbPhSn0QowXUZsF
 PdCJA5Z+vKp6j8n8fohRTFoATSRw5xAoqT+JRmPZ2K3QOCwtf1X0MbM6ZKkapgZO
 Y4Tp3HPw9yHUu8cyvEEwqU0jDn4J0EaqFgwCrxzvQj9ufkHBlPgNahjXW5upcw4k
 TV3qEp6KKfywTWWExh6Gjie7y7Hq3aHOkJVCg/ZeQjwMXhpZg7z+mGwh7x08Jn/2
 9/bpLG8Gl8eto3G6L1px/NUMc4poZTbSheKrjEMt3Z6ErNoAR4gb7SO547Lvf8HK
 bty5NZftDUNv42bqqXI0GY7YXKkr1AtHdRDlTeLlc5YmPzhIyG3LgEi4BqN3gyFf
 emh/CFG/1KT8GWxNCrPW6d01TBRswZjFyBDHL89HO3i0r2nDe98+2fLmllnl2Bv2
 EadgGE1XWv6RB5APJ726HXqMgtXM9cHRMogKMhiHNZnwkQba+ug=
 =nnMl
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Avoid a crash on a heterogeneous machine where not all cores support
   the same hw events features

 - Avoid a deadlock when throttling events

 - Document the perf event states more

 - Make sure a number of perf paths switching off or rescheduling events
   call perf_cgroup_event_disable()

 - Make sure perf does task sampling before its userspace mapping is
   torn down, and not after

* tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix crash in icl_update_topdown_event()
  perf: Fix the throttle error of some clock events
  perf: Add comment to enum perf_event_state
  perf/core: Fix WARN in perf_cgroup_switch()
  perf: Fix dangling cgroup pointer in cpuctx
  perf: Fix cgroup state vs ERROR
  perf: Fix sample vs do_exit()
2025-06-22 10:11:45 -07:00
Linus Torvalds
aff2a7e23f - Make sure the switch to the global hash is requested always under a lock so
   that two threads requesting that simultaneously cannot get to inconsistent
   state
 
 - Reject negative NUMA nodes earlier in the futex NUMA interface handling code
 
 - Selftests fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhXudYACgkQEsHwGGHe
 VUqtnxAAosFnuc5wJUJnZoqNQFQAQZruINu6CbxCd17aeeJ4aGu1A+aMfBQdPTiW
 jtKgvE/ES6c+mz4Nuj/YaiSKK2TnuNPcC1OcCHZTZ4UYwHMHKnjFspgAmzxL7Hpm
 of4iXDCdf5B36Y0eGO1521/KJj7MvU/z5Oe4rvaTv3MSL1XwjRfeG5XUHBk4iHgk
 SUS9VaEXDoR4mJjByC1yeVeRWR1ntvItT4OeMCATBeGTccVnr4xSVA9eTJyzl+n0
 3bBGIElijD9qJtL2ahTxm11kwy34uKC8mQPhr7FwzPcaih4xVHv0ys9cBWcn8ulH
 YDeK+rxA/eLlM6sL3jy8hskDWE6LvYpy52JgigImdhgUNSbFjkIb4gio8FnHRwyJ
 VgoHsmjXxnzIlvu5RCzYXzGwAIBemwSrkzcRPUuE0HBf7yVvYPKorjxSEJa2lGAV
 WAZkGn1ftEfWf/DtFBxTu/cjGLWoscEfT9yaJqWjayVMoVW3FO7yBUgdPTC0x18x
 d1QPhwZWSu1TyYzPs7t/jAvrxng2OpZyc2mxaUKyBjdr83Myz5FSin+6hhjP+36z
 MmAbahTxncjeyiyNdhweZJBGg2vGNCahTSbfc9+AxGa7c1WuhjYAmPKdAosvutPN
 6VdE9Ty3QuIjsTlpqLuFMOnkMyTN1VjGtB7OO/TtH7LovNpAWp4=
 =HGVi
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

 - Make sure the switch to the global hash is requested always under a
   lock so that two threads requesting that simultaneously cannot get to
   inconsistent state

 - Reject negative NUMA nodes earlier in the futex NUMA interface
   handling code

 - Selftests fixes

* tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Verify under the lock if hash can be replaced
  futex: Handle invalid node numbers supplied by user
  selftests/futex: Set the home_node in futex_numa_mpol
  selftests/futex: getopt() requires int as return value.
2025-06-22 10:09:23 -07:00
Uladzislau Rezki (Sony)
33b6a1f155 rcu: Return early if callback is not specified
Currently the call_rcu() API does not check whether a callback
pointer is NULL. If NULL is passed, rcu_core() will try to invoke
it, resulting in a NULL pointer dereference and a kernel crash.

To prevent this and improve debuggability, this patch adds a check
for NULL and emits a kernel stack trace to help identify a faulty
caller.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-06-20 15:31:48 -04:00
Pratyush Yadav
12b9a2c05d kho: initialize tail pages for higher order folios properly
Currently, when restoring higher order folios, kho_restore_folio() only
calls prep_compound_page() on all the pages.  That is not enough to
properly initialize the folios.  The managed page count does not get
updated, the reserved flag does not get dropped, and page count does not
get initialized properly.

Restoring a higher order folio with it results in the following BUG with
CONFIG_DEBUG_VM when attempting to free the folio:

    BUG: Bad page state in process test  pfn:104e2b
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
    flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
    raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
    raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
    page dumped because: nonzero _refcount
    [...]
    Call Trace:
    <TASK>
    dump_stack_lvl+0x4b/0x70
    bad_page.cold+0x97/0xb2
    __free_frozen_pages+0x616/0x850
    [...]

Combine the path for 0-order and higher order folios, initialize the tail
pages with a count of zero, and call adjust_managed_page_count() to
account for all the pages instead of just missing them.

In addition, since all the KHO-preserved pages get marked with
MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
actually set (as can also be seen from the flags of the dumped page in the
logs above).  So drop the ClearPageReserved() calls.

[ptyadav@amazon.de: declare i in the loop instead of at the top]
  Link: https://lkml.kernel.org/r/20250613125916.39272-1-pratyush@kernel.org
Link: https://lkml.kernel.org/r/20250605171143.76963-1-pratyush@kernel.org
Fixes: fc33e4b44b ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-19 20:48:02 -07:00
Willem de Bruijn
d4adf1c9ee bpf: Adjust free target to avoid global starvation of LRU map
BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
map is full, due to percpu reservations and force shrink before
neighbor stealing. Once a CPU is unable to borrow from the global map,
it will steal one element from a neighbor once, and after that it will
each time flush this one element to the global list and immediately
recycle it.

Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
with 79 CPUs. CPU 79 will observe this behavior even while its
neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).

CPUs need not be active concurrently. The issue can appear with
affinity migration, e.g., irqbalance. Each CPU can reserve and then
hold onto its 128 elements indefinitely.

Avoid global list exhaustion by limiting aggregate percpu caches to
half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
This change has no effect on sufficiently large tables.

Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
lru->free_target. The extra field fits in a hole in struct bpf_lru.
The cacheline is already warm where read in the hot path. The field is
only accessed with the lru lock held.

Tested-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250618215803.3587312-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 18:50:14 -07:00
Linus Torvalds
74b4cc9b87 cgroup: A fix for v6.16-rc2
In cgroup1 freezer, a task migrating into a frozen cgroup might not get
 frozen immediately due to the wrong operation order. Fix it.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaFMgGw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGe8eAQDHA0joxz9WROpdhU7CVjYPqV2Ncqh3mCMI6apF
 4OMR3gD/ZAQ+Pwc0bRtQ9CfkKgHsemHPK2fUzuSy9OuYkAjUMAo=
 =R/+k
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:

 - In cgroup1 freezer, a task migrating into a frozen cgroup might not
   get frozen immediately due to the wrong operation order. Fix it.

* tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup,freezer: fix incomplete freezing when attaching tasks
2025-06-18 14:25:50 -07:00
Linus Torvalds
0564e6a8c2 workqueue: A fix for v6.16-rc2
Fix missed early init of wq_isolated_cpumask.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaFMe+Q4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGb6HAP90ihRrPDXQkzTq3u06s9AlegSiAWyn2d8Pf5Lb
 vtaksgEAuiJD9sdet2Mn4sqEmr4riJ+KpMEcFyIeZXgSdgpRAQU=
 =ugls
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:

 - Fix missed early init of wq_isolated_cpumask

* tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
2025-06-18 14:22:31 -07:00
Linus Torvalds
4f24bfcc39 sched_ext: Fixes for v6.16-rc2
- Fix a couple bugs in cgroup cpu.weight support.
 
 - Add the new sched-ext@lists.linux.dev to MAINTAINERS.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaFMaxA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGefOAP9pB97PjjSJaS2c+/S9A+zIUZjWQ4yMRlUvw+zq
 29ymZgD/aWS5nUskfPu3z4ncGrufij5tv8A317PTbFiUMwJHzgE=
 =9PFz
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix a couple bugs in cgroup cpu.weight support

 - Add the new sched-ext@lists.linux.dev to MAINTAINERS

* tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()
  sched_ext: Make scx_group_set_weight() always update tg->scx.weight
  sched_ext: Update mailing list entry in MAINTAINERS
2025-06-18 14:17:15 -07:00
Chen Ridong
37fb58a727 cgroup,freezer: fix incomplete freezing when attaching tasks
An issue was found:

	# cd /sys/fs/cgroup/freezer/
	# mkdir test
	# echo FROZEN > test/freezer.state
	# cat test/freezer.state
	FROZEN
	# sleep 1000 &
	[1] 863
	# echo 863 > test/cgroup.procs
	# cat test/freezer.state
	FREEZING

When tasks are migrated to a frozen cgroup, the freezer fails to
immediately freeze the tasks, causing the cgroup to remain in the
"FREEZING" state.

The freeze_task() function is called before clearing the CGROUP_FROZEN
flag. This causes the freezing() check to incorrectly return false,
preventing __freeze_task() from being invoked for the migrated task.

To fix this issue, clear the CGROUP_FROZEN state before calling
freeze_task().

Fixes: f5d39b0208 ("freezer,sched: Rewrite core freezer logic")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Zhong Jiawei <zhongjiawei1@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-18 09:43:30 -10:00
Steven Rostedt
327e286643 fgraph: Do not enable function_graph tracer when setting funcgraph-args
When setting the funcgraph-args option while the function graph tracer is not
enabled, it incorrectly enables it. Worse, it unregisters itself when it
was never registered. Then when it gets enabled again, it will register
itself a second time causing a WARNing.

 ~# echo 1 > /sys/kernel/tracing/options/funcgraph-args
 ~# head -20 /sys/kernel/tracing/trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 813/26317372   #P:8
 #
 #                                _-----=> irqs-off/BH-disabled
 #                               / _----=> need-resched
 #                              | / _---=> hardirq/softirq
 #                              || / _--=> preempt-depth
 #                              ||| / _-=> migrate-disable
 #                              |||| /     delay
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
           <idle>-0       [007] d..4.   358.966010:  7)   1.692 us    |          fetch_next_timer_interrupt(basej=4294981640, basem=357956000000, base_local=0xffff88823c3ae040, base_global=0xffff88823c3af300, tevt=0xffff888100e47cb8);
           <idle>-0       [007] d..4.   358.966012:  7)               |          tmigr_cpu_deactivate(nextexp=357988000000) {
           <idle>-0       [007] d..4.   358.966013:  7)               |            _raw_spin_lock(lock=0xffff88823c3b2320) {
           <idle>-0       [007] d..4.   358.966014:  7)   0.981 us    |              preempt_count_add(val=1);
           <idle>-0       [007] d..5.   358.966017:  7)   1.058 us    |              do_raw_spin_lock(lock=0xffff88823c3b2320);
           <idle>-0       [007] d..4.   358.966019:  7)   5.824 us    |            }
           <idle>-0       [007] d..5.   358.966021:  7)               |            tmigr_inactive_up(group=0xffff888100cb9000, child=0x0, data=0xffff888100e47bc0) {
           <idle>-0       [007] d..5.   358.966022:  7)               |              tmigr_update_events(group=0xffff888100cb9000, child=0x0, data=0xffff888100e47bc0) {

Notice the "tracer: nop" at the top there. The current tracer is the "nop"
tracer, but the content is obviously the function graph tracer.

Enabling function graph tracing will cause it to register again and
trigger a warning in the accounting:

 ~# echo function_graph > /sys/kernel/tracing/current_tracer
 -bash: echo: write error: Device or resource busy

With the dmesg of:

 ------------[ cut here ]------------
 WARNING: CPU: 7 PID: 1095 at kernel/trace/ftrace.c:3509 ftrace_startup_subops+0xc1e/0x1000
 Modules linked in: kvm_intel kvm irqbypass
 CPU: 7 UID: 0 PID: 1095 Comm: bash Not tainted 6.16.0-rc2-test-00006-gea03de4105d3 #24 PREEMPT
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:ftrace_startup_subops+0xc1e/0x1000
 Code: 48 b8 22 01 00 00 00 00 ad de 49 89 84 24 88 01 00 00 8b 44 24 08 89 04 24 e9 c3 f7 ff ff c7 04 24 ed ff ff ff e9 b7 f7 ff ff <0f> 0b c7 04 24 f0 ff ff ff e9 a9 f7 ff ff c7 04 24 f4 ff ff ff e9
 RSP: 0018:ffff888133cff948 EFLAGS: 00010202
 RAX: 0000000000000001 RBX: 1ffff1102679ff31 RCX: 0000000000000000
 RDX: 1ffffffff0b27a60 RSI: ffffffff8593d2f0 RDI: ffffffff85941140
 RBP: 00000000000c2041 R08: ffffffffffffffff R09: ffffed1020240221
 R10: ffff88810120110f R11: ffffed1020240214 R12: ffffffff8593d2f0
 R13: ffffffff8593d300 R14: ffffffff85941140 R15: ffffffff85631100
 FS:  00007f7ec6f28740(0000) GS:ffff8882b5251000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7ec6f181c0 CR3: 000000012f1d0005 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __pfx_ftrace_startup_subops+0x10/0x10
  ? find_held_lock+0x2b/0x80
  ? ftrace_stub_direct_tramp+0x10/0x10
  ? ftrace_stub_direct_tramp+0x10/0x10
  ? trace_preempt_on+0xd0/0x110
  ? __pfx_trace_graph_entry_args+0x10/0x10
  register_ftrace_graph+0x4d2/0x1020
  ? tracing_reset_online_cpus+0x14b/0x1e0
  ? __pfx_register_ftrace_graph+0x10/0x10
  ? ring_buffer_record_enable+0x16/0x20
  ? tracing_reset_online_cpus+0x153/0x1e0
  ? __pfx_tracing_reset_online_cpus+0x10/0x10
  ? __pfx_trace_graph_return+0x10/0x10
  graph_trace_init+0xfd/0x160
  tracing_set_tracer+0x500/0xa80
  ? __pfx_tracing_set_tracer+0x10/0x10
  ? lock_release+0x181/0x2d0
  ? _copy_from_user+0x26/0xa0
  tracing_set_trace_write+0x132/0x1e0
  ? __pfx_tracing_set_trace_write+0x10/0x10
  ? ftrace_graph_func+0xcc/0x140
  ? ftrace_stub_direct_tramp+0x10/0x10
  ? ftrace_stub_direct_tramp+0x10/0x10
  ? ftrace_stub_direct_tramp+0x10/0x10
  vfs_write+0x1d0/0xe90
  ? __pfx_vfs_write+0x10/0x10

Have the setting of the funcgraph-args check if function_graph tracer is
the current tracer of the instance, and if not, do nothing, as there's
nothing to do (the option is checked when function_graph tracing starts).

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250618073801.057ea636@gandalf.local.home
Fixes: c7a60a733c ("ftrace: Have funcgraph-args take affect during tracing")
Closes: https://lore.kernel.org/all/4ab1a7bdd0174ab09c7b0d68cdbff9a4@huawei.com/
Reported-by: Changbin Du <changbin.du@huawei.com>
Tested-by: Changbin Du <changbin.du@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-06-18 07:43:22 -04:00
Chuyi Zhou
261dce3d64 workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
Currently, when isolcpus is enabled via the cmdline, wq_isolated_cpumask does
not include these isolated CPUs, even though wq_unbound_cpumask has already
excluded them. It is only when we successfully configure an isolated cpuset
partition that wq_isolated_cpumask gets overwritten by
workqueue_unbound_exclude_cpumask(), including both the cmdline-specified
isolated CPUs and the isolated CPUs within the cpuset partitions.

Fix this issue by initializing wq_isolated_cpumask properly in
workqueue_init_early().

Fixes: fe28f631fa ("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask")
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17 08:58:29 -10:00
Tejun Heo
33796b9187 sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()
During task_group creation, sched_create_group() calls
scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext
portion. This is premature and ends up calling ops.cgroup_set_weight() with
an incorrect @cgrp before ops.cgroup_init() is called.

sched_create_group() should just initialize SCX related fields in the new
task_group. Fix it by factoring out scx_tg_init() from sched_init() and
making sched_create_group() call that function instead of
scx_group_set_weight().

v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads
    to build failures on !CONFIG_GROUP_SCHED configs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8195136669 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
2025-06-17 08:19:55 -10:00
Tejun Heo
c50784e99f sched_ext: Make scx_group_set_weight() always update tg->scx.weight
Otherwise, tg->scx.weight can go out of sync while scx_cgroup is not enabled
and ops.cgroup_init() may be called with a stale weight value.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8195136669 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
2025-06-17 08:19:43 -10:00
Song Liu
a766cfbbeb bpf: Mark dentry->d_inode as trusted_or_null
LSM hooks such as security_path_mknod() and security_inode_rename() have
access to a newly allocated negative dentry, which has a NULL d_inode.
Therefore, it is necessary to do the NULL pointer check for d_inode.

Also add selftests that check the verifier enforces the NULL pointer
check.

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250613052857.1992233-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 08:40:59 -07:00
Oleg Nesterov
f90fff1e15 posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()
If an exiting non-autoreaping task has already passed exit_notify() and
calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent
or debugger right after unlock_task_sighand().

If a concurrent posix_cpu_timer_del() runs at that moment, it won't be
able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or
lock_task_sighand() will fail.

Add the tsk->exit_state check into run_posix_cpu_timers() to fix this.

This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because
exit_task_work() is called before exit_notify(). But the check still
makes sense: task_work_add(&tsk->posix_cputimers_work.work) will fail
anyway in this case.

Cc: stable@vger.kernel.org
Reported-by: Benoît Sevens <bsevens@google.com>
Fixes: 0bdd2ed413 ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-13 10:55:49 -07:00
Gyeyoung Baek
8a2277a3c9 genirq/irq_sim: Initialize work context pointers properly
Initialize `ops` member's pointers properly by using kzalloc() instead of
kmalloc() when allocating the simulation work context. Otherwise the
pointers contain random content leading to invalid dereferencing.

Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250612124827.63259-1-gye976@gmail.com
2025-06-13 15:36:35 +02:00
Brian Norris
72218d74c9 genirq/cpuhotplug: Restore affinity even for suspended IRQ
Commit 788019eb55 ("genirq: Retain disable depth for managed interrupts
across CPU hotplug") tried to make managed shutdown/startup properly
reference counted, but it missed the fact that the unplug and hotplug code
has an intentional imbalance by skipping IRQS_SUSPENDED interrupts on
the "restore" path.

This means that if a managed-affinity interrupt was both suspended and
managed-shutdown (such as may happen during system suspend / S3), resume
skips calling irq_startup_managed(), and the interrupt would again have an
unbalanced depth, this time with a positive value (i.e., remaining
unexpectedly masked).

This IRQS_SUSPENDED check was introduced in commit a60dd06af6
("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity")
for essentially the same reason as commit 788019eb55: to prevent
irq_startup() from unconditionally re-enabling an interrupt too early.

Because irq_startup_managed() now respects the disable-depth count, the
IRQS_SUSPENDED check is no longer needed, and instead, it causes harm.

Thus, drop the IRQS_SUSPENDED check, and restore balance.

This effectively reverts commit a60dd06af6 ("genirq/cpuhotplug: Skip
suspended interrupts when restoring affinity"), because it is replaced
by commit 788019eb55 ("genirq: Retain disable depth for managed
interrupts across CPU hotplug").

Fixes: 788019eb55 ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Link: https://lore.kernel.org/all/20250612183303.3433234-3-briannorris@chromium.org
Closes: https://lore.kernel.org/lkml/24ec4adc-7c80-49e9-93ee-19908a97ab84@gmail.com/
2025-06-13 15:13:35 +02:00
Brian Norris
2b32fc8ff0 genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug
Commit 788019eb55 ("genirq: Retain disable depth for managed interrupts
across CPU hotplug") intended to only decrement the disable depth once per
managed shutdown, but instead it decrements for each CPU hotplug in the
affinity mask, until its depth reaches a point where it finally gets
re-started.

For example, consider:

1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 1
6. CPU M goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 0
          *** BUG: driver expects the interrupt is still disabled ***
             -> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning

This should clear the managed-shutdown flag at step 6, so that further
hotplugs don't cause further imbalance.

Note: It might be cleaner to also remove the irqd_clr_managed_shutdown()
invocation from __irq_startup_managed(). But this is currently not possible
because of irq_update_affinity_desc() as it sets IRQD_MANAGED_SHUTDOWN and
expects irq_startup() to clear it.

Fixes: 788019eb55 ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Link: https://lore.kernel.org/all/20250612183303.3433234-2-briannorris@chromium.org
2025-06-13 15:13:35 +02:00
Sebastian Andrzej Siewior
69a14d146f futex: Verify under the lock if hash can be replaced
Once the global hash is requested, there is no way to switch back to
the per-task private hash. This is checked at the beginning of the function.

It is possible that two threads simultaneously request the global hash
and both pass the initial check and block later on the
mm::futex_hash_lock. In this case the first thread performs the switch
to the global hash. The second thread will also attempt to switch to the
global hash and, while doing so, access the nonexistent slot 1 of the
struct futex_private_hash.

The same applies if the hash is made immutable: There is no reference
counting and the hash must not be replaced.

Verify under mm_struct::futex_phash that neither the global hash nor an
immutable hash is in use.

Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/all/aDwDw9Aygqo6oAx+@ly-workstation/
Fixes: bd54df5ea7 ("futex: Allow to resize the private local hash")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250610104400.1077266-5-bigeasy@linutronix.de/
2025-06-11 17:24:09 +02:00
Kan Liang
bc4394e5e7 perf: Fix the throttle error of some clock events
Both ARM and IBM CI report RCU stalls, which can be reproduced by the
perf command below:
  perf record -a -e cpu-clock -- sleep 2

The issue is introduced by the generic throttle patch set, which
unconditionally invokes event_stop() when throttle is triggered.

The cpu-clock and task-clock are two special SW events, which rely on
the hrtimer. The throttle is invoked in the hrtimer handler. The
event_stop()->hrtimer_cancel() waits for the handler to finish, which is
a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
be used to stop the timer.

There may be two ways to fix it:
 - Introduce a PMU flag to track the case. Avoid the event_stop in
   perf_event_throttle() if the flag is detected.
   It has been implemented in the
   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
   The new flag was thought to be overkill for the issue.
 - Add a check in the event_stop. Return immediately if the throttle is
   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
   method to stop the timer.

The latter is implemented here.

Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). This makes
the order the same as in perf_event_unthrottle(). Apart from this patch,
nothing checks hw.interrupts in stop(), so the order change has no other
impact.

When stopped by the throttle, the event should not be updated, i.e.
stop(event, 0). But cpu_clock_event_stop() doesn't handle the flag.
That is logically wrong, but it caused no problems with the old code,
because stop() was not invoked when handling the throttle. Check the
flag before updating the event.

Fixes: 9734e25fbf ("perf: Fix the throttle logic for a group")
Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/
Reported-by: Leo Yan <leo.yan@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20250606192546.915765-1-kan.liang@linux.intel.com
2025-06-11 14:05:08 +02:00
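As an illustration only, the control flow described in the throttle fix above can be modelled in user-space C. All model_* names are hypothetical stand-ins for the hrtimer callback, hrtimer_cancel(), MAX_INTERRUPTS and PERF_EF_UPDATE, and a would-be deadlock is counted rather than hung on. The sketch assumes only what the commit message states: the throttle marks the event before calling stop(event, 0), stop() skips the timer cancel when the event is throttled and honours the update flag, and the callback returns NORESTART instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
#define MODEL_MAX_INTERRUPTS (~0U) /* role of MAX_INTERRUPTS */
#define MODEL_EF_UPDATE 0x04       /* role of PERF_EF_UPDATE */

enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_event {
	unsigned int interrupts;
	bool in_handler; /* true while the timer callback is running */
	bool armed;      /* timer will fire again */
	int cancels;     /* cancels done from outside the callback */
	int deadlocks;   /* cancels attempted from inside the callback */
	int updates;     /* count updates done by stop() */
};

/* Like hrtimer_cancel(): it waits for a running callback, so calling it
 * from inside that callback would never return. Count it instead. */
static void model_cancel(struct model_event *e)
{
	if (e->in_handler) {
		e->deadlocks++;
		return;
	}
	e->armed = false;
	e->cancels++;
}

/* Fixed stop(): skip the cancel when throttled, honour the update flag. */
static void model_stop(struct model_event *e, int flags)
{
	if (e->interrupts != MODEL_MAX_INTERRUPTS)
		model_cancel(e);
	if (flags & MODEL_EF_UPDATE)
		e->updates++;
}

/* Throttle: mark the event first, then stop it without an update. */
static void model_throttle(struct model_event *e)
{
	e->interrupts = MODEL_MAX_INTERRUPTS;
	model_stop(e, 0);
}

/* Timer callback: when it throttles, it returns NORESTART to stop the timer. */
enum model_restart model_timer_fire(struct model_event *e, bool throttle)
{
	enum model_restart ret = MODEL_RESTART;

	e->in_handler = true;
	if (throttle) {
		model_throttle(e);
		ret = MODEL_NORESTART;
	}
	e->in_handler = false;
	e->armed = (ret == MODEL_RESTART);
	return ret;
}

/* Regular stop from outside the callback, with an update. */
void model_stop_outside(struct model_event *e)
{
	model_stop(e, MODEL_EF_UPDATE);
}

/* A throttling callback neither "deadlocks" nor updates the count. */
void model_demo(void)
{
	struct model_event e = { .armed = true };

	assert(model_timer_fire(&e, true) == MODEL_NORESTART);
	assert(e.deadlocks == 0 && e.updates == 0 && !e.armed);
}
```

Deleting the interrupts check from model_stop() makes the deadlocks assertion in model_demo() fail, which mirrors the hang the reporters hit.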
Steven Rostedt
8a157d8a00 tracing: Do not free "head" on error path of filter_free_subsystem_filters()
The variable "head" is allocated and initialized as a list before the
first "item" for the list is allocated. If the allocation of "item"
fails, the code frees "head" and then jumps to the label "free_now",
which processes "head" and frees it again.

This causes a use-after-free (UAF) of "head". There is no need to free
it before jumping to the "free_now" label, as that code frees it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250610093348.33c5643a@gandalf.local.home
Fixes: a9d0aab5eb ("tracing: Fix regression of filter waiting a long time on RCU synchronization")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202506070424.lCiNreTI-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-06-10 09:39:58 -04:00
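A minimal sketch of the error-path rule this fix restores, assuming a simplified singly linked list in plain C. The struct and function names and the fail_at fault-injection parameter are hypothetical, not the tracing code's: only the cleanup label frees "head", so no error path may free it beforehand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list types; not the tracing code's structures. */
struct item {
	struct item *next;
};

struct head {
	struct item *first;
};

/* Sole owner of cleanup: frees every item, then the head itself. */
void free_list(struct head *head)
{
	assert(head != NULL);
	struct item *it = head->first;

	while (it) {
		struct item *next = it->next;

		free(it);
		it = next;
	}
	free(head);
}

/* Count the items in the list. */
int list_len(const struct head *head)
{
	int n = 0;

	for (const struct item *it = head->first; it; it = it->next)
		n++;
	return n;
}

/*
 * Build a head plus n items. fail_at (hypothetical, for testing) makes
 * the allocation of item number fail_at fail; pass -1 for no failure.
 * On failure, jump to free_now WITHOUT calling free(head) first: the
 * cleanup below already frees it, so an earlier free would turn the
 * cleanup into a use-after-free.
 */
struct head *build_list(int n, int fail_at)
{
	struct head *head = malloc(sizeof(*head));

	if (!head)
		goto free_now;
	head->first = NULL;

	for (int i = 0; i < n; i++) {
		struct item *it = (i == fail_at) ? NULL : malloc(sizeof(*it));

		if (!it)
			goto free_now; /* no free(head) here: free_now owns it */
		it->next = head->first;
		head->first = it;
	}
	return head;

free_now:
	if (head)
		free_list(head);
	return NULL;
}
```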
Linus Torvalds
be54f8c558 The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due to massive conflicts
   against Linux-next. Now that everything is upstream, finish the conversion.

Merge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanup from Thomas Gleixner:
 "The delayed from_timer() API cleanup:

  The renaming to the timer_*() namespace was delayed due to massive
  conflicts against Linux-next. Now that everything is upstream, finish
  the conversion"

* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide, timers: Rename from_timer() to timer_container_of()
2025-06-08 11:33:00 -07:00
Linus Torvalds
538c429a4b tracing fixes:
 - Fix regression of waiting a long time on updating trace event filters

   When the faultable trace points were added, they needed task trace RCU
   synchronization. This was added to the tracepoint_synchronize_unregister()
   function. The filter logic always called this function whenever it
   updated the trace event filters before freeing the old filters.
   This increased the time of "trace-cmd record" from 13 seconds to
   over 2 minutes.

   Move the freeing of the filters to call_rcu*() logic, which brings the
   time back down to 13 seconds.

 - Fix ring_buffer_subbuf_order_set() error path lock protection

   The error path of ring_buffer_subbuf_order_set() released the mutex
   too early and allowed subsequent accesses to setting the subbuffer
   size to corrupt the data and cause a bug.

   Moving the mutex unlock to the end of the error path prevents
   reentrant access to the critical data and also allows the function
   to convert the taking of the mutex over to the guard() logic.

 - Remove unused power management clock events

   The clock events were added in 2010 for power management. In 2011,
   arm used them. In 2013, the code they were used in was removed.
   These events have been wasting memory since then.

 - Fix sparse warnings

   There were a few places in trace_events_filter.c that sparse warned
   about, where file->filter was referenced directly even though it is
   annotated with an __rcu tag. Use the helper functions and fix them up
   to use rcu_dereference() properly.

Merge tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull more tracing fixes from Steven Rostedt:

 - Fix regression of waiting a long time on updating trace event filters

   When the faultable trace points were added, they needed task trace RCU
   synchronization.

   This was added to the tracepoint_synchronize_unregister() function.
   The filter logic always called this function whenever it updated the
   trace event filters before freeing the old filters. This increased
   the time of "trace-cmd record" from 13 seconds to over 2 minutes.

   Move the freeing of the filters to call_rcu*() logic, which brings
   the time back down to 13 seconds.

 - Fix ring_buffer_subbuf_order_set() error path lock protection

   The error path of ring_buffer_subbuf_order_set() released the mutex
   too early and allowed subsequent accesses to setting the subbuffer
   size to corrupt the data and cause a bug.

   Moving the mutex unlock to the end of the error path prevents
   reentrant access to the critical data and also allows the function
   to convert the taking of the mutex over to the guard() logic.

 - Remove unused power management clock events

   The clock events were added in 2010 for power management. In 2011,
   arm used them. In 2013, the code they were used in was removed.
   These events have been wasting memory since then.

 - Fix sparse warnings

   There were a few places in trace_events_filter.c that sparse warned
   about, where file->filter was referenced directly even though it is
   annotated with an __rcu tag. Use the helper functions and fix them up
   to use rcu_dereference() properly.

* tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Add rcu annotation around file->filter accesses
  tracing: PM: Remove unused clock events
  ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()
  tracing: Fix regression of filter waiting a long time on RCU synchronization
2025-06-08 08:19:01 -07:00
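The ring-buffer item in this pull relies on the kernel's guard() helper, which releases the lock when the enclosing scope exits, so every return path, including the error path, still runs with the mutex held. A rough user-space analogue can be built on the GCC/Clang cleanup attribute, which is understood to be the mechanism behind the kernel's scope-based guards. The toy lock and all toy_* names are hypothetical; this is a sketch of the idea, not the kernel's guard() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that records its state; not a real mutex. */
struct toy_mutex {
	bool locked;
};

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->locked); /* this model has no contention or recursion */
	m->locked = true;
}

/* Cleanup handler: receives a pointer to the guard variable. */
static void toy_unlock(struct toy_mutex **mp)
{
	(*mp)->locked = false;
}

/* Rough analogue of guard(mutex)(&m): lock now, unlock at scope exit. */
#define TOY_GUARD(m) \
	struct toy_mutex *toy_guard_ __attribute__((cleanup(toy_unlock))) = \
		(toy_lock(m), (m))

struct toy_buffer {
	struct toy_mutex lock;
	int order;
	bool restored_under_lock; /* lets the example check the error path */
};

/* Hypothetical setter: the error path undoes the change with the lock held. */
int toy_set_order(struct toy_buffer *b, int order, bool fail)
{
	TOY_GUARD(&b->lock);
	int old = b->order;

	b->order = order;
	if (fail) {
		b->order = old; /* roll back while still locked */
		b->restored_under_lock = b->lock.locked;
		return -1;      /* toy_unlock() runs after this */
	}
	return 0;               /* ...and after this */
}
```

If the unlock were instead done by hand before the rollback, restored_under_lock would end up false, which is the shape of the bug the pull describes.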
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
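Both from_timer() and timer_container_of() exist because a timer callback receives only a pointer to the timer, and the callback has to recover the structure that embeds it. The sketch below assumes the kernel macro is a thin wrapper around container_of() with the same argument order as the old from_timer(). It is a user-space illustration: toy_timer, toy_dev and the other toy_* names are hypothetical, and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal container_of(); the kernel's version adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same argument order as timer_container_of(var, callback_timer,
 * timer_fieldname); var is only used for its type. */
#define toy_timer_container_of(var, callback_timer, timer_fieldname) \
	container_of(callback_timer, __typeof__(*(var)), timer_fieldname)

/* Hypothetical stand-in for struct timer_list: the callback only gets the timer. */
struct toy_timer {
	void (*function)(struct toy_timer *t);
};

struct toy_dev {
	int id;
	struct toy_timer poll_timer; /* embedded, not the first member */
};

int toy_last_id = -1;

static void toy_dev_timeout(struct toy_timer *t)
{
	/* Recover the enclosing device from the embedded timer. */
	struct toy_dev *dev = toy_timer_container_of(dev, t, poll_timer);

	toy_last_id = dev->id;
}

/* Stand-in for the timer core firing an expired timer. */
void toy_fire(struct toy_timer *t)
{
	assert(t != NULL && t->function != NULL);
	t->function(t);
}

void toy_dev_init(struct toy_dev *dev, int id)
{
	dev->id = id;
	dev->poll_timer.function = toy_dev_timeout;
}
```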
Linus Torvalds
8630c59e99 Kbuild updates for v6.16
  - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a
    symbol only to specified modules
 
  - Improve ABI handling in gendwarfksyms
 
  - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
 
  - Add checkers for redundant or missing <linux/export.h> inclusion
 
  - Deprecate the extra-y syntax
 
  - Fix a genksyms bug when including enum constants from *.symref files

Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
   exports a symbol only to specified modules

 - Improve ABI handling in gendwarfksyms

 - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n

 - Add checkers for redundant or missing <linux/export.h> inclusion

 - Deprecate the extra-y syntax

 - Fix a genksyms bug when including enum constants from *.symref files

* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  genksyms: Fix enum consts from a reference affecting new values
  arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
  kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
  efi/libstub: use 'targets' instead of extra-y in Makefile
  module: make __mod_device_table__* symbols static
  scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
  scripts/misc-check: check missing #include <linux/export.h> when W=1
  scripts/misc-check: add double-quotes to satisfy shellcheck
  kbuild: move W=1 check for scripts/misc-check to top-level Makefile
  scripts/tags.sh: allow to use alternative ctags implementation
  kconfig: introduce menu type enum
  docs: symbol-namespaces: fix reST warning with literal block
  kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
  tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
  docs/core-api/symbol-namespaces: drop table of contents and section numbering
  modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
  kbuild: move kbuild syntax processing to scripts/Makefile.build
  Makefile: remove dependency on archscripts for header installation
  Documentation/kbuild: Add new gendwarfksyms kABI rules
  Documentation/kbuild: Drop section numbers
  ...
2025-06-07 10:05:35 -07:00
Steven Rostedt
549e914c96 tracing: Add rcu annotation around file->filter accesses
Running sparse on trace_events_filter.c triggered several warnings about
file->filter being accessed directly even though it's annotated with __rcu.

Add rcu_dereference() around it and shuffle the logic slightly so that
it's always referenced via accessor functions.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250607102821.6c7effbf@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-06-07 10:31:22 -04:00
Linus Torvalds
d3c82f618a 13 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels.  11 are for MM.

Merge tag 'mm-hotfixes-stable-2025-06-06-16-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "13 hotfixes.

  6 are cc:stable and the remainder address post-6.15 issues or aren't
  considered necessary for -stable kernels. 11 are for MM"

* tag 'mm-hotfixes-stable-2025-06-06-16-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  MAINTAINERS: add mm swap section
  kmsan: test: add module description
  MAINTAINERS: add tlb trace events to MMU GATHER AND TLB INVALIDATION
  mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race
  mm/hugetlb: unshare page tables during VMA split, not before
  MAINTAINERS: add Alistair as reviewer of mm memory policy
  iov_iter: use iov_offset for length calculation in iov_iter_aligned_bvec
  mm/mempolicy: fix incorrect freeing of wi_kobj
  alloc_tag: handle module codetag load errors as module load failures
  mm/madvise: handle madvise_lock() failure during race unwinding
  mm: fix vmstat after removing NR_BOUNCE
  KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
2025-06-06 21:45:45 -07:00
Linus Torvalds
119b1e61a7 RISC-V Patches for the 6.16 Merge Window, Part 1
 * Support for the FWFT SBI extension, which is part of SBI 3.0 and a
   dependency for many new SBI and ISA extensions.
 * Support for getrandom() in the VDSO.
 * Support for mseal.
 * Optimized routines for raid6 syndrome and recovery calculations.
 * kexec_file() supports loading Image-formatted kernel binaries.
 * Improvements to the instruction patching framework to allow for atomic
   instruction patching, along with rules as to how systems need to
   behave in order to function correctly.
 * Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
   some SiFive vendor extensions.
 * Various fixes and cleanups, including: misaligned access handling, perf
   symbol mangling, module loading, PUD THPs, and improved uaccess
   routines.

Merge tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the FWFT SBI extension, which is part of SBI 3.0 and a
   dependency for many new SBI and ISA extensions

 - Support for getrandom() in the VDSO

 - Support for mseal

 - Optimized routines for raid6 syndrome and recovery calculations

 - kexec_file() supports loading Image-formatted kernel binaries

 - Improvements to the instruction patching framework to allow for
   atomic instruction patching, along with rules as to how systems need
   to behave in order to function correctly

 - Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
   some SiFive vendor extensions

 - Various fixes and cleanups, including: misaligned access handling,
   perf symbol mangling, module loading, PUD THPs, and improved uaccess
   routines

* tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits)
  riscv: uaccess: Only restore the CSR_STATUS SUM bit
  RISC-V: vDSO: Wire up getrandom() vDSO implementation
  riscv: enable mseal sysmap for RV64
  raid6: Add RISC-V SIMD syndrome and recovery calculations
  riscv: mm: Add support for Svinval extension
  RISC-V: Documentation: Add enough title underlines to CMODX
  riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
  MAINTAINERS: Update Atish's email address
  riscv: uaccess: do not do misaligned accesses in get/put_user()
  riscv: process: use unsigned int instead of unsigned long for put_user()
  riscv: make unsafe user copy routines use existing assembly routines
  riscv: hwprobe: export Zabha extension
  riscv: Make regs_irqs_disabled() more clear
  perf symbols: Ignore mapping symbols on riscv
  RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
  riscv: module: Optimize PLT/GOT entry counting
  riscv: Add support for PUD THP
  riscv: xchg: Prefetch the destination word for sc.w
  riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
  riscv: Add support for Zicbop
  ...
2025-06-06 18:05:18 -07:00