2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

47836 Commits

Author SHA1 Message Date
Kumar Kartikeya Dwivedi
92b90f780d bpf: Use architecture provided res_smp_cond_load_acquire
In v2 of rqspinlock [0], we fixed potential problems with WFE usage in
arm64 to fallback to a version copied from Ankur's series [1]. This
logic was moved into arch-specific headers in v3 [2].

However, we missed using the arch-provided res_smp_cond_load_acquire
in commit ebababcd03 ("rqspinlock: Hardcode cond_acquire loops for arm64")
due to a rebasing mistake between v2 and v3 of the rqspinlock series.
Fix the typo to fallback to the arm64 definition as we did in v2.

  [0]: https://lore.kernel.org/bpf/20250206105435.2159977-18-memxor@gmail.com
  [1]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com
  [2]: https://lore.kernel.org/bpf/20250303152305.3195648-9-memxor@gmail.com

Fixes: ebababcd03 ("rqspinlock: Hardcode cond_acquire loops for arm64")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410145512.1876745-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-10 12:47:07 -07:00
Sebastian Andrzej Siewior
92e250c624 timekeeping: Add a lockdep override in tick_freeze()
tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the
callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
acquires a spinlock which becomes a sleeping lock on PREEMPT_RT.  Lockdep
complains about this lock nesting.

Add a lockdep override for this special case and a comment explaining
why it is okay.

Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250404133429.pnAzf-eF@linutronix.de
Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
2025-04-09 22:30:39 +02:00
Nam Cao
2424e146be hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::function
The "function" field of struct hrtimer has been changed to private, but
two instances have not been converted to use ACCESS_PRIVATE().

Convert them to use ACCESS_PRIVATE().

Fixes: 04257da0c9 ("hrtimers: Make callback function pointer private")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250408103854.1851093-1-namcao@linutronix.de
Closes: https://lore.kernel.org/oe-kbuild-all/202504071931.vOVl13tt-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202504072155.5UAZjYGU-lkp@intel.com/
2025-04-09 21:00:42 +02:00
Steven Rostedt
e1a453a57b tracing: Do not add length to print format in synthetic events
The following causes a vsnprintf fault:

  # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
  # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
  # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger

Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:

          <idle>-0       [001] d..5.   193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
    sshd-session-879     [001] d..5.   193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
          <idle>-0       [002] d..5.   193.811198: wake_lat: wakee=(efault)bashdelta=91
            bash-880     [002] d..5.   193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
          <idle>-0       [001] d..5.   193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
    sshd-session-879     [001] d..5.   193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50

The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb4 ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-09 11:34:21 -04:00
Arnd Bergmann
d7b98ae522 dma/contiguous: avoid warning about unused size_bytes
When building with W=1, this variable is unused for configs with
CONFIG_CMA_SIZE_SEL_PERCENTAGE=y:

kernel/dma/contiguous.c:67:26: error: 'size_bytes' defined but not used [-Werror=unused-const-variable=]

Change this to a macro to avoid the warning.

Fixes: c64be2bb1c ("drivers: add Contiguous Memory Allocator")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250409151557.3890443-1-arnd@kernel.org
2025-04-09 17:28:53 +02:00
Linus Torvalds
bec7dcbc24 Probes fixes for v6.14:
- fprobe: Fix to remove fprobe_hlist_node when module unloading
 
   When a fprobe target module is removed, the fprobe_hlist_node
   should be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.
 
 - fprobe: Fix to lock module while registering fprobe
 
  The module containing the function to be probeed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.
 
 - fprobe-events: Fix possible UAF on modules
 
   Basically as same as above, but in the fprobe-events layer we also
   need to get module reference counter when we find the tracepoint
   in the module.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmf0kJ8bHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+bEIALiFuYBn2y26OJnfaRnW
 rgSC2JupswEVg7HwsN5kA/x1ypXl9SPfJGjbiHLUTq9+4KGOBTmY+k5/OpVO+Qkh
 3nYKOkZxKRTglA7hRSTH0rxDV1eobps4nv/xkPjprugcjCGU54+4yb9Hq7Kyflpa
 o8p+VS/0VOJ9f3Iy9a9JRfu9qE7Qzz9USCj4N64WMgx/qczPe27twqFEaUpTf1VW
 Sw9twtKnqGs9hNE2QmhlzUBuq6gOZMXkjH6t1U4pMWBGB51JqZ5ZBhC4kL/5XEIZ
 bEau82El5qdieQC2B7c0RxldceKa4t4QUlJDalZGKpxvTXrCw9rFyv0dRe2cXnKm
 Yo0=
 =I+MO
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - fprobe: remove fprobe_hlist_node when module unloading

   When a fprobe target module is removed, the fprobe_hlist_node should
   be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.

 - fprobe: lock module while registering fprobe

   The module containing the function to be probeed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.

 - fprobe-events: fix possible UAF on modules

   Basically as same as above, but in the fprobe-events layer we also
   need to get module reference counter when we find the tracepoint in
   the module.

* tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fprobe: Cleanup fprobe hash when module unloading
  tracing: fprobe events: Fix possible UAF on modules
  tracing: fprobe: Fix to lock module while registering fprobe
2025-04-08 12:51:34 -07:00
Linus Torvalds
e37f72b3b4 cgroup: Fixes for v6.15-rc1
- A number of cpuset remote partition related fixes and cleanups along with
   selftest updates.
 
 - A change from this merge window made cgroup_rstat_updated_list() called
   outside cgroup_rstat_lock leading to list corruptions. Fix it by
   relocating the call inside the lock.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ/QMSQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGebUAP0bdg/hIX5OjhREbaDKWoUyAHnHqMdg3Dvngvhp
 d9aOqQD/b1jdVfDINFtb2qjOpizPjyI0ycQxrr9K3DrSYmUAKAs=
 =hFhq
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - A number of cpuset remote partition related fixes and cleanups along
   with selftest updates.

 - A change from this merge window made cgroup_rstat_updated_list()
   called outside cgroup_rstat_lock leading to list corruptions. Fix it
   by relocating the call inside the lock.

* tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Fix race between newly created partition and dying one
  cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
  selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
  selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
  selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
  cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
  cgroup/cpuset: Code cleanup and comment update
  cgroup/cpuset: Don't allow creation of local partition over a remote one
  cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
  cgroup/cpuset: Fix error handling in remote_partition_disable()
  cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
2025-04-08 12:15:05 -07:00
Frederic Weisbecker
56799bc035 perf: Fix hang while freeing sigtrap event
Perf can hang while freeing a sigtrap event if a related deferred
signal hadn't managed to be sent before the file got closed:

perf_event_overflow()
   task_work_add(perf_pending_task)

fput()
   task_work_add(____fput())

task_work_run()
    ____fput()
        perf_release()
            perf_event_release_kernel()
                _free_event()
                    perf_pending_task_sync()
                        task_work_cancel() -> FAILED
                        rcuwait_wait_event()

Once task_work_run() is running, the list of pending callbacks is
removed from the task_struct and from this point on task_work_cancel()
can't remove any pending and not yet started work items, hence the
task_work_cancel() failure and the hang on rcuwait_wait_event().

Task work could be changed to remove one work at a time, so a work
running on the current task can always cancel a pending one, however
the wait / wake design is still subject to inverted dependencies when
remote targets are involved, as pictured by Oleg:

T1                                                      T2

fd = perf_event_open(pid => T2->pid);                  fd = perf_event_open(pid => T1->pid);
close(fd)                                              close(fd)
    <IRQ>                                                  <IRQ>
    perf_event_overflow()                                  perf_event_overflow()
       task_work_add(perf_pending_task)                        task_work_add(perf_pending_task)
    </IRQ>                                                 </IRQ>
    fput()                                                 fput()
        task_work_add(____fput())                              task_work_add(____fput())

    task_work_run()                                        task_work_run()
        ____fput()                                             ____fput()
            perf_release()                                         perf_release()
                perf_event_release_kernel()                            perf_event_release_kernel()
                    _free_event()                                          _free_event()
                        perf_pending_task_sync()                               perf_pending_task_sync()
                            rcuwait_wait_event()                                   rcuwait_wait_event()

Therefore the only option left is to acquire the event reference count
upon queueing the perf task work and release it from the task work, just
like it was done before 3a5465418f ("perf: Fix event leak upon exec and file release")
but without the leaks it fixed.

Some adjustments are necessary to make it work:

* A child event might dereference its parent upon freeing. Care must be
  taken to release the parent last.

* Some places assuming the event doesn't have any reference held and
  therefore can be freed right away must instead put the reference and
  let the reference counting to its job.

Reported-by: "Yi Lai" <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/all/Zx9Losv4YcJowaP%2F@ly-workstation/
Reported-by: syzbot+3c4321e10eea460eb606@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/673adf75.050a0220.87769.0024.GAE@google.com/
Fixes: 3a5465418f ("perf: Fix event leak upon exec and file release")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250304135446.18905-1-frederic@kernel.org
2025-04-08 20:55:43 +02:00
Tejun Heo
bc08b15b54 sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation
SCX_OPS_HAS_CGROUP_WEIGHT was only used to suppress the missing cgroup
weight support warnings. Now that the warnings are removed, the flag doesn't
do anything. Mark it for deprecation and remove its usage from scx_flatcg.

v2: Actually include the scx_flatcg update.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-and-reviewed-by: Andrea Righi <arighi@nvidia.com>
2025-04-08 08:53:52 -10:00
Tejun Heo
e776b26e37 sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings
sched_ext generates warnings when cpu.weight / cpu.idle are set to
non-default values if the BPF scheduler doesn't implement weight support.
These warnings don't provide much value while adding constant annoyance. A
BPF scheduler may not implement any particular behavior and there's nothing
particularly special about missing cgroup weight support. Drop the warnings.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-08 08:00:49 -10:00
Breno Leitao
47068309b5 sched_ext: Use kvzalloc for large exit_dump allocation
Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
can require large contiguous memory depending on the implementation.
This change prevents allocation failures by allowing the system to fall
back to vmalloc when contiguous memory allocation fails.

Since this buffer is only used for debugging purposes, physical memory
contiguity is not required, making vmalloc a suitable alternative.

Cc: stable@vger.kernel.org
Fixes: 07814a9439 ("sched_ext: Print debug dump after an error exit")
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-08 07:53:27 -10:00
Masami Hiramatsu (Google)
a3dc2983ca tracing: fprobe: Cleanup fprobe hash when module unloading
Cleanup fprobe address hash table on module unloading because the
target symbols will be disappeared when unloading module and not
sure the same symbol is mapped on the same address.

Note that this is at least disables the fprobes if a part of target
symbols on the unloaded modules. Unlike kprobes, fprobe does not
re-enable the probe point by itself. To do that, the caller should
take care register/unregister fprobe when loading/unloading modules.
This simplifies the fprobe state managememt related to the module
loading/unloading.

Link: https://lore.kernel.org/all/174343534473.843280.13988101014957210732.stgit@devnote2/

Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-08 08:46:25 +09:00
Andrii Nakryiko
0cd575cab1 uprobes: Avoid false-positive lockdep splat on CONFIG_PREEMPT_RT=y in the ri_timer() uprobe timer callback, use raw_write_seqcount_*()
Avoid a false-positive lockdep warning in the CONFIG_PREEMPT_RT=y
configuration when using write_seqcount_begin() in the uprobe timer
callback by using raw_write_* APIs.

Uprobe's use of timer callback is guaranteed to not race with itself
for a given uprobe_task, and as such seqcount's insistence on having
preemption disabled on the writer side is irrelevant. So switch to
raw_ variants of seqcount API instead of disabling preemption unnecessarily.

Also, point out in the comments more explicitly why we use seqcount
despite our reader side being rather simple and never retrying. We favor
well-maintained kernel primitive in favor of open-coding our own memory
barriers.

Fixes: 8622e45b5d ("uprobes: Reuse return_instances between multiple uretprobes within task")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20250404194848.2109539-1-andrii@kernel.org
2025-04-07 20:15:16 +02:00
Steven Rostedt
4808595a99 tracing: Hide get_vm_area() from MMUless builds
The function get_vm_area() is not defined for non-MMU builds and causes a
build error if it is used. Hide the map_pages() function around a:

 #ifdef CONFIG_MMU

to keep it from being compiled when CONFIG_MMU is not set.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250407120111.2ccc9319@gandalf.local.home
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/4f8ece8b-8862-4f7c-8ede-febd28f8a9fe@roeck-us.net/
Fixes: 394f3f02de ("tracing: Use vmap_page_range() to map memmap ring buffer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-07 12:02:11 -04:00
Gabriel Shahrouzi
0ba3a4ab76 perf/core: Fix WARN_ON(!ctx) in __free_event() for partial init
Move the get_ctx(child_ctx) call and the child_event->ctx assignment to
occur immediately after the child event is allocated. Ensure that
child_event->ctx is non-NULL before any subsequent error path within
inherit_event calls free_event(), satisfying the assumptions of the
cleanup code.

Details:

There's no clear Fixes tag, because this bug is a side-effect of
multiple interacting commits over time (up to 15 years old), not
a single regression.

The code initially incremented refcount then assigned context
immediately after the child_event was created. Later, an early
validity check for child_event was added before the
refcount/assignment. Even later, a WARN_ON_ONCE() cleanup check was
added, assuming event->ctx is valid if the pmu_ctx is valid.
The problem is that the WARN_ON_ONCE() could trigger after the initial
check passed but before child_event->ctx was assigned, violating its
precondition. The solution is to assign child_event->ctx right after
its initial validation. This ensures the context exists for any
subsequent checks or cleanup routines, resolving the WARN_ON_ONCE().

To resolve it, defer the refcount update and child_event->ctx assignment
directly after child_event->pmu_ctx is set but before checking if the
parent event is orphaned. The cleanup routine depends on
event->pmu_ctx being non-NULL before it verifies event->ctx is
non-NULL. This also maintains the author's original intent of passing
in child_ctx to find_get_pmu_context before its refcount/assignment.

[ mingo: Expanded the changelog from another email by Gabriel Shahrouzi. ]

Reported-by: syzbot+ff3aa851d46ab82953a3@syzkaller.appspotmail.com
Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20250405203036.582721-1-gshahrouzi@gmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ff3aa851d46ab82953a3
2025-04-06 20:30:28 +02:00
Linus Torvalds
dda8887894 Fix a perf events time accounting bug.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfyslURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jGRw/8CZ3NyJI/g0StlH/csA4a1f58JBpIR1XY
 I+uUXuTEsd1yS1TMLz5WeK8hsLp1ZCnehojLcjY4Ee/8Tp+mBRlgYuvgJ/T7wBbI
 OOJjsSku2IHk+D3RshUJt3kG9deeBIkZSQON+HeJ28WaYzXs0qpNKnS3a5RMLJWo
 k+PS4IpRaWN5a/YIxC2XMGGxEBE0W9wJXXthIbbSozuu1uXuNiZ92cxAa8IzPiZn
 4oThM4dq1XyR4NvcjWf23206fUUVEzBoK/XS15oRK3Nk2oHMZ2ilruTxkBEaFf50
 6Nr2zNVVQ6/l6wR9DYMAQTE+UHFMJGb7+l1oSARLFKPKc9h7nj4+eItBMzkzbxXS
 wAZX0nq+kkXAr2DABHBxeT6q10OGjLHCOTrE4AfU0Iss8kmNhRPiqwuXT87dOcDa
 OWH75mrP6rkGdUdExV5+ZdB1GhBiomg/KB3YCILeM2OjrluXtZC1aHYiuS1RPh24
 KaH6H20WtRzNbF7uxRPBwOS1U3xHfFd+usZ1XnBzl2DWNz09hyJUjuG8B90EmjvI
 POQ2lyepVg2tIV/uplM0sb9J39tNZNXRlfQrmjsyuHk+1kmK+bh0lL9WVe7qQqVB
 jEX6X0yqTdLAMyRllcwRwtIhgb1u89PFRhnC3IKRfuJeuj07LTgnkOYSVwp7YQ+C
 p7eIplRJhrw=
 =slQk
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:
 "Fix a perf events time accounting bug"

* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix child_total_time_enabled accounting bug at task exit
2025-04-06 10:48:12 -07:00
Linus Torvalds
302deb109d Miscellaneous scheduler fixes/updates:
- Fix a nonsensical Kconfig combination
  - Remove an unnecessary rseq-notification
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfysY8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gftQ//Sy6HJRQhC/xxLuv8N8f9bEzju5fSgEaK
 2Cg5aKmKyf6IECIAMilB3snAM4h1X1ajnGbxx98sERuDTjkjKawLD17hC5riYxyr
 fReDy6VkdNw/yALK/QcodJZHPWBbYeVT3uVO9qQSnMq6q8IJrOkM0rZwOYawo2FJ
 ID7xWUGhPTatuqm2TM4r4yzwXrPq5fHllrWEsc4LlhtXYRJmzeOGbLh63vUgUFZO
 iu0uM7qt93GoVZqPsw5fliuFE+m4Ug8fPY+hBtXZlUn/npQpR9dP3+hccXIsslCq
 H00pmnqiE5nyDo8zsOG3rzO4gml6k4JQGUWkcmzVq56n02N6naC4KTGZS79aCpaV
 7KwInYW2fzwYcd6UEVHlRqeJK/XFTcL+fDfFWSEp5T/3keeCnwilZDRAHPAgW6ot
 GxAUPT8P8qlnGhXSOMoOoJND3KChelQQzJBQc/j5EToqYCNLytqzWnPgXbzwN/Za
 ZWhlL2T39n3ykEQarlm0MOL35n/0CF27Q5dKOLeaS6OA7K1wYHOQYuCf09zRpKrv
 aaKiKhir4RyYLsfUIJD9cSO68AZQGAwXZGEyM23eErjcA/ZNHrew4TGFM3Tzwj2Q
 /7wHpWfRhhcP7igGrOoJ+YDOCvrfUSgegRYx8hgucuWmFFI1h1mrmfWy8lyPihtm
 pPy9jAwjElI=
 =n5lA
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix a nonsensical Kconfig combination

 - Remove an unnecessary rseq-notification

* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Eliminate useless task_work on execve
  sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06 10:44:58 -07:00
Linus Torvalds
16cd1c2657 A set of final cleanups for the timer subsystem:
1) Convert all del_timer[_sync]() instances over to the new
      timer_delete[_sync]() API and remove the legacy wrappers.
 
      Conversion was done with coccinelle plus some manual fixups as
      coccinelle chokes on scoped_guard().
 
   2) The final cleanup of the hrtimer_init() to hrtimer_setup() conversion.
 
      This has been delayed to the end of the merge window, so that all
      patches which have been merged through other trees are in mainline and
      all new users are catched.
 
 Doing this right before rc1 ensures that new code which is merged post rc1
 is not introducing new instances of the original functionality.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyXi0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYzlD/4ykDZbUzgTreYOxEQpBJ9elPwBhxfL
 1v8OwDjRWlNrmLup8RiUfKrlbmztGl1J/u9ld0qhjcqkywCCBC1N5S+DhCjYetyP
 MPWLbi2Dc35cFA+M7i8fMgxI2K9MLz2Zj1UKxz1MdsSuNHm07N3mul/3T11Ye4Rz
 nPlzeQBTBDFCKTEGKjr8zjuoD15Wl48sObM0AjV35BPuQR1jfY4CE6VXo2h78+0c
 jYwpJpDmcd+o1bDrfFhWUME2DzABEkHhn4wNSETnM4E5RXZRMUbi4UiigzInibQr
 JOUTKwPJXTMX/Erd0XyXErrYf2qy1X9BQy6NlyDDOv+8kLEVRsC9Efplx9uoEtfi
 QvVT/UmgmhZFJBfIT3/B8OvasrfwOropaYoG4L0zbDpp1b09VY47N5lCLlNr/mZf
 jb2TwIln8Szy2EfIT2RSd0ZNupyU8V4aH/mYNpSlbUJ6mfvfIAttBSS/YH+Zeqku
 7zOJkoCusaySOCZCOQkeikL3ZBN+FHtNteXxmGnp34ed/tsfgGZj1lsbmkM2rrWo
 f2mQsYAclUA4KQeY9z/Xf7/c5wJUkME69PxOaaN23dOpBR7GA58Cvb0PQTnPlAiT
 KnH/JRweBHtcv4KEHMi2f5no4cxcmXyKTj7/TLyYNjc8LATL9Eo/nxG36PLxy4lN
 QPOWz11zEBLjQQ==
 =8Ftq
 -----END PGP SIGNATURE-----

Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "A set of final cleanups for the timer subsystem:

   - Convert all del_timer[_sync]() instances over to the new
     timer_delete[_sync]() API and remove the legacy wrappers.

     Conversion was done with coccinelle plus some manual fixups as
     coccinelle chokes on scoped_guard().

   - The final cleanup of the hrtimer_init() to hrtimer_setup()
     conversion.

     This has been delayed to the end of the merge window, so that all
     patches which have been merged through other trees are in mainline
     and all new users are catched.

  Doing this right before rc1 ensures that new code which is merged post
  rc1 is not introducing new instances of the original functionality"

* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/timers: Rename the hrtimer_init event to hrtimer_setup
  hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
  hrtimers: Rename debug_init() to debug_setup()
  hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
  hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
  hrtimers: Make callback function pointer private
  hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
  hrtimers: Switch to use __htimer_setup()
  hrtimers: Delete hrtimer_init()
  treewide: Convert new and leftover hrtimer_init() users
  treewide: Switch/rename to timer_delete[_sync]()
2025-04-06 08:35:37 -07:00
Linus Torvalds
ff0c66685d A set of updates for the interrupt subsystem:
1) A treewide cleanup for the irq_domain code, which makes the naming
      consistent and gets rid of the original oddity of naming domains
      'host'.
 
      This is a trivial mechanical change and is done late to ensure that
      all instances have been catched and new code merged post rc1 wont
      reintroduce new instances.
 
   2) A trivial consistency fix in the migration code
 
      The recent introduction of irq_force_complete_move() in the core
      code, causes a problem for the nostalgia crowd who maintains ia64 out
      of tree.
 
      The code assumes that hierarchical interrupt domains are enabled and
      dereferences irq_data::parent_data unconditionally. That works in mainline
      because both architectures which enable that code have hierarchical domains
      enabled. Though it breaks the ia64 build, which enables the functionality,
      but does not have hierarchical domains.
 
      While it's not really a problem for mainline today, this
      unconditional dereference is inconsistent and trivially fixable by
      using the existing helper function irqd_get_parent_data(), which has
      the appropriate #ifdeffery in place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyW1sTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWywD/sG69q7rjt0bBHleXPjjUIrM5TdRI9k
 r9S3BhVtZzfreiMnhQS1CLrA64fBFhKGJVo9HtKbsjC0hF8r10A1+OKEftYpydPz
 Mk7DreqCvQO/GQ/p2MiwHiQL39iXW5eFqL8qScafD8jUnkQ1kjHu53blLuoAzx2u
 ysfe/4V3KtcziKgShss4Y0SGg3CEL5sJiLbU7SLNCSRNkO/hCPh1KYAFcsrRaXnQ
 pcnHae8N58RrgGIhe1F9oPNji2B0YdQ2vt7Ora2g6TlbMv66LYQ+QCu++/0n3HZI
 EV/ikBtuF7zwAg6qzcmfY63XfTMj/K/Oj7qKTsMtcgHFlrpcQ9HW33qMUm90rATB
 Sx/oeiJS10XFlEoseX0dO8NoRE/ZvF9wioAXnvbxxZtOchr+3hyQSbI3hGdJoncL
 mqIRyf08o5kzBoRUY7Nqztlst6/+0bBgxPgDFsW7j47V/NBlUYQ0UBlB+FyoeVfk
 RWS3Z18jpKlvVNKn67ZYRI0zlaxgyyGszwSsLTpQvOFt2HGdKiHFeCuBiBVOboel
 vhtIRW+zT3cyMKvZimQ3BfKnBgFiEKd73VQIjaHBB+eLt2DtNpq6x0dnaOQLvVau
 7eSFgBKOwEz3zAu81omcgHwMb/5/Z46e5jrtliF4YFThHWUZPZFrhrr7JFJ+pqTz
 PTNWb0zGIzQCmg==
 =lhoB
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more irq updates from Thomas Gleixner:
 "A set of updates for the interrupt subsystem:

   - A treewide cleanup for the irq_domain code, which makes the naming
     consistent and gets rid of the original oddity of naming domains
     'host'.

     This is a trivial mechanical change and is done late to ensure that
     all instances have been catched and new code merged post rc1 wont
     reintroduce new instances.

   - A trivial consistency fix in the migration code

     The recent introduction of irq_force_complete_move() in the core
     code, causes a problem for the nostalgia crowd who maintains ia64
     out of tree.

     The code assumes that hierarchical interrupt domains are enabled
     and dereferences irq_data::parent_data unconditionally. That works
     in mainline because both architectures which enable that code have
     hierarchical domains enabled. Though it breaks the ia64 build,
     which enables the functionality, but does not have hierarchical
     domains.

     While it's not really a problem for mainline today, this
     unconditional dereference is inconsistent and trivially fixable by
     using the existing helper function irqd_get_parent_data(), which
     has the appropriate #ifdeffery in place"

* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
  irqdomain: Stop using 'host' for domain
  irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
  irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
2025-04-06 08:17:43 -07:00
Linus Torvalds
a91c49517d A revert to fix a adjtimex() regression:
The recent change to prevent that time goes backwards for the coarse time
 getters due to immediate multiplier adjustments via adjtimex(), changed the
 way how the timekeeping core treats that.
 
 That change result in a regression on the adjtimex() side, which is user
 space visible:
 
  1) The forwarding of the base time moves the update out of the original
     period and establishes a new one. That's changing the behaviour of the
     [PF]LL control, which user space expects to be applied periodically.
 
  2) The clearing of the accumulated NTP error due to #1, changes the
     behaviour as well.
 
 It was tried to delay the multiplier/frequency update to the next tick, but
 that did not solve the problem as userspace expects that the multiplier or
 frequency updates are in effect, when the syscall returns.
 
 There is a different solution for the coarse time problem available, so
 revert the offending commit to restore the existing adjtimex() behaviour.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyVtsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaeBEACIjssZjasdb/6sGgpp+6jdp3yCuUff
 XUB30O54u+54NBtIoxq8a8w74sI06Y1xmmRIRgmchBrylUooZyglaE1BfKZpzt5h
 FWszpSp3pfFOZ+A2rpNGWhskGxNVCDnGsPAsQSmPgCY17ZU5j+BkTSE4fcDZqftC
 E/Ojr67KD24kXGXDeQp08fSdXCfyd85PnFmpZmyqnDePuAA2JF6uAfqJE+QoeuUh
 KkQdARi+xAvXdzIRCLw5cQ/tlhxwPYrHOiMt/VRg/A44Nowl/+IEo83QjXRn7cz9
 sq1X2tAY42D/VSG01ZS8cpErWQuSYlI+hilFJ13POVZP+2xhZQUI3QzmrjG4+jqr
 s6I5g6RQyasG8tgkVTTR9+rIvSOAVkp0j0Y2tZ14e/9gi+/0+f5DYhxRc7MFPLW0
 ssS6oPIO1lsnU5KcaZ88SdDZ1OYmAj+L3R3dKM8PoggK8igZkaqezKwiH3RorKQJ
 8yZ5yfGYRNInzLHq7MUkai0xnLGbbx/hHCPZt+V7rNWP34eD+xykSKestC3wFscm
 jWAwP/CERz6mYR5mqicWkP52o39fIjbFixq+epAzBabmBJnPNBaUyb9V3MEf6ycq
 yWscFVjPu6koeX4MNUtDpcFdtb1QZJMJAtBxxnysFy03eNaryYRvta1t8EP/WgMz
 Zu71G7I8SvWrUw==
 =0MEH
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A revert to fix a adjtimex() regression:

  The recent change to prevent that time goes backwards for the coarse
  time getters due to immediate multiplier adjustments via adjtimex(),
  changed the way how the timekeeping core treats that.

  That change result in a regression on the adjtimex() side, which is
  user space visible:

   1) The forwarding of the base time moves the update out of the
      original period and establishes a new one. That's changing the
      behaviour of the [PF]LL control, which user space expects to be
      applied periodically.

   2) The clearing of the accumulated NTP error due to #1, changes the
      behaviour as well.

  An attempt to delay the multiplier/frequency update to the next tick
  did not solve the problem as userspace expects that the multiplier or
  frequency updates are in effect, when the syscall returns.

  There is a different solution for the coarse time problem available,
  so revert the offending commit to restore the existing adjtimex()
  behaviour"

* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
2025-04-06 08:13:16 -07:00
Linus Torvalds
f4d2ef4825 Kbuild updates for v6.15
- Improve performance in gendwarfksyms
 
  - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
 
  - Support CONFIG_HEADERS_INSTALL for ARCH=um
 
  - Use more relative paths to sources files for better reproducibility
 
  - Support the loong64 Debian architecture
 
  - Add Kbuild bash completion
 
  - Introduce intermediate vmlinux.unstripped for architectures that need
    static relocations to be stripped from the final vmlinux
 
  - Fix versioning in Debian packages for -rc releases
 
  - Treat missing MODULE_DESCRIPTION() as an error
 
  - Convert Nios2 Makefiles to use the generic rule for built-in DTB
 
  - Add debuginfo support to the RPM package
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmfxp2EVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkIUP/AgNiP6or6fmY5+HSyjlrdutBWAh
 QNW0AiKh5vytmBIv63/i103OE0SRbt+U6IApn9c7FQKkeuyIlD1e9NfSwFMZixmP
 P7t6JqDCL61G5d3W2Iisqle1cpBoVvNgUwu0k3sTSXl0vNsDbiyxcCzQzLhZMKsd
 O+Ppwp3zNGE2vIUwpIjzJsR5Dt/Z5MfuKDi4UShsyWpFZ1rg9X93YKc9QJOXjKwj
 4Np2x2cukDo2oz4uXuZQ8F1+bOFsKYoilCwjtxlrC6BO0lSPiJsRTN6nGJ0ejns9
 GGD56mBNGcGk+NEPGhAMQmZHqNAP4JfjEvAgaoSBn0Rdnjd9Cj/2T+4n61xkR4Wu
 MXCP/LEJ3MyctmkZjUq+0fDAe2wjxuaAG15kAHCha+9KxIG2NzHbf2XXb4E49DDU
 2rw3fqA41/cKCq1ZEaqRn3pZZgU6ysfsEW42JmnNxO+7zz9k8RX4rk8CVaVIEUuw
 Xojkis//KnE6+OCBe6Tb0H2Rzo0JF3AG2eNF4zY/xnc562FRIMS19WYS38tKZng6
 Gr1BRG0bA4t9mf2Vck1W1LcAb3Jh0mddtyrgYKhbcwq0YOj2q/H6F50DkC+wL282
 wvhV6B/vKAH8BByEWAn3rBcN0N+w/VFc0uPCz//tkoAm4nPg8PvKq63JHPrHsyZe
 mOMhifoiVbjF4KFo
 =GiQ6
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Improve performance in gendwarfksyms

 - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS

 - Support CONFIG_HEADERS_INSTALL for ARCH=um

 - Use more relative paths to sources files for better reproducibility

 - Support the loong64 Debian architecture

 - Add Kbuild bash completion

 - Introduce intermediate vmlinux.unstripped for architectures that need
   static relocations to be stripped from the final vmlinux

 - Fix versioning in Debian packages for -rc releases

 - Treat missing MODULE_DESCRIPTION() as an error

 - Convert Nios2 Makefiles to use the generic rule for built-in DTB

 - Add debuginfo support to the RPM package

* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: rpm-pkg: build a debuginfo RPM
  kconfig: merge_config: use an empty file as initfile
  nios2: migrate to the generic rule for built-in DTB
  rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
  kbuild: pacman-pkg: hardcode module installation path
  kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
  modpost: require a MODULE_DESCRIPTION()
  kbuild: make all file references relative to source root
  x86: drop unnecessary prefix map configuration
  kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
  kbuild: Add a help message for "headers"
  kbuild: deb-pkg: remove "version" variable in mkdebian
  kbuild: deb-pkg: fix versioning for -rc releases
  Documentation/kbuild: Fix indentation in modules.rst example
  x86: Get rid of Makefile.postlink
  kbuild: Create intermediate vmlinux build with relocations preserved
  kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
  kbuild: link-vmlinux.sh: Make output file name configurable
  kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
  Revert "kheaders: Ignore silly-rename files"
  ...
2025-04-05 15:46:50 -07:00
Nam Cao
244132c4e5 tracing/timers: Rename the hrtimer_init event to hrtimer_setup
The function hrtimer_init() doesn't exist anymore. It was replaced by
hrtimer_setup().

Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it
consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
59c9edafc0 hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init_on_stack() to debug_setup_on_stack() as well, to keep the
names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/073cf6162779a2f5b12624677d4c49ee7eccc1ed.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
e9ef2093ad hrtimers: Rename debug_init() to debug_setup()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init() to debug_setup() as well, to keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4b730c1f79648b16a1c5413f928fdc2e138dfc43.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
fcea1ccf24 hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() as well, to
keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/807694aedad9353421c4a7347629a30c5c31026f.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
1cc24f2e76 hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
The struct hrtimer::function field can only be changed using
hrtimer_setup*() or hrtimer_update_function(), and both already null-check
'function'. Therefore, null-checking 'function' in hrtimer_start_range_ns()
is not necessary.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4661c571ee87980c340ccc318fc1a473c0c8f6bc.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
04257da0c9 hrtimers: Make callback function pointer private
Make the struct hrtimer::function field private, to prevent users from
changing this field in an unsafe way. hrtimer_update_function() should be
used if the callback function needs to be changed.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/7d0e6e0c5c59a64a9bea940051aac05d750bc0c2.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
87d82cff38 hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
__hrtimer_init() is only called by __hrtimer_setup(). Simplify by merging
__hrtimer_init() into __hrtimer_setup().

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/8a0a847a35f711f66b2d05b57255aa44e7e61279.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
50177a8b2e hrtimers: Switch to use __htimer_setup()
__hrtimer_init_sleeper() calls __hrtimer_init() and also sets up the
callback function. But there is already __hrtimer_setup() which does both
actions.

Switch to use __hrtimer_setup() to simplify the code.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/d9a45a51b6a8aa0045310d63f73753bf6b33f385.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao
9779489a31 hrtimers: Delete hrtimer_init()
hrtimer_init() is now unused. Delete it.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/003722f60c7a2a4f8d4ed24fb741aa313b7e5136.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Thomas Gleixner
324a2219ba Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
This reverts commit 757b000f7b.

Miroslav reported that the changes for handling the inconsistencies in the
coarse time getters result in a regression on the adjtimex() side.

There are two issues:

  1) The forwarding of the base time moves the update out of the original
     period and establishes a new one.

  2) The clearing of the accumulated NTP error is changing the behaviour as
     well.

Userspace expects that multiplier/frequency updates are in effect, when the
syscall returns, so delaying the update to the next tick is not solving the
problem either.

Revert the change, so that the established expectations of user space
implementations (ntpd, chronyd) are restored. The re-introduced
inconsistency of the coarse time getters will be addressed in a subsequent
fix.

Fixes: 757b000f7b ("timekeeping: Fix possible inconsistencies in _COARSE clockids")
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/Z-qsg6iDGlcIJulJ@localhost
2025-04-04 19:10:00 +02:00
Thomas Gleixner
9b305678c5 genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
Frank reported, that the common irq_force_complete_move() breaks the out of
tree build of ia64. The reason is that ia64 uses the migration code, but
does not have hierarchical interrupt domains enabled.

This went unnoticed in mainline as both x86 and RISC-V have hierarchical
domains enabled. Not that it matters for mainline, but it's still
inconsistent.

Use irqd_get_parent_data() instead of accessing the parent_data field
directly. The helper returns NULL when hierarchical domains are disabled
otherwise it accesses the parent_data field of the domain.

No functional change.

Fixes: 751dc837da ("genirq: Introduce common irq_force_complete_move() implementation")
Reported-by: Frank Scheiner <frank.scheiner@web.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Frank Scheiner <frank.scheiner@web.de>
Link: https://lore.kernel.org/all/87h634ugig.ffs@tglx
2025-04-04 17:08:36 +02:00
Jiri Slaby (SUSE)
0a27ea384c irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.

Therefore rename irq_get_default_host() to irq_get_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-4-jirislaby@kernel.org
2025-04-04 16:39:10 +02:00
Jiri Slaby (SUSE)
825dfab23b irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.

Therefore rename irq_set_default_host() to irq_set_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
2025-04-04 16:39:10 +02:00
Linus Torvalds
6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.

     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
2025-04-03 16:09:29 -07:00
Linus Torvalds
8c7c1b5506 - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup"
from Mike Rapoport fixes a couple of issues with the just-merged "arch,
   mm: reduce code duplication in mem_init()" series.
 
 - The 4 patch series "MAINTAINERS: add my isub-entries to MM part." from
   Mike Rapoport does some maintenance on MAINTAINERS.
 
 - The 6 patch series "remove tlb_remove_page_ptdesc()" from Qi Zheng
   does some cleanup work to the page mapping code.
 
 - The 7 patch series "mseal system mappings" from Jeff Xu permits
   sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors
   (arm compat-mode), sigpage (arm compat-mode).
 
 - Plus the usual shower of singleton patches.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+4XpgAKCRDdBJ7gKXxA
 jnwtAP43Rp3zyWf034fEypea36xQqcsy4I7YUTdZEgnFS7LCZwEApM97JvGHsYEr
 Ns9Zhnh+E3RWASfOAzJoVZVrAaMovg4=
 =MyVR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
2025-04-03 11:10:00 -07:00
Linus Torvalds
ea59cb7423 sched_ext: Fixes for v6.15-rc0
- Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.
 
 - scx_select_cpu_dfl() fixes and cleanups.
 
 - Synchronize tool/sched_ext with external scheduler repo. While this isn't
   a fix. There's no risk to the kernel and it's better if they stay synced
   closer.
 -----BEGIN PGP SIGNATURE-----
 
 iIMEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ+29Eg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGeNGAP97GCCCwovepx3f9HV3RRk8oEregsGI7gmr+TC5
 +XJrqwD4urg6I5JGM3K5dB9m626RyUP6k5RmYdjqBrEL6LauCg==
 =uWzD
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.

 - scx_select_cpu_dfl() fixes and cleanups.

 - Synchronize tool/sched_ext with external scheduler repo. While this
   isn't a fix. There's no risk to the kernel and it's better if they
   stay synced closer.

* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: Sync with scx repo
  sched_ext: initialize built-in idle state before ops.init()
  sched_ext: create_dsq: Return -EEXIST on duplicate request
  sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
  sched_ext: idle: Fix return code of scx_select_cpu_dfl()
2025-04-03 10:03:38 -07:00
Linus Torvalds
41677970ad tracing fixes for 6.15
- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled
 
   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is enabled.
   In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same configs
   as the function argument tracing requires. Just have the function
   argument tracing depend on PROBE_EVENTS_BTF_ARGS.
 
 - Free module_delta for persistent ring buffer instance
 
   When an instance holds the persistent ring buffer, it allocates
   a helper array to hold the deltas between where modules are loaded
   on the last boot and the current boot. This array needs to be freed
   when the instance is freed.
 
 - Add cond_resched() to loop in ftrace_graph_set_hash()
 
   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This
   loop is known to trigger soft lockup warnings and requires a
   cond_resched(). The loop in ftrace_graph_set_hash() was missing it.
 
 - Fix the event format verifier to include "%*p.." arguments
 
   To prevent events from dereferencing stale pointers that can
   happen if a trace event uses a dereferece pointer to something
   that was not copied into the ring buffer and can be freed by the
   time the trace is read, a verifier is called. At boot or module
   load, the verifier scans the print format string for pointers
   that can be dereferenced and it checks the arguments to make sure
   they do not contain something that can be freed. The "%*p" was
   not handled, which would add another argument and cause the verifier
   to not only not verify this pointer, but it will look at the wrong
   argument for every pointer after that.
 
 - Fix mcount sorttable building for different endian type target
 
   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianess of the file and the host is used
   to determine if the bytes need to be swapped when calculations are
   done. A change was made to the sorting of the mcount_loc that read
   the values from the ELF file into an array and the swap happened
   on the filling of the array. But one of the calculations of the
   array still did the swap when it did not need to. This caused building
   on a little endian machine for a big endian target to not find
   the mcount function in the 'nm' table and it zeroed it out, causing
   there to be no functions available to trace.
 
 - Add goto out_unlock jump to rv_register_monitor() on error path
 
   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+2tyBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjPYAPwJDti6nHTqheFwIa1WzJ3yC2tKRYKt
 1E5PYW/2Ct5NmwEAqgg3TvJppXHymVdutLghhGFnlBnyTWMI+KIhparSBw8=
 =NFM5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled

   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is
   enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
   configs as the function argument tracing requires. Just have the
   function argument tracing depend on PROBE_EVENTS_BTF_ARGS.

 - Free module_delta for persistent ring buffer instance

   When an instance holds the persistent ring buffer, it allocates a
   helper array to hold the deltas between where modules are loaded on
   the last boot and the current boot. This array needs to be freed when
   the instance is freed.

 - Add cond_resched() to loop in ftrace_graph_set_hash()

   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This loop is
   known to trigger soft lockup warnings and requires a cond_resched().
   The loop in ftrace_graph_set_hash() was missing it.

 - Fix the event format verifier to include "%*p.." arguments

   To prevent events from dereferencing stale pointers that can happen
   if a trace event uses a dereferece pointer to something that was not
   copied into the ring buffer and can be freed by the time the trace is
   read, a verifier is called. At boot or module load, the verifier
   scans the print format string for pointers that can be dereferenced
   and it checks the arguments to make sure they do not contain
   something that can be freed. The "%*p" was not handled, which would
   add another argument and cause the verifier to not only not verify
   this pointer, but it will look at the wrong argument for every
   pointer after that.

 - Fix mcount sorttable building for different endian type target

   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianess of the file and the host is used to
   determine if the bytes need to be swapped when calculations are done.
   A change was made to the sorting of the mcount_loc that read the
   values from the ELF file into an array and the swap happened on the
   filling of the array. But one of the calculations of the array still
   did the swap when it did not need to. This caused building on a
   little endian machine for a big endian target to not find the mcount
   function in the 'nm' table and it zeroed it out, causing there to be
   no functions available to trace.

 - Add goto out_unlock jump to rv_register_monitor() on error path

   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.

* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix missing unlock on double nested monitors return path
  scripts/sorttable: Fix endianness handling in build-time mcount sort
  tracing: Verify event formats that have "%*p.."
  ftrace: Add cond_resched() to ftrace_graph_set_hash()
  tracing: Free module_delta on freeing of persistent ring buffer
  ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
2025-04-03 09:52:44 -07:00
Mathieu Desnoyers
169eae7711 rseq: Eliminate useless task_work on execve
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().

The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.

sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.

Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().

[ mingo: Merged to a recent upstream kernel, extended the changelog. ]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-03 13:10:47 +02:00
Linus Torvalds
ddd0172f18 TTY/Serial driver updates for 6.15-rc1
Here is the big set of serial and tty driver updates for 6.15-rc1.
 Include in here are the following:
   - more great tty layer cleanups from Jiri.  Someday this will be done,
     but that's not going to be any year soon...
   - kdb debug driver reverts to fix a reported issue
   - lots of .dts binding updates for different devices with serial
     devices
   - lots of tiny updates and tweaks and a few bugfixes for different
     serial drivers.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+2YPA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn2OQCgvxCyoeuNPuV4X89JdrgocMTMyTYAn15pGgDa
 r7w9UDO/D7UqRnKEnFy+
 =lJwK
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of serial and tty driver updates for 6.15-rc1.
  Include in here are the following:

   - more great tty layer cleanups from Jiri. Someday this will be done,
     but that's not going to be any year soon...

   - kdb debug driver reverts to fix a reported issue

   - lots of .dts binding updates for different devices with serial
     devices

   - lots of tiny updates and tweaks and a few bugfixes for different
     serial drivers.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  tty: serial: fsl_lpuart: Fix unused variable 'sport' build warning
  serial: stm32: do not deassert RS485 RTS GPIO prematurely
  serial: 8250: add driver for NI UARTs
  dt-bindings: serial: snps-dw-apb-uart: document RZ/N1 binding without DMA
  serial: icom: fix code format problems
  serial: sh-sci: Save and restore more registers
  tty: serial: pl011: remove incorrect of_match_ptr annotation
  dt-bindings: serial: snps-dw-apb-uart: Add support for rk3562
  tty: serial: lpuart: only disable CTS instead of overwriting the whole UARTMODIR register
  tty: caif: removed unused function debugfs_tx()
  serial: 8250_dma: terminate correct DMA in tx_dma_flush()
  tty: serial: fsl_lpuart: rename register variables more specifically
  tty: serial: fsl_lpuart: use port struct directly to simply code
  tty: serial: fsl_lpuart: Use u32 and u8 for register variables
  tty: serial: fsl_lpuart: disable transmitter before changing RS485 related registers
  tty: serial: 8250: Add Brainboxes XC devices
  dt-bindings: serial: fsl-lpuart: support i.MX94
  tty: serial: 8250: Add some more device IDs
  dt-bindings: serial: samsung: add exynos7870-uart compatible
  serial: 8250_dw: Comment possible corner cases in serial_out() implementation
  ...
2025-04-02 18:17:33 -07:00
Linus Torvalds
4b06c990c1 vfs-6.15-rc1.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ+1ZqAAKCRCRxhvAZXjc
 oqO7AP4jdW03PHsk5zkbuUTzTTngkZQ1AypIJLTCYIoKPATMowD+ILMjOTsVOfxA
 h38ziAM3tubsz1pwkGuIlsU+drwz5Ao=
 =QfJV
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Add a new maintainer for configfs

 - Fix exportfs module description

 - Place flexible array memeber at the end of an internal struct in the
   mount code

 - Add new maintainer for netfslib as Jeff Layton is stepping down as
   current co-maintainer

 - Fix error handling in cachefiles_get_directory()

 - Cleanup do_notify_pidfd()

 - Fix syscall number definitions in pidfd selftests

 - Fix racy usage of fs_struct->in exec during multi-threaded exec

 - Ensure correct exit code is reported when pidfs_exit() is called from
   release_task() for a delayed thread-group leader exit

 - Fix conflicting iomap flag definitions

* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix conflicting values of iomap flags
  fs: namespace: Avoid -Wflex-array-member-not-at-end warning
  MAINTAINERS: configfs: add Andreas Hindborg as maintainer
  exportfs: add module description
  exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
  netfs: add Paulo as maintainer and remove myself as Reviewer
  cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
  exec: fix the racy usage of fs_struct->in_exec
  selftests/pidfd: fixes syscall number defines
  pidfs: cleanup the usage of do_notify_pidfd()
2025-04-02 16:05:21 -07:00
Linus Torvalds
92b71befc3 These are objtool fixes and updates by Josh Poimboeuf, centered
around the fallout from the new CONFIG_OBJTOOL_WERROR=y feature,
 which, despite its default-off nature, increased the profile/impact
 of objtool warnings:
 
  - Improve error handling and the presentation of warnings/errors.
 
  - Revert the new summary warning line that some test-bot tools
    interpreted as new regressions.
 
  - Fix a number of objtool warnings in various drivers, core kernel
    code and architecture code. About half of them are potential
    problems related to out-of-bounds accesses or potential undefined
    behavior, the other half are additional objtool annotations.
 
  - Update objtool to latest (known) compiler quirks and
    objtool bugs triggered by compiler code generation
 
  - Misc fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfsRJMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g0YRAApiCylIv+0ucdKiDVAiI+cU7dqAggFp9h
 ULcTuuCtVkfjYzIBw6y1Iw9JeYsyngYaI0VEMmLasJPt8o93K0vwBXGArXJKoMeu
 UPcVS8N6+LqrHsWBXk919t1wgBZ7csgUxsCa1K47NKa3eCijrqI0N8PtcoYqKd+M
 tOuyEcTCTfS0E2STv6Gpdp6VfDKms3Cn4MffLbcNWJXAsd1dwzDIG8IvAHUW9yG3
 /ezVjm46thneNrRd9j/qU3mqNmhsec9NemHG7URaTznRKleWULhpmhGmcPYCh4Rj
 AqGjmPtqprPELtgezeV+LIcmIm5UWF/f+0tzzBrsRy1MiY8ED2w+J51DHsLoHg8t
 IfIkPyYX/zu9StXoRIwx/7C5NQqBlUfXGp6TuOOwzgbKOt+uRJOU6SnSQ06ZDwsa
 l2brQ+NDfvF7EvGnvi18wIM+iqMc2jSuWl0AT94ATDuAZGCyzlmwluIYmDuLfyZM
 JuYOogojt5vgHXDN6Ro3rDfK+tYckwez+Txx4oByGB3IJy75osBihtvHiYno7FgW
 KXDbiAfLZ4SlfPzqxI6PPzaj3py6hG9LICEiL0U8VecC7bZ/22BZQCpdKko+/E/Y
 PwlqCatqz/25U7GlsnfBISJO2VAyyUcbymvjnVXzZCi+IPAfeih6WcsTPJ96jxsa
 LULLCnuvmoY=
 =KkiI
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
 "These are objtool fixes and updates by Josh Poimboeuf, centered around
  the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which,
  despite its default-off nature, increased the profile/impact of
  objtool warnings:

   - Improve error handling and the presentation of warnings/errors

   - Revert the new summary warning line that some test-bot tools
     interpreted as new regressions

   - Fix a number of objtool warnings in various drivers, core kernel
     code and architecture code. About half of them are potential
     problems related to out-of-bounds accesses or potential undefined
     behavior, the other half are additional objtool annotations

   - Update objtool to latest (known) compiler quirks and objtool bugs
     triggered by compiler code generation

   - Misc fixes"

* tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  objtool/loongarch: Add unwind hints in prepare_frametrace()
  rcu-tasks: Always inline rcu_irq_work_resched()
  context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()
  sched/smt: Always inline sched_smt_active()
  objtool: Fix verbose disassembly if CROSS_COMPILE isn't set
  objtool: Change "warning:" to "error: " for fatal errors
  objtool: Always fail on fatal errors
  Revert "objtool: Increase per-function WARN_FUNC() rate limit"
  objtool: Append "()" to function name in "unexpected end of section" warning
  objtool: Ignore end-of-section jumps for KCOV/GCOV
  objtool: Silence more KCOV warnings, part 2
  objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC
  objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions
  objtool: Fix segfault in ignore_unreachable_insn()
  objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv()
  objtool, lkdtm: Obfuscate the do_nothing() pointer
  objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc()
  objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler()
  objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store()
  objtool, panic: Disable SMAP in __stack_chk_fail()
  ...
2025-04-02 10:30:10 -07:00
Linus Torvalds
af54a3a151 more printk changes for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmftIx0ACgkQUqAMR0iA
 lPJKUxAArJaWC7EXtDtd8u8Rl/CYpIEaMdPd7V+XA5sqdUyjkSJI+jRswonpOsNX
 Zn9pbGMds1LNBXm1NO9039+2TrPSJFTCGK6OgeCJC17/O31wnnm0LhZiU+JElgfi
 iQI5fdTnc3sB37bsjkvEUr9HFizRxY2fHHMWZ8ngiLfkKfki4ET+1u/yf7CraRk1
 6+LK9mM/WyytP6gYaSlL5YYVYs9fNcR/ND6IQgpfIN15/fOAOXWbMB1jE2iDRzqt
 MQUD4+DTYQYmeS6jQ4ToZdx3Ql9NwcP2nJnA5fxXeqPFHc/SgRS6KqOPQgQUD4tV
 N4q6ozLPlzDFeHVHMhPz/PzlSEn0zC1ZX87xXCUAilnkJpbEujcPxf44R/3RHu3d
 y7kmCRj0RwgHpLIwzLH5POrF4il9/wVlyZFRaYBPMkj09l0WBwYvfMhlnzvAtCP8
 pRKqHkjJ1FOWQFJyn98ONqcCmm2pZ8XKW2enikAhISVXcptI/1lIQ6IIpRdTjte1
 r60CbiJ7UFL+TrVqsWBuqWQRi5u5HykPkZiCL/YYXzZmrl3zLO+0ti9YzEU8Yrzd
 K1VAB/1aK/MDrTgOI+VaqlPq79uJBwtbrflgFhFBKAKsqTpBcsZUv9/1KHthnqXV
 Y84SsY2XpoGtjn58mU6eEc+8lLTOTDVXs+ZZL4/M3maW7ygNiYY=
 =Biv4
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull more printk updates from Petr Mladek:

 - Silence warnings about candidates for ‘gnu_print’ format attribute

* tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: Silence false positive GCC warning for va_format()
  vsnprintf: Drop unused const char fmt * in va_format()
  vsnprintf: Mark binary printing functions with __printf() attribute
  tracing: Mark binary printing functions with __printf() attribute
  seq_file: Mark binary printing functions with __printf() attribute
  seq_buf: Mark binary printing functions with __printf() attribute
2025-04-02 10:05:55 -07:00
Linus Torvalds
da0512b2a3 RCU fixes for v6.15:
- srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmfs09kACgkQSXnow7UH
 +rg+tQf/ehvjwWwijwLfaDRuVieLUQDEiTLNAsSmDDx9p/620lAm3PtIyi4XBT4d
 tPH15uoqNaFF4fOwWouiIAbJTCgmzg6aOrg8U2Nc1KRGS7JdNUBMV+MxKYJncBYh
 NNcw97n/HGMvi2BWLFj1xdOlSEMITX5xRZArp7c/PVRCau7DDC2lj2Ht47zYPOY3
 echRbQzozLiFCuHseGEiEpVfa00lq0Pg1UyWC+5cXCLVKhv6XlV1kMrsVOWMpF39
 g2CXT5QCTENnPHXBj1wCTG7hZMLVjnlcCE8+tMf92lwmc1zVM5L/T3GZLFzPBb42
 mJE6UhaqiLJYctplnoygWu4xhgLI4A==
 =MPwV
 -----END PGP SIGNATURE-----

Merge tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Boqun Feng:

 - srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT

* tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT
2025-04-02 10:04:48 -07:00
Linus Torvalds
002dcfd057 kgdb patches for 6.15
Two clean ups this cycle. The larger of which is the removal of a private
 allocator within kdb and replacing it with regular memory allocation. The
 other adopts the simplified version of strscpy() in a couple of places in
 kdb.
 
 Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmfqlAkACgkQfOMlXTn3
 iKFY4Q/9E8YEVGzvQykVAaxYy3etPm39aua3VtTgyhmMTAcEoxJy4g0yCKkErtec
 r0/AbbVbKtiV3PD/2QiWgYY6UHVuu8fLDgG2wgYwaAoUUuvpng5QZkUzAeGbIVrq
 XmzYPoS5ymVmjZfs9vS5IEvY2syATAYWfYe+zztaqT3WzG2ajPF80VyxqdH+4kKM
 Ds4RyQIRYzKndZF0qJ+absnKujgVbZcOUSxjawutZ2vbZtULDnvY3psuiJpnOlbw
 C2kd21K3IiJCLduHRhPr0VzK9xrOe4TilsJHU5bRr6W2t0My/bPa8TQag4g+G2cK
 xY8SKEaIZHOR3swnErwy0EERRr1WYpqC8e1wa9wYFkkS7rjAZ+wSzmnSJdOPKpyq
 0xYN/qTCsFCNBkUjUZc0zOw8+sOx+NPRp7oQKY4v7qAj9ptOMoUACc9dZnIFyqfX
 Rnj7Dn3Jx7g4+OD67UT9/X7b9InsXcr9aeSTYCoc9IdwEFpmKaZ5dPbYOqx4J9lS
 tLoRylprfzP7N0EVy8+R5TQ8fT9ct8tdQFGyVmzTmuKt9DurStNX5xJ2wS6Ot5rK
 +oOEuG1UULI3PdZFftQMtBZird7blhTJhsAqsMWXVLGt2spoZ+Qj99EGVrYpCVkj
 H0eEy0XfcpqklFsfyy0NR1K0oFgTgFSULBjohfLi5MZsyJIBeo4=
 =mLdP
 -----END PGP SIGNATURE-----

Merge tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "Two cleanups this cycle. The larger of which is the removal of a
  private allocator within kdb and replacing it with regular memory
  allocation. The other adopts the simplified version of strscpy() in a
  couple of places in kdb"

* tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Remove optional size arguments from strscpy() calls
  kdb: remove usage of static environment buffer
2025-04-02 09:55:51 -07:00
Steven Rostedt
e4d4b8670c ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
Some architectures do not have data cache coherency between user and
kernel space. For these architectures, the cache needs to be flushed on
both the kernel and user addresses so that user space can see the updates
the kernel has made.

Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range() which
takes the virtual address and does the work for those architectures that
need it.

This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.

Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3aQ6SA@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/20250402144953.920792197@goodmis.org
Fixes: 117c39200d ("ring-buffer: Introducing ring-buffer mapping functions");
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt
394f3f02de tracing: Use vmap_page_range() to map memmap ring buffer
The code to map the physical memory retrieved by memmap currently
allocates an array of pages to cover the physical memory and then calls
vmap() to map it to a virtual address. Instead of using this temporary
array of struct page descriptors, simply use vmap_page_range() that can
directly map the contiguous physical memory to a virtual address.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.754618481@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt
34ea8fa084 tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
The reserve_mem kernel command line option may pass back a physical
address, but the memory is still part of the normal memory just like
using memblock_alloc() would be. This means that the physical memory
returned by the reserve_mem command line option can be converted directly
to virtual memory by simply using phys_to_virt().

When freeing the buffer there's no need to call vunmap() anymore as the
memory allocated by reserve_mem is freed by the call to
reserve_mem_release_by_name().

Because the persistent ring buffer can also be allocated via the memmap
option, which *is* different than normal memory as it cannot be added back
to the buddy system, it must be treated differently. It still needs to be
virtually mapped to have access to it. It also can not be freed nor can it
ever be memory mapped to user space.

Create a new trace_array flag called TRACE_ARRAY_FL_MEMMAP which gets set
if the buffer is created by the memmap option, and this will prevent the
buffer from being memory mapped by user space.

Also increment the ref count for memmap'ed buffers so that they can never
be freed.

Link: https://lore.kernel.org/all/Z-wFszhJ_9o4dc8O@kernel.org/

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.583750106@goodmis.org
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Steven Rostedt
c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Masami Hiramatsu (Google)
dd941507a9 tracing: fprobe events: Fix possible UAF on modules
Commit ac91052f0a ("tracing: tprobe-events: Fix leakage of module
refcount") moved try_module_get() from __find_tracepoint_module_cb()
to find_tracepoint() caller, but that introduced a possible UAF
because the module can be unloaded before try_module_get(). In this
case, the module object should be freed too. Thus, try_module_get()
does not only fail but may access to the freed object.

To avoid that, try_module_get() in __find_tracepoint_module_cb()
again.

Link: https://lore.kernel.org/all/174342990779.781946.9138388479067729366.stgit@devnote2/

Fixes: ac91052f0a ("tracing: tprobe-events: Fix leakage of module refcount")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:58 +09:00
Masami Hiramatsu (Google)
d24fa977ee tracing: fprobe: Fix to lock module while registering fprobe
Since register_fprobe() does not get the module reference count while
registering fgraph filter, if the target functions (symbols) are in
modules, those modules can be unloaded when registering fprobe to
fgraph.

To avoid this issue, get the reference counter of module for each
symbol, and put it after register the fprobe.

Link: https://lore.kernel.org/all/174330568792.459674.16874380163991113156.stgit@devnote2/

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250325130628.3a9e234c@gandalf.local.home/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:57 +09:00
Gabriele Monaco
fc0585c7fa rv: Fix missing unlock on double nested monitors return path
RV doesn't support nested monitors having children monitors themselves
and exits with the EINVAL code. However, it returns without unlocking
the rv_interface_lock.

Unlock the lock before returning from the initialisation function.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250402071351.19864-2-gmonaco@redhat.com
Fixes: cb85c660fc ("rv: Add option for nested monitors and include sched")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202503310200.UBXGitB4-lkp@intel.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
Steven Rostedt
ea8d7647f9 tracing: Verify event formats that have "%*p.."
The trace event verifier checks the formats of trace events to make sure
that they do not point at memory that is not in the trace event itself or
in data that will never be freed. If an event references data that was
allocated when the event triggered and that same data is freed before the
event is read, then the kernel can crash by reading freed memory.

The verifier runs at boot up (or module load) and scans the print formats
of the events and checks their arguments to make sure that dereferenced
pointers are safe. If the format uses "%*p.." the verifier will ignore it,
and that could be dangerous. Cover this case as well.

Also add to the sample code a use case of "%*pbl".

Link: https://lore.kernel.org/all/bcba4d76-2c3f-4d11-baf0-02905db953dd@oracle.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Link: https://lore.kernel.org/20250327195311.2d89ec66@gandalf.local.home
Reported-by: Libo Chen <libo.chen@oracle.com>
Reviewed-by: Libo Chen <libo.chen@oracle.com>
Tested-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
zhoumin
42ea22e754 ftrace: Add cond_resched() to ftrace_graph_set_hash()
When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.

Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.

This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced.

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Link: https://lore.kernel.org/tencent_3E06CE338692017B5809534B9C5C03DA7705@qq.com
Signed-off-by: zhoumin <teczm@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt
2c9ee74a6d tracing: Free module_delta on freeing of persistent ring buffer
If a persistent ring buffer is created, a "module_delta" array is also
allocated to hold the module deltas of loaded modules that match modules
in the scratch area. If this buffer gets freed, the module_delta array is
not freed and causes a memory leak.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250401124525.1f9ac02a@gandalf.local.home
Fixes: 35a380ddbc ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt
b81ff11c21 ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
The option PROBE_EVENTS_BTF_ARGS enables the functions
btf_find_func_proto() and btf_get_func_param() which are used by the
function argument tracing code. The option FUNCTION_TRACE_ARGS was
dependent on the same configs that PROBE_EVENTS_BTF_ARGS was dependent on,
but it was also dependent on PROBE_EVENTS_BTF_ARGS. In fact, if
PROBE_EVENTS_BTF_ARGS is supported then FUNCTION_TRACE_ARGS is supported.

Just make FUNCTION_TRACE_ARGS depend on PROBE_EVENTS_BTF_ARGS.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250401113601.17fa1129@gandalf.local.home
Fixes: 533c20b062 ("ftrace: Add print_function_args()")
Closes: https://lore.kernel.org/all/DB9PR08MB75820599801BAD118D123D7D93AD2@DB9PR08MB7582.eurprd08.prod.outlook.com/
Reported-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:50:56 -04:00
Waiman Long
a22b3d54de cgroup/cpuset: Fix race between newly created partition and dying one
There is a possible race between removing a cgroup diectory that is
a partition root and the creation of a new partition.  The partition
to be removed can be dying but still online, it doesn't not currently
participate in checking for exclusive CPUs conflict, but the exclusive
CPUs are still there in subpartitions_cpus and isolated_cpus. These
two cpumasks are global states that affect the operation of cpuset
partitions. The exclusive CPUs in dying cpusets will only be removed
when cpuset_css_offline() function is called after an RCU delay.

As a result, it is possible that a new partition can be created with
exclusive CPUs that overlap with those of a dying one. When that dying
partition is finally offlined, it removes those overlapping exclusive
CPUs from subpartitions_cpus and maybe isolated_cpus resulting in an
incorrect CPU configuration.

This bug was found when a warning was triggered in
remote_partition_disable() during testing because the subpartitions_cpus
mask was empty.

One possible way to fix this is to iterate the dying cpusets as well and
avoid using the exclusive CPUs in those dying cpusets. However, this
can still cause random partition creation failures or other anomalies
due to racing. A better way to fix this race is to reset the partition
state at the moment when a cpuset is being killed.

Introduce a new css_killed() CSS function pointer and call it, if
defined, before setting CSS_DYING flag in kill_css(). Also update the
css_is_dying() helper to use the CSS_DYING flag introduced by commit
33c35aa481 ("cgroup: Prevent kill_css() from being called more than
once") for proper synchronization.

Add a new cpuset_css_killed() function to reset the partition state of
a valid partition root if it is being killed.

Fixes: ee8dde0cd2 ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-01 21:46:22 -10:00
Jeff Xu
3d38922abf mseal sysmap: uprobe mapping
Provide support to mseal the uprobe mapping.

Unlike other system mappings, the uprobe mapping is not established during
program startup.  However, its lifetime is the same as the process's
lifetime.  It could be sealed from creation.

Test was done with perf tool, and observe the uprobe mapping is sealed.

Link: https://lkml.kernel.org/r/20250305021711.3867874-6-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Linus Torvalds
2cd5769fb0 Driver core updates for 6.15-rc1
Here is the big set of driver core updates for 6.15-rc1.  Lots of stuff
 happened this development cycle, including:
   - kernfs scaling changes to make it even faster thanks to rcu
   - bin_attribute constify work in many subsystems
   - faux bus minor tweaks for the rust bindings
   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers.  These are all
     due to people actually trying to use the bindings that were in 6.14.
   - make Rafael and Danilo full co-maintainers of the driver core
     codebase
   - other minor fixes and updates.
 
 This has been in linux-next for a while now, with the only reported
 issue being some merge conflicts with the rust tree.  Depending on which
 tree you pull first, you will have conflicts in one of them.  The merge
 resolution has been in linux-next as an example of what to do, or can be
 found here:
 	https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI
 jtbJ+UuXGsnmO+JVNBEv
 =gy6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updatesk from Greg KH:
 "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
  happened this development cycle, including:

   - kernfs scaling changes to make it even faster thanks to rcu

   - bin_attribute constify work in many subsystems

   - faux bus minor tweaks for the rust bindings

   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers. These are all
     due to people actually trying to use the bindings that were in
     6.14.

   - make Rafael and Danilo full co-maintainers of the driver core
     codebase

   - other minor fixes and updates"

* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
  rust: platform: require Send for Driver trait implementers
  rust: pci: require Send for Driver trait implementers
  rust: platform: impl Send + Sync for platform::Device
  rust: pci: impl Send + Sync for pci::Device
  rust: platform: fix unrestricted &mut platform::Device
  rust: pci: fix unrestricted &mut pci::Device
  rust: device: implement device context marker
  rust: pci: use to_result() in enable_device_mem()
  MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
  rust/kernel/faux: mark Registration methods inline
  driver core: faux: only create the device if probe() succeeds
  rust/faux: Add missing parent argument to Registration::new()
  rust/faux: Drop #[repr(transparent)] from faux::Registration
  rust: io: fix devres test with new io accessor functions
  rust: io: rename `io::Io` accessors
  kernfs: Move dput() outside of the RCU section.
  efi: rci2: mark bin_attribute as __ro_after_init
  rapidio: constify 'struct bin_attribute'
  firmware: qemu_fw_cfg: constify 'struct bin_attribute'
  powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
  ...
2025-04-01 11:02:03 -07:00
Shakeel Butt
7d6c63c319 cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
The commit 093c8812de ("cgroup: rstat: Cleanup flushing functions and
locking") during cleanup accidentally changed the code to call
cgroup_rstat_updated_list() without cgroup_rstat_lock which is required.
Fix it.

Fixes: 093c8812de ("cgroup: rstat: Cleanup flushing functions and locking")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Breno Leitao <leitao@debian.org>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/6564c3d6-9372-4352-9847-1eb3aea07ca4@linux.ibm.com/
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-01 07:33:26 -10:00
Linus Torvalds
d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds
eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Waiman Long
52e039f9e2 cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
The goto statement in sched_partition_write() is not needed. Remove
it and rename sched_partition_write()/sched_partition_show() to
cpuset_partition_write()/cpuset_partition_show().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:27:34 -10:00
Waiman Long
f0a0bd3d23 cgroup/cpuset: Code cleanup and comment update
Rename partition_xcpus_newstate() to isolated_cpus_update(),
update_partition_exclusive() to update_partition_exclusive_flag() and
the new_xcpus_state variable to isolcpus_updated to make their meanings
more explicit. Also add some comments to further clarify the code.
No functional change is expected.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:27:19 -10:00
Waiman Long
6da580ec65 cgroup/cpuset: Don't allow creation of local partition over a remote one
Currently, we don't allow the creation of a remote partition underneath
another local or remote partition. However, it is currently possible to
create a new local partition with an existing remote partition underneath
it if top_cpuset is the parent. However, the current cpuset code does
not set the effective exclusive CPUs correctly to account for those
that are taken by the remote partition.

Changing the code to properly account for those remote partition CPUs
under all possible circumstances can be complex. It is much easier to
not allow such a configuration which is not that useful. So forbid
that by making sure that exclusive_cpus mask doesn't overlap with
subpartitions_cpus and invalidate the partition if that happens.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:26:53 -10:00
Waiman Long
f62a5d3936 cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
Currently, changes in exclusive CPUs are being handled in
remote_partition_check() by disabling conflicting remote partitions.
However, that may lead to results unexpected by the users. Fix
this problem by removing remote_partition_check() and making
update_cpumasks_hier() handle changes in descendant remote partitions
properly.

The compute_effective_exclusive_cpumask() function is enhanced to check
the exclusive_cpus and effective_xcpus from siblings and excluded them
in its effective exclusive CPUs computation and return a value to show if
there is any sibling conflicts.  This is somewhat like the cpu_exclusive
flag check in validate_change(). This is the initial step to enable us
to retire the use of cpu_exclusive flag in cgroup v2 in the future.

One of the tests in the TEST_MATRIX of the test_cpuset_prs.sh
script has to be updated due to changes in the way a child remote
partition root is being handled (updated instead of invalidation)
in update_cpumasks_hier().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:25:56 -10:00
Waiman Long
8bf450f3ae cgroup/cpuset: Fix error handling in remote_partition_disable()
When remote_partition_disable() is called to disable a remote partition,
it always sets the partition to an invalid partition state. It should
only do so if an error code (prs_err) has been set. Correct that and
add proper error code in places where remote_partition_disable() is
called due to error.

Fixes: 181c8e091a ("cgroup/cpuset: Introduce remote partition")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:25:38 -10:00
Waiman Long
668e041662 cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
Before commit f0af1bfc27 ("cgroup/cpuset: Relax constraints to
partition & cpus changes"), a cpuset partition cannot be enabled if not
all the requested CPUs can be granted from the parent cpuset. After
that commit, a cpuset partition can be created even if the requested
exclusive CPUs contain CPUs not allowed its parent.  The delmask
containing exclusive CPUs to be removed from its parent wasn't
adjusted accordingly.

That is not a problem until the introduction of a new isolated_cpus
mask in commit 11e5f407b6 ("cgroup/cpuset: Keep track of CPUs in
isolated partitions") as the CPUs in the delmask may be added directly
into isolated_cpus.

As a result, isolated_cpus may incorrectly contain CPUs that are not
isolated leading to incorrect data reporting. Fix this by adjusting
the delmask to reflect the actual exclusive CPUs for the creation of
the partition.

Fixes: 11e5f407b6 ("cgroup/cpuset: Keep track of CPUs in isolated partitions")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:25:22 -10:00
Linus Torvalds
46d29f23a7 ring-buffer updates for v6.15
- Restructure the persistent memory to have a "scratch" area
 
   Instead of hard coding the KASLR offset in the persistent memory
   by the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should
   not be part of that.
 
   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.
 
 - Allow where modules are loaded to be saved in the new scratch pad
 
   Save the addresses of modules when they are loaded into the persistent
   memory scratch pad.
 
 - A new module_for_each_mod() helper function was created
 
   With the acknowledgement of the module maintainers a new module helper
   function was created to iterate over all the currently loaded modules.
   This has a callback to be called for each module. This is needed for
   when tracing is started in the persistent buffer and the currently loaded
   modules need to be saved in the scratch area.
 
 - Expose the last boot information where the kernel and modules were loaded
 
   The last_boot_info file is updated to print out the addresses of where
   the kernel "_text" location was loaded from a previous boot, as well
   as where the modules are loaded. If the buffer is recording the current
   boot, it only prints "# Current" so that it does not expose the KASLR
   offset of the currently running kernel.
 
 - Allow the persistent ring buffer to be released (freed)
 
   To have this in production environments, where the kernel command line can
   not be changed easily, the ring buffer needs to be freed when it is not
   going to be used. The memory for the buffer will always be allocated at
   boot up, but if the system isn't going to enable tracing, the memory needs
   to be freed. Allow it to be freed and added back to the kernel memory
   pool.
 
 - Allow stack traces to print the function names in the persistent buffer
 
   Now that the modules are saved in the persistent ring buffer, if the same
   modules are loaded, the printing of the function names will examine the
   saved modules. If the module is found in the scratch area and is also
   loaded, then it will do the offset shift and use kallsyms to display the
   function name. If the address is not found, it simply displays the address
   from the previous boot in hex.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+cUERQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrAsAQCFt2nfzxoe3wtF5EqIT1VHp/8bQVjG
 gBe8B6ouboreogD/dS7yK8MRy24ZAmObGwYG0RbVicd50S7P8Rf7+823ng8=
 =OJKk
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Restructure the persistent memory to have a "scratch" area

   Instead of hard coding the KASLR offset in the persistent memory by
   the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should not
   be part of that.

   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.

 - Allow where modules are loaded to be saved in the new scratch pad

   Save the addresses of modules when they are loaded into the
   persistent memory scratch pad.

 - A new module_for_each_mod() helper function was created

   With the acknowledgement of the module maintainers a new module
   helper function was created to iterate over all the currently loaded
   modules. This has a callback to be called for each module. This is
   needed for when tracing is started in the persistent buffer and the
   currently loaded modules need to be saved in the scratch area.

 - Expose the last boot information where the kernel and modules were
   loaded

   The last_boot_info file is updated to print out the addresses of
   where the kernel "_text" location was loaded from a previous boot, as
   well as where the modules are loaded. If the buffer is recording the
   current boot, it only prints "# Current" so that it does not expose
   the KASLR offset of the currently running kernel.

 - Allow the persistent ring buffer to be released (freed)

   To have this in production environments, where the kernel command
   line can not be changed easily, the ring buffer needs to be freed
   when it is not going to be used. The memory for the buffer will
   always be allocated at boot up, but if the system isn't going to
   enable tracing, the memory needs to be freed. Allow it to be freed
   and added back to the kernel memory pool.

 - Allow stack traces to print the function names in the persistent
   buffer

   Now that the modules are saved in the persistent ring buffer, if the
   same modules are loaded, the printing of the function names will
   examine the saved modules. If the module is found in the scratch area
   and is also loaded, then it will do the offset shift and use kallsyms
   to display the function name. If the address is not found, it simply
   displays the address from the previous boot in hex.

* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Use _text and the kernel offset in last_boot_info
  tracing: Show last module text symbols in the stacktrace
  ring-buffer: Remove the unused variable bmeta
  tracing: Skip update_last_data() if cleared and remove active check for save_mod()
  tracing: Initialize scratch_size to zero to prevent UB
  tracing: Fix a compilation error without CONFIG_MODULES
  tracing: Freeable reserved ring buffer
  mm/memblock: Add reserved memory release function
  tracing: Update modules to persistent instances when loaded
  tracing: Show module names and addresses of last boot
  tracing: Have persistent trace instances save module addresses
  module: Add module_for_each_mod() function
  tracing: Have persistent trace instances save KASLR offset
  ring-buffer: Add ring_buffer_meta_scratch()
  ring-buffer: Add buffer meta data for persistent ring buffer
  ring-buffer: Use kaslr address instead of text delta
  ring-buffer: Fix bytes_dropped calculation issue
2025-03-31 13:37:22 -07:00
Yeoreum Yun
a3c3c66670 perf/core: Fix child_total_time_enabled accounting bug at task exit
The perf events code fails to account for total_time_enabled of
inactive events.

Here is a failure case for accounting total_time_enabled for
CPU PMU events:

  sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
  ...

  armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
  armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
                             `          `          `
                             `          `          > total_time_running with child
                             `          > total_time_enabled with child
                             > count with child

  Performance counter stats for 'stress-ng --pthread=2 -t 2s':

       1,138,698,008      armv8_pmuv3_0/event=0x08/                                               (94.99%)
       1,826,791,390      armv8_pmuv3_1/event=0x08/                                               (43.47%)

The two events above are opened on two different CPU PMUs, for example,
each event is opened for a cluster in an Arm big.LITTLE system, they
will never run on the same CPU.  In theory, the total enabled time should
be same for both events, as two events are opened and closed together.

As the result show, the two events' total enabled time including
child event is different (2289429840 vs 1950025700).

This is because child events are not accounted properly
if a event is INACTIVE state when the task exits:

  perf_event_exit_event()
   `> perf_remove_from_context()
     `> __perf_remove_from_context()
       `> perf_child_detach()   -> Accumulate child_total_time_enabled
         `> list_del_event()    -> Update child event's time

The problem is the time accumulation happens prior to child event's
time updating. Thus, it misses to account the last period's time when
the event exits.

The perf core layer follows the rule that timekeeping is tied to state
change. To address the issue, make __perf_remove_from_context()
handle the task exit case by passing 'DETACH_EXIT' to it and
invoke perf_event_state() for state alongside with accounting the time.

Then, perf_child_detach() populates the time into the parent's time metrics.

After this patch, the bug is fixed:

  sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s
  ...
  armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000
  armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940

   Performance counter stats for 'stress-ng --pthread=2 -t 10s':

      15,396,770,398      armv8_pmuv3_0/event=0x08/                                               (68.10%)
      22,428,964,974      armv8_pmuv3_1/event=0x08/                                               (31.90%)

[ mingo: Clarified the changelog. ]

Fixes: ef54c1a476 ("perf: Rework perf_event_exit_event()")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250326082003.1630986-1-yeoreum.yun@arm.com
2025-03-31 12:57:38 +02:00
Linus Torvalds
01d5b167dc Modules changes for 6.15-rc1
- Use RCU instead of RCU-sched
 
   The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable()
   in the module code and its users has been replaced with just
   rcu_read_lock().
 
 - The rest of changes are smaller fixes and updates.
 
 The changes have been on linux-next for at least 2 weeks, with the RCU
 cleanup present for 2 months. One performance problem was reported with the
 RCU change when KASAN + lockdep were enabled, but it was effectively
 addressed by the already merged ee57ab5a32 ("locking/lockdep: Disable
 KASAN instrumentation of lockdep.c").
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmfmwrsUHHBldHIucGF2
 bHVAc3VzZS5jb20ACgkQumpXJwqY6prWxgf/S7Pvdywm10vJ6fooYa+GxXNMwhyh
 XRjZ4m9gjeTNf2KLwX0XHv0XZeFHOmHfjd3iI+pS6CXZnCFTN9J3XPLYsrTxXUb6
 U6zzLf8Zsz8TzeI4dgvSBsZln7oICSACkAgdJCq23hpNKeaeRo91dgiZaIwyZJG3
 FekqSFtP7pYhfFoNkrFKysqbgl1+RWWZ79L2qRJA0bPzVFlvRUuh6cOHQw+8RMqf
 BYLwnArjTkW8AcXpxIXSiwphDHVZ81B96xoplavyoprA5FDpv1W+8y4DtxdWFn+1
 QVWCs/ZV3KrwXWpZev625w3fIOOIXILqRINOzLfvXTw+1xFS3TzSQEpVeg==
 =4OKc
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull modules updates from Petr Pavlu:

 - Use RCU instead of RCU-sched

   The mix of rcu_read_lock(), rcu_read_lock_sched() and
   preempt_disable() in the module code and its users has
   been replaced with just rcu_read_lock()

 - The rest of changes are smaller fixes and updates

* tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits)
  MAINTAINERS: Update the MODULE SUPPORT section
  module: Remove unnecessary size argument when calling strscpy()
  module: Replace deprecated strncpy() with strscpy()
  params: Annotate struct module_param_attrs with __counted_by()
  bug: Use RCU instead RCU-sched to protect module_bug_list.
  static_call: Use RCU in all users of __module_text_address().
  kprobes: Use RCU in all users of __module_text_address().
  bpf: Use RCU in all users of __module_text_address().
  jump_label: Use RCU in all users of __module_text_address().
  jump_label: Use RCU in all users of __module_address().
  x86: Use RCU in all users of __module_address().
  cfi: Use RCU while invoking __module_address().
  powerpc/ftrace: Use RCU in all users of __module_text_address().
  LoongArch: ftrace: Use RCU in all users of __module_text_address().
  LoongArch/orc: Use RCU in all users of __module_address().
  arm64: module: Use RCU in all users of __module_text_address().
  ARM: module: Use RCU in all users of __module_text_address().
  module: Use RCU in all users of __module_text_address().
  module: Use RCU in all users of __module_address().
  module: Use RCU in search_module_extables().
  ...
2025-03-30 15:44:36 -07:00
Linus Torvalds
7405c0f01a Miscellaneous x86 fixes and updates:
- Fix a large number of x86 Kconfig dependency and help text accuracy
    bugs/problems, by Mateusz Jończyk and David Heideberg.
 
  - Fix a VM_PAT interaction with fork() crash. This also touches
    core kernel code.
 
  - Fix an ORC unwinder bug for interrupt entries
 
  - Fixes and cleanups.
 
  - Fix an AMD microcode loader bug that can promote verification failures
    into success.
 
  - Add early-printk support for MMIO based UARTs on an x86 board that
    had no other serial debugging facility and also experienced early
    boot crashes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnFBERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iVDxAAmiB4soT3/WbaWJJdeVyxEL7sOmUNOm04
 5kAVHJVK8QGdje0eWa6h7xmuQD3UOxafE2coCrOxHhZi2qpAAY6CPIIy6oIBRwZK
 gLgT5xn1CHojfm4UFC3YUOyecBRPUF2C5jfkajWdZHumyPP/sOObqvGanpQRAYd5
 bfPHEvrBpeEeS7WkATCdyF2j+I5xYflD4g/MDAsMmqasQHOnjBuFX5VBeVxxkysC
 dMsFkFpxqcA95MnnyOnxXzgOtRTY0UystX07D3Bk1pqhG9zor+mp8OynsTRCU87T
 ZPPbUr2qACNmCqEEXl+F1mAkgj5H66xE2gaJdYx0/jBAIbX8Nwih7mMxhJShVU07
 Lhc0tukmVrDoDaVIr2HsxqI8iokuYLszUjDAqEQmQDrgelL6usPYghN1b2bDSJ9r
 0hCO/s79024H/U9oMrC+CF52D5UH/fE98ipigrbKRIO/hOsoxiiniF3DG2NVWZM2
 n5nPnOdbperqjCEteN1nxQfr7XZkvP95Bwmuqqc90XH+tzKJdHruUkbm4ua7NEEz
 WKgsUIYFjeN5ZrHbJaNtHlQueTyvsyGmL1nlaLi/MaJbSXPsM/WfwvHsaKTh3NrE
 BFwEAhMZVLDHEfnFT0Ev7Mm1MGpW8MbHoRBR1+E5FWWNS4X0yGLKXWRp8diw25Tm
 W3ZVsn65E6U=
 =/qKX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

 - Fix a large number of x86 Kconfig dependency and help text accuracy
   bugs/problems, by Mateusz Jończyk and David Heideberg

 - Fix a VM_PAT interaction with fork() crash. This also touches core
   kernel code

 - Fix an ORC unwinder bug for interrupt entries

 - Fixes and cleanups

 - Fix an AMD microcode loader bug that can promote verification
   failures into success

 - Add early-printk support for MMIO based UARTs on an x86 board that
   had no other serial debugging facility and also experienced early
   boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
  x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
  x86/fpu: Update the outdated comment above fpstate_init_user()
  x86/early_printk: Add support for MMIO-based UARTs
  x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
  x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
  x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
  x86/Kconfig: Correct X86_X2APIC help text
  x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
  x86/Kconfig: Document release year of glibc 2.3.3
  x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
  x86/Kconfig: Document CONFIG_PCI_MMCONFIG
  x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
  x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
  x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
  x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30 15:25:15 -07:00
Linus Torvalds
b4c5c57c2d Miscellaneous locking fixes and updates:
- Fix a locking self-test FAIL on PREEMPT_RT kernels
  - Fix nr_unused_locks accounting bug
  - Simplify the split-lock debugging feature's fast-path
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnDqERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jXVw/+IGVWjYnw5Ac+HyccKwY7+OEmKiGx+9C4
 yS3pc1wrofkpIVRsr0GuOv2h5V+YZJvuRrZPbatVidUkEJAQ++Iaeef+gnunfMX2
 0mRk5B0CG1M+y3lNFOhIEafkGDQOtdFTOmoRv508Xj3mb8cRMRCgkf8S3600l2IB
 ZEw67f8waHy1PwKC+dFlr0QjGL0+qbkrDqB1m4VSy0/egaSNMFjYVLLBAe6gn+/X
 zU/SWBqyV9jSfmwABSiKCpjt/GziuwGoADJpxAbSR/1NaCy866VSBLRjiiLI/c5q
 xdKd3GyD3IhLDCbzKrwVugeioxoPzXLmZKajVhZARv2kPW8DbzUMymP/eVkOf7VJ
 6dDNV3Yyq568YQBi3PQu2vESumHRctLOsRSAr2TnXlbFzBvjBM+Ia11e+p4mg9FU
 Hcn98Rt3FfgigrzH4IeLaYTbm6A5amj2ymoMwjR/vchnB8tlXddc3/KumIdak04Q
 hHRHA0cyNg1YvB/8yB0NKf5jETpbYaaKhCO9KlsceLDjuhrwlmi/XFz+bwBJIBgD
 zug2hevSW2RVRGidJ5Qz91qG76xBkhH038qorcegzGybMQ0p+Hw+/BN7OzPSueaq
 ZMRFde3CtrB332SS73KToG5XvyuffhW8XyCvkOt5W6z9P7n3UOFzDcKr/wrePV9d
 9kEJPdOCOwM=
 =jcdr
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc locking fixes and updates from Ingo Molnar:

 - Fix a locking self-test FAIL on PREEMPT_RT kernels

 - Fix nr_unused_locks accounting bug

 - Simplify the split-lock debugging feature's fast-path

* tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class()
  lockdep: Fix wait context check on softirq for PREEMPT_RT
  x86/split_lock: Simplify reenabling
2025-03-30 15:18:36 -07:00
Linus Torvalds
aa918db707 bpf_try_alloc_pages
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfkHCQACgkQ6rmadz2v
 bTrzWhAAnDcJsGgSJ9EbElpTfgBWE7aijXo/MsPxxRhORc0uR6MnhPx1iADP4KYj
 lTGEIBgRuDG3qaM4EXpPd32rUJJHv8hot7z9zfvUgSuFNLZEHWXJtz/i4ileOxin
 08zV+zA5WL2fqamAmMRFMI37DeSWy3xU0/qlbWgNnURjPjRri6CF4rVFUWq+QMY+
 XP8ITD/6nOLUR6Bq2M18aHnk2VJWkxVP9Oi+vz1VHbOjKaJC7ATa1+Q4qMqWyTb1
 8IAYWiZR1ZPc214ITaspVzLoLb/wxHxy3QMrdAWAL6sjp0B4J8YxIq1qsBuR1FN7
 TxTRQND/+LjqrAgs5AmFqz3ndKmahjGQWnQEh/rDYJtx+sLJk9hfsMIDF8Wmxuwl
 RftdV0g9bPljR5Qgc9i8DNtEjoAbNjoP8xLjt9HfQakVl8V9jPe0bxZ5tJDf+T0M
 n/VgEjaRzdXqFOLal6Z5wl/jkIn1l1kWQuCMI2z5Z0Ls+PlYX56xdZxfK2Rh3m+e
 3W89vqj9ytJ3rZKG8DRsxukuHwnJ+Gia3XI2h/5cc8kEM5ss1Ase8oIkmrwaLd9x
 +zVXNoDCCPRQgTStwItW+2YdFmE9uijhEZh9yPwT1/rtFuKd0oSebVIpjih/bGqH
 mMN9gYO4+ArSbqku9X2lP3VjMOf6M6SZGm+PzG25PAMGzjqGqwk=
 =AHTr
 -----END PGP SIGNATURE-----

Merge tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf try_alloc_pages() support from Alexei Starovoitov:
 "The pull includes work from Sebastian, Vlastimil and myself with a lot
  of help from Michal and Shakeel.

  This is a first step towards making kmalloc reentrant to get rid of
  slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches
  make page allocator safe from any context.

  Vlastimil kicked off this effort at LSFMM 2024:

    https://lwn.net/Articles/974138/

  and we continued at LSFMM 2025:

    https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/

  Why:

  SLAB wrappers bind memory to a particular subsystem making it
  unavailable to the rest of the kernel. Some BPF maps in production
  consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G,
  1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF
  map preallocation won't be necessary.

  How:

  Synchronous kmalloc/page alloc stack has multiple stages going from
  fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
  rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already
  relying on trylock.

  This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and
  return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this
  functionality into try_alloc_pages() helper. We make sure that the
  logic is sane in PREEMPT_RT.

  End result: try_alloc_pages()/free_pages_nolock() are safe to call
  from any context.

  try_kmalloc() for any context with similar trylock approach will
  follow. It will use try_alloc_pages() when slab needs a new page.
  Though such try_kmalloc/page_alloc() is an opportunistic allocator,
  this design ensures that the probability of successful allocation of
  small objects (up to one page in size) is high.

  Even before we have try_kmalloc(), we already use try_alloc_pages() in
  BPF arena implementation and it's going to be used more extensively in
  BPF"

* tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  mm: Fix the flipped condition in gfpflags_allow_spinning()
  bpf: Use try_alloc_pages() to allocate pages for bpf needs.
  mm, bpf: Use memcg in try_alloc_pages().
  memcg: Use trylock to access memcg stock_lock.
  mm, bpf: Introduce free_pages_nolock()
  mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
  locking/local_lock: Introduce localtry_lock_t
2025-03-30 13:45:28 -07:00
Linus Torvalds
494e7fe591 bpf_res_spin_lock
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfcq3kACgkQ6rmadz2v
 bToxkw/8DHIqjVnzU2O9hbRM1anYo6yM8e34IxCt0ajHTSEVJ93+C161QDWo/6Dk
 +RNlaeGekaBUk+QOLb4u+rzZ2eR/pWSm37xuDRAiBCQ+3MgR60gGRaSljpS3IUem
 0FvS6C1HObBCEUXMU2rNv/5cJB5/qrQYa9FEEjRvBTLqgQkdS7yaW/KKuZaNb+Ts
 KiEeWvPrPSZXStfRGy8Wr4eS2rYhxPAikUR+xde9CM+HtMWwKTCTSp8qXrqA92Dj
 Cz9ix01scznuf78QCRDZp09im3lZys8ZQprmPgMxyEscN+CDL7n68wAhmTJq0uo3
 3NqIv7zBQ8wMChj0f0HjwZ0Wrj7BJAveY2Q0RterxdzT4vMKdtNkThX46ISaCoX/
 XQAAhZHemK6MvBJk+LKkqqMgrD+3FAzvY7O+SCyUBAMs4FK1myRJQihdLXHGfiBU
 DMDZE1jsE8qBaeUbz4LIuCy8fx2LhtVwVNwbNIBUZHdyfjxIXnQT/8Cnrgklwy2i
 tnYekhAsHDQY+QDkrvJpc4E1vUtiXwSDI5ErcnWdSzctEOyVeUg7OuuGD4riCd1c
 emdJmtASM1z9Ajqa1dytDxVaF6wjKlbhQgnKamuex5JLGCK6makk8ZoB+DBfKYHD
 VoWummTu8ldf+Dp4ehBh7AbeF2vn4kLqcF1PLRsBO6ytJs4HIt8=
 =5O7h
 -----END PGP SIGNATURE-----

Merge tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf relisient spinlock support from Alexei Starovoitov:
 "This patch set introduces Resilient Queued Spin Lock (or rqspinlock
  with res_spin_lock() and res_spin_unlock() APIs).

  This is a qspinlock variant which recovers the kernel from a stalled
  state when the lock acquisition path cannot make forward progress.
  This can occur when a lock acquisition attempt enters a deadlock
  situation (e.g. AA, or ABBA), or more generally, when the owner of the
  lock (which we’re trying to acquire) isn’t making forward progress.
  Deadlock detection is the main mechanism used to provide instant
  recovery, with the timeout mechanism acting as a final line of
  defense. Detection is triggered immediately when beginning the waiting
  loop of a lock slow path.

  Additionally, BPF programs attached to different parts of the kernel
  can introduce new control flow into the kernel, which increases the
  likelihood of deadlocks in code not written to handle reentrancy.
  There have been multiple syzbot reports surfacing deadlocks in
  internal kernel code due to the diverse ways in which BPF programs can
  be attached to different parts of the kernel. By switching the BPF
  subsystem’s lock usage to rqspinlock, all of these issues are
  mitigated at runtime.

  This spin lock implementation allows BPF maps to become safer and
  remove mechanisms that have fallen short in assuring safety when
  nesting programs in arbitrary ways in the same context or across
  different contexts.

  We run benchmarks that stress locking scalability and perform
  comparison against the baseline (qspinlock). For the rqspinlock case,
  we replace the default qspinlock with it in the kernel, such that all
  spin locks in the kernel use the rqspinlock slow path. As such,
  benchmarks that stress kernel spin locks end up exercising rqspinlock.

  More details in the cover letter in commit 6ffb9017e9 ("Merge branch
  'resilient-queued-spin-lock'")"

* tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits)
  selftests/bpf: Add tests for rqspinlock
  bpf: Maintain FIFO property for rqspinlock unlock
  bpf: Implement verifier support for rqspinlock
  bpf: Introduce rqspinlock kfuncs
  bpf: Convert lpm_trie.c to rqspinlock
  bpf: Convert percpu_freelist.c to rqspinlock
  bpf: Convert hashtab.c to rqspinlock
  rqspinlock: Add locktorture support
  rqspinlock: Add entry to Makefile, MAINTAINERS
  rqspinlock: Add macros for rqspinlock usage
  rqspinlock: Add basic support for CONFIG_PARAVIRT
  rqspinlock: Add a test-and-set fallback
  rqspinlock: Add deadlock detection and recovery
  rqspinlock: Protect waiters in trylock fallback from stalls
  rqspinlock: Protect waiters in queue from stalls
  rqspinlock: Protect pending bit owners from stalls
  rqspinlock: Hardcode cond_acquire loops for arm64
  rqspinlock: Add support for timeouts
  rqspinlock: Drop PV and virtualization support
  rqspinlock: Add rqspinlock.h header
  ...
2025-03-30 13:06:27 -07:00
Linus Torvalds
fa593d0f96 bpf-next-6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
 bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
 D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
 rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
 MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
 bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
 QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
 wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
 9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
 lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
 IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
 Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
 =aN/V
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "For this merge window we're splitting BPF pull request into three for
  higher visibility: main changes, res_spin_lock, try_alloc_pages.

  These are the main BPF changes:

   - Add DFA-based live registers analysis to improve verification of
     programs with loops (Eduard Zingerman)

   - Introduce load_acquire and store_release BPF instructions and add
     x86, arm64 JIT support (Peilin Ye)

   - Fix loop detection logic in the verifier (Eduard Zingerman)

   - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)

   - Add kfunc for populating cpumask bits (Emil Tsalapatis)

   - Convert various shell based tests to selftests/bpf/test_progs
     format (Bastien Curutchet)

   - Allow passing referenced kptrs into struct_ops callbacks (Amery
     Hung)

   - Add a flag to LSM bpf hook to facilitate bpf program signing
     (Blaise Boscaccy)

   - Track arena arguments in kfuncs (Ihor Solodrai)

   - Add copy_remote_vm_str() helper for reading strings from remote VM
     and bpf_copy_from_user_task_str() kfunc (Jordan Rome)

   - Add support for timed may_goto instruction (Kumar Kartikeya
     Dwivedi)

   - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)

   - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
     local storage (Martin KaFai Lau)

   - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)

   - Allow retrieving BTF data with BTF token (Mykyta Yatsenko)

   - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
     (Song Liu)

   - Reject attaching programs to noreturn functions (Yafang Shao)

   - Introduce pre-order traversal of cgroup bpf programs (Yonghong
     Song)"

* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
  selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
  bpf: Fix out-of-bounds read in check_atomic_load/store()
  libbpf: Add namespace for errstr making it libbpf_errstr
  bpf: Add struct_ops context information to struct bpf_prog_aux
  selftests/bpf: Sanitize pointer prior fclose()
  selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
  selftests/bpf: test_xdp_vlan: Rename BPF sections
  bpf: clarify a misleading verifier error message
  selftests/bpf: Add selftest for attaching fexit to __noreturn functions
  bpf: Reject attaching fexit/fmod_ret to __noreturn functions
  bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
  bpf: Make perf_event_read_output accessible in all program types.
  bpftool: Using the right format specifiers
  bpftool: Add -Wformat-signedness flag to detect format errors
  selftests/bpf: Test freplace from user namespace
  libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
  bpf: Return prog btf_id without capable check
  bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
  bpf, x86: Fix objtool warning for timed may_goto
  bpf: Check map->record at the beginning of check_and_free_fields()
  ...
2025-03-30 12:43:03 -07:00
Linus Torvalds
f90f2145b2 s390 updates for 6.15 merge window
- Add sorting of mcount locations at build time
 
 - Rework uaccess functions with C exception handling to shorten inline
   assembly size and enable full inlining. This yields near-optimal code
   for small constant copies with a ~40kb kernel size increase
 
 - Add support for a configurable STRICT_MM_TYPECHECKS which allows to
   generate better code, but also allows to have type checking for
   debug builds
 
 - Optimize get_lowcore() for common callers with alternatives that
   nearly revert to the pre-relocated lowcore code, while also slightly
   reducing syscall entry and exit time
 
 - Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
   style macros that call test_facility(), and for features with additional
   conditions, add a new ALT_TYPE_FEATURE alternative to provide a static
   branch via alternative patching. Also, move machine feature detection
   to the decompressor for early patching and add debugging functionality
   to easily show which alternatives are patched
 
 - Add exception table support to early boot / startup code to get rid
   of the open coded exception handling
 
 - Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
   to ensure correct inlining and unrolling decisions
 
 - Remove 2k page table leftovers now that s390 has been switched to
   always allocate 4k page tables
 
 - Split kfence pool into 4k mappings in arch_kfence_init_pool() and
   remove the architecture-specific kfence_split_mapping()
 
 - Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
   spurious KASAN warnings from opportunistic ftrace argument tracing
 
 - Force __atomic_add_const() variants on s390 to always return void,
   ensuring compile errors for improper usage
 
 - Remove s390's ioremap_wt() and pgprot_writethrough() due to mismatched
   semantics and lack of known users, relying on asm-generic fallbacks
 
 - Signal eventfd in vfio-ap to notify userspace when the guest AP
   configuration changes, including during mdev removal
 
 - Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
   drivers to avoid fake flex array confusion
 
 - Cleanup trap code
 
 - Remove references to the outdated linux390@de.ibm.com address
 
 - Other various small fixes and improvements all over the code
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfmuPwACgkQjYWKoQLX
 FBgTDAgAjKmZ5OYjACRfYepTvKk9SDqa2CBlQZ+BhbAXEVIrxKnv8OkImAXoWNsM
 mFxiCxAHWdcD+nqTrxFsXhkNLsndijlwnj/IqZgvy6R/3yNtBlAYRPLujOmVrsQB
 dWB8Dl38p63Ip1JfAqyabiAOUjfhrclRcM5FX5tgciXA6N/vhY3OM6k0+k7wN4Nj
 Dei/rCrnYRXTrFQgtM4w8JTIrwdnXjeKvaTYCflh4Q5ISJ7TceSF7cqq8HOs5hhK
 o2ciaoTdx212522CIsxeN3Ls3jrn8bCOCoOeSCysc5RL84grAuFnmjSajo1LFide
 S/TQtHXYy78Wuei9xvHi561ogiv/ww==
 =Kxgc
 -----END PGP SIGNATURE-----

Merge tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add sorting of mcount locations at build time

 - Rework uaccess functions with C exception handling to shorten inline
   assembly size and enable full inlining. This yields near-optimal code
   for small constant copies with a ~40kb kernel size increase

 - Add support for a configurable STRICT_MM_TYPECHECKS which allows to
   generate better code, but also allows to have type checking for debug
   builds

 - Optimize get_lowcore() for common callers with alternatives that
   nearly revert to the pre-relocated lowcore code, while also slightly
   reducing syscall entry and exit time

 - Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
   style macros that call test_facility(), and for features with
   additional conditions, add a new ALT_TYPE_FEATURE alternative to
   provide a static branch via alternative patching. Also, move machine
   feature detection to the decompressor for early patching and add
   debugging functionality to easily show which alternatives are patched

 - Add exception table support to early boot / startup code to get rid
   of the open coded exception handling

 - Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
   to ensure correct inlining and unrolling decisions

 - Remove 2k page table leftovers now that s390 has been switched to
   always allocate 4k page tables

 - Split kfence pool into 4k mappings in arch_kfence_init_pool() and
   remove the architecture-specific kfence_split_mapping()

 - Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
   spurious KASAN warnings from opportunistic ftrace argument tracing

 - Force __atomic_add_const() variants on s390 to always return void,
   ensuring compile errors for improper usage

 - Remove s390's ioremap_wt() and pgprot_writethrough() due to
   mismatched semantics and lack of known users, relying on asm-generic
   fallbacks

 - Signal eventfd in vfio-ap to notify userspace when the guest AP
   configuration changes, including during mdev removal

 - Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
   drivers to avoid fake flex array confusion

 - Cleanup trap code

 - Remove references to the outdated linux390@de.ibm.com address

 - Other various small fixes and improvements all over the code

* tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (78 commits)
  s390: Use inline qualifier for all EX_TABLE and ALTERNATIVE inline assemblies
  s390/kfence: Split kfence pool into 4k mappings in arch_kfence_init_pool()
  s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()
  s390/boot: Ignore vmlinux.map
  s390/sysctl: Remove "vm/allocate_pgste" sysctl
  s390: Remove 2k vs 4k page table leftovers
  s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
  s390/lowcore: Use lghi instead llilh to clear register
  s390/syscall: Merge __do_syscall() and do_syscall()
  s390/spinlock: Implement SPINLOCK_LOCKVAL with inline assembly
  s390/smp: Implement raw_smp_processor_id() with inline assembly
  s390/current: Implement current with inline assembly
  s390/lowcore: Use inline qualifier for get_lowcore() inline assembly
  s390: Move s390 sysctls into their own file under arch/s390
  s390/syscall: Simplify syscall_get_arguments()
  s390/vfio-ap: Notify userspace that guest's AP config changed when mdev removed
  s390: Remove ioremap_wt() and pgprot_writethrough()
  s390/mm: Add configurable STRICT_MM_TYPECHECKS
  s390/mm: Convert pgste_val() into function
  s390/mm: Convert pgprot_val() into function
  ...
2025-03-29 11:59:43 -07:00
Linus Torvalds
0ccff074d6 fwctl first pull request
fwctl is a new subsystem intended to bring some common rules and order to
 the growing pattern of exposing a secure FW interface directly to
 userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
 exposing a device for datapath operations fwctl is focused on debugging,
 configuration and provisioning of the device. It will not have the
 necessary features like interrupt delivery to support a datapath.
 
 This concept is similar to the long standing practice in the "HW" RAID
 space of having a device specific misc device to manage the RAID
 controller FW. fwctl generalizes this notion of a companion debug and
 management interface that goes along with a dataplane implemented in an
 appropriate subsystem.
 
 There have been three LWN articles written discussing various aspects of
 this:
 
  https://lwn.net/Articles/955001/
  https://lwn.net/Articles/969383/
  https://lwn.net/Articles/990802/
 
 This pull requests includes three drivers to launch the subsystem:
 
  - CXL provides a vendor scheme for executing commands and a way to learn
    the 'command effects' (ie the security properties) of such
    commands. The fwctl driver allows access to these mechanism within the
    fwctl security model
 
  - mlx5 is family of networking products, the driver supports all current
    Mellanox HW still receiving FW feature updates. This includes RDMA
    multiprotocol NICs like ConnectX and the Bluefield family of Smart
    NICs.
 
  - AMD/Pensando Distributed Services card is a multi protocol Smart NIC
    with a multi PCI function design. fwctl works on the management PCI
    function following a 'command effects' model similar to CXL.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ939zQAKCRCFwuHvBreF
 YdOoAQCJq59/UC7lXU+sOsR6LISaDVTT5jAweBo0o036P9+DNAEA1iQdZ/GK2yCJ
 Ub33Xo9L+hzIpIbCouI3BtCXqymybAg=
 =f5YG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull fwctl subsystem from Jason Gunthorpe:
 "fwctl is a new subsystem intended to bring some common rules and order
  to the growing pattern of exposing a secure FW interface directly to
  userspace.

  Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a
  device for datapath operations fwctl is focused on debugging,
  configuration and provisioning of the device. It will not have the
  necessary features like interrupt delivery to support a datapath.

  This concept is similar to the long standing practice in the "HW" RAID
  space of having a device specific misc device to manage the RAID
  controller FW. fwctl generalizes this notion of a companion debug and
  management interface that goes along with a dataplane implemented in
  an appropriate subsystem.

  There have been three LWN articles written discussing various aspects
  of this:

	https://lwn.net/Articles/955001/
	https://lwn.net/Articles/969383/
	https://lwn.net/Articles/990802/

  This includes three drivers to launch the subsystem:

   - CXL provides a vendor scheme for executing commands and a way to
     learn the 'command effects' (ie the security properties) of such
     commands. The fwctl driver allows access to these mechanism within
     the fwctl security model

   - mlx5 is family of networking products, the driver supports all
     current Mellanox HW still receiving FW feature updates. This
     includes RDMA multiprotocol NICs like ConnectX and the Bluefield
     family of Smart NICs.

   - AMD/Pensando Distributed Services card is a multi protocol Smart
     NIC with a multi PCI function design. fwctl works on the management
     PCI function following a 'command effects' model similar to CXL"

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (30 commits)
  pds_fwctl: add Documentation entries
  pds_fwctl: add rpc and query support
  pds_fwctl: initial driver framework
  pds_core: add new fwctl auxiliary_device
  pds_core: specify auxiliary_device to be created
  pds_core: make pdsc_auxbus_dev_del() void
  cxl: Fixup kdoc issues for include/cxl/features.h
  fwctl/cxl: Add documentation to FWCTL CXL
  cxl/test: Add Set Feature support to cxl_test
  cxl/test: Add Get Feature support to cxl_test
  cxl: Add support to handle user feature commands for set feature
  cxl: Add support to handle user feature commands for get feature
  cxl: Add support for fwctl RPC command to enable CXL feature commands
  cxl: Move cxl feature command structs to user header
  cxl: Add FWCTL support to CXL
  mlx5: Create an auxiliary device for fwctl_mlx5
  fwctl/mlx5: Support for communicating with mlx5 fw
  fwctl: Add documentation
  fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  taint: Add TAINT_FWCTL
  ...
2025-03-29 10:45:20 -07:00
Linus Torvalds
e5e0e6bebe This update includes the following changes:
API:
 
 - Remove legacy compression interface.
 - Improve scatterwalk API.
 - Add request chaining to ahash and acomp.
 - Add virtual address support to ahash and acomp.
 - Add folio support to acomp.
 - Remove NULL dst support from acomp.
 
 Algorithms:
 
 - Library options are fuly hidden (selected by kernel users only).
 - Add Kerberos5 algorithms.
 - Add VAES-based ctr(aes) on x86.
 - Ensure LZO respects output buffer length on compression.
 - Remove obsolete SIMD fallback code path from arm/ghash-ce.
 
 Drivers:
 
 - Add support for PCI device 0x1134 in ccp.
 - Add support for rk3588's standalone TRNG in rockchip.
 - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93.
 - Fix bugs in tegra uncovered by multi-threaded self-test.
 - Fix corner cases in hisilicon/sec2.
 
 Others:
 
 - Add SG_MITER_LOCAL to sg miter.
 - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmfiQ9kACgkQxycdCkmx
 i6fFZg/9GWjC1FLEV66vNlYAIzFGwzwWdFGyQzXyP235Cphhm4qt9gx7P91N6Lvc
 pplVjNEeZHoP8lMw+AIeGc2cRhIwsvn8C+HA3tCBOoC1qSe8T9t7KHAgiRGd/0iz
 UrzVBFLYlR9i4tc0T5peyQwSctv8DfjWzduTmI3Ts8i7OQcfeVVgj3sGfWam7kjF
 1GJWIQH7aPzT8cwFtk8gAK1insuPPZelT1Ppl9kUeZe0XUibrP7Gb5G9simxXAyi
 B+nLCaJYS6Hc1f47cfR/qyZSeYQN35KTVrEoKb1pTYXfEtMv6W9fIvQVLJRYsqpH
 RUBdDJUseE+WckR6glX9USrh+Fv9d+HfsTXh1fhpApKU5sQJ7pDbUm4ge8p6htNG
 MIszbJPdqajYveRLuPUjFlUXaqomos8eT6BZA+RLHm1cogzEOm+5bjspbfRNAVPj
 x9KiDu5lXNiFj02v/MkLKUe3bnGIyVQnZNi7Rn0Rpxjv95tIjVpksZWMPJarxUC6
 5zdyM2I5X0Z9+teBpbfWyqfzSbAs/KpzV8S/xNvWDUT6NlpYGBeNXrCDTXcwJLAh
 PRW0w1EJUwsZbPi8GEh5jNzo/YK1cGsUKrihKv7YgqSSopMLI8e/WVr8nKZMVDFA
 O+6F6ec5lR7KsOIMGUqrBGFU1ccAeaLLvLK3H5J8//gMMg82Uik=
 =aQNt
 -----END PGP SIGNATURE-----

Merge tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Remove legacy compression interface
   - Improve scatterwalk API
   - Add request chaining to ahash and acomp
   - Add virtual address support to ahash and acomp
   - Add folio support to acomp
   - Remove NULL dst support from acomp

  Algorithms:
   - Library options are fuly hidden (selected by kernel users only)
   - Add Kerberos5 algorithms
   - Add VAES-based ctr(aes) on x86
   - Ensure LZO respects output buffer length on compression
   - Remove obsolete SIMD fallback code path from arm/ghash-ce

  Drivers:
   - Add support for PCI device 0x1134 in ccp
   - Add support for rk3588's standalone TRNG in rockchip
   - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
   - Fix bugs in tegra uncovered by multi-threaded self-test
   - Fix corner cases in hisilicon/sec2

  Others:
   - Add SG_MITER_LOCAL to sg miter
   - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"

* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
  crypto: testmgr - Add multibuffer acomp testing
  crypto: acomp - Fix synchronous acomp chaining fallback
  crypto: testmgr - Add multibuffer hash testing
  crypto: hash - Fix synchronous ahash chaining fallback
  crypto: arm/ghash-ce - Remove SIMD fallback code path
  crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
  crypto: api - Call crypto_alg_put in crypto_unregister_alg
  crypto: scompress - Fix incorrect stream freeing
  crypto: lib/chacha - remove unused arch-specific init support
  crypto: remove obsolete 'comp' compression API
  crypto: compress_null - drop obsolete 'comp' implementation
  crypto: cavium/zip - drop obsolete 'comp' implementation
  crypto: zstd - drop obsolete 'comp' implementation
  crypto: lzo - drop obsolete 'comp' implementation
  crypto: lzo-rle - drop obsolete 'comp' implementation
  crypto: lz4hc - drop obsolete 'comp' implementation
  crypto: lz4 - drop obsolete 'comp' implementation
  crypto: deflate - drop obsolete 'comp' implementation
  crypto: 842 - drop obsolete 'comp' implementation
  crypto: nx - Migrate to scomp API
  ...
2025-03-29 10:01:55 -07:00
Paul E. McKenney
1dc1e0b9d6 srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT
The FORCE_NEED_SRCU_NMI_SAFE is useful only for those wishing to test
the SRCU code paths that accommodate architectures that do not have
NMI-safe per-CPU operations, that is, those architectures that do not
select the ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.  As such, this
is a specialized Kconfig option that is not intended for casual users.

This commit therefore hides it behind the RCU_EXPERT Kconfig option.
Given that this new FORCE_NEED_SRCU_NMI_SAFE Kconfig option has no effect
unless the ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option is also selected,
it also depends on this Kconfig option.

[ paulmck: Apply Geert Uytterhoeven feedback. ]

[ boqun: Add the "Fixes" tag. ]

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdX6dy9_tmpLkpcnGzxyRbe6qSWYukcPp=H1GzZdyd3qBQ@mail.gmail.com/
Fixes: 536e8b9b80 ("srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-28 21:19:17 -07:00
Thorsten Blum
afdbe49276 kdb: Remove optional size arguments from strscpy() calls
If the destination buffer has a fixed length, strscpy() automatically
determines the size of the destination buffer using sizeof() if the
argument is omitted. This makes the explicit sizeof() unnecessary.

Furthermore, CMD_BUFLEN is equal to sizeof(kdb_prompt_str) and can also
be removed. Remove them to shorten and simplify the code.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20250319163341.2123-2-thorsten.blum@linux.dev
Signed-off-by: Daniel Thompson <daniel@riscstar.com>
2025-03-28 21:28:28 +00:00
Nir Lichtman
a30d4ff819 kdb: remove usage of static environment buffer
Problem: The set environment variable logic uses a static "heap" like
buffer to store the values of the variables, and they are never freed,
on top of that this is redundant since the kernel supplies allocation
facilities which are even used also in this file.

Solution: Remove the weird static buffer logic and use kmalloc instead,
call kfree when overriding an existing variable.

Signed-off-by: Nir Lichtman <nir@lichtman.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250204054741.GB1219827@lichtman.org
Signed-off-by: Daniel Thompson <daniel@riscstar.com>
2025-03-28 21:10:53 +00:00
Linus Torvalds
78fb88eca6 Capabilities update for 6.15
This branch contains one patch:
 
     capability: Remove unused has_capability
 
 This removes a helper function whose last user (smack) stopped using
 it in 2018.
 
 This has been in linux-next for most of the the last cycle with no
 apparent issues.  It is available at:
 
 git@git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git #caps-20250327
 
 on top of commit 2014c95afe (tag: v6.14-rc1)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEqb0/8XByttt4D8+UNXDaFycKziQFAmfmHIcACgkQNXDaFycK
 ziRQFggAl1fQCS/LNhFY7SAmkv8S6yJemYt8hNvRjlOwwMB2aZbFFfZeVGdO73od
 mkQAoZA0ppPT8aXH2pbfcCsKI2UFQQ/cm6Jpl4wCEZmKr2ZTxcH5zCRbUQeiR/7t
 QvcWVXCC47BnI9ab+OiCCaCw5HtAuweax6IGjFTregjUAEcBk8aoF9PJMvlT0e6j
 WQqWegPVS1GSiQqWGxPzdfnUOSkGuak9setg53pCTQE6UjMIK7uVxfSEJpe3pKvH
 ZvRvkif5UJrdUAr9MKMvf/gkKEl5xpAoTRdoaiWdzZ/arZ77HD32qoi/o3s4P+7a
 7ZHvAc+Xjm0AIO/4JrjzFw0zH/svSg==
 =rbE0
 -----END PGP SIGNATURE-----

Merge tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux

Pull capabilities update from Serge Hallyn:
 "This contains just one patch that removes a helper function whose last
  user (smack) stopped using it in 2018"

* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
  capability: Remove unused has_capability
2025-03-28 12:09:33 -07:00
Linus Torvalds
112e43e9fd Revert "Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.

Thomas says:
 "I just noticed that for some incomprehensible reason, probably sheer
  incompetemce when trying to utilize b4, I managed to merge an outdated
  _and_ buggy version of that series.

  Can you please revert that merge completely?"

Done.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-28 11:22:54 -07:00
Steven Rostedt
028a58ec15 tracing: Use _text and the kernel offset in last_boot_info
Instead of using kaslr_offset() just record the location of "_text". This
makes it possible for user space to use either the System.map or
/proc/kallsyms as what to map all addresses to functions with.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250326220304.38dbedcd@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google)
35a380ddbc tracing: Show last module text symbols in the stacktrace
Since the previous boot trace buffer can include module text address in
the stacktrace. As same as the kernel text address, convert the module
text address using the module address information.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174282689201.356346.17647540360450727687.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Jiapeng Chong
de48d7fff7 ring-buffer: Remove the unused variable bmeta
Variable bmeta is not effectively used, so delete it.

kernel/trace/ring_buffer.c:1952:27: warning: variable ‘bmeta’ set but not used.

Link: https://lore.kernel.org/20250317015524.3902-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=19524
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google)
486fbcb380 tracing: Skip update_last_data() if cleared and remove active check for save_mod()
If the last boot data is already cleared, there is no reason to update it
again. Skip if the TRACE_ARRAY_FL_LAST_BOOT is cleared.
Also, for calling save_mod() when module loading, we don't need to check
the trace is active or not because any module address can be on the
stacktrace.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174165660328.1173316.15529357882704817499.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Steven Rostedt
5dbeb56bb9 tracing: Initialize scratch_size to zero to prevent UB
In allocate_trace_buffer() the following code:

  buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
				      tr->range_addr_start,
				      tr->range_addr_size,
				      struct_size(tscratch, entries, 128));

  tscratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
  setup_trace_scratch(tr, tscratch, scratch_size);

Has undefined behavior if ring_buffer_alloc_range() fails because
"scratch_size" is not initialize. If the allocation fails, then
buf->buffer will be NULL. The ring_buffer_meta_scratch() will return
NULL immediately if it is passed a NULL buffer and it will not update
scratch_size. Then setup_trace_scratch() will return immediately if
tscratch is NULL.

Although there's no real issue here, but it is considered undefined
behavior to pass an uninitialized variable to a function as input, and
UBSan may complain about it.

Just initialize scratch_size to zero to make the code defined behavior and
a little more robust.

Link: https://lore.kernel.org/all/44c5deaa-b094-4852-90f9-52f3fb10e67a@stanley.mountain/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google)
f00c9201f9 tracing: Fix a compilation error without CONFIG_MODULES
There are some code which depends on CONFIG_MODULES. #ifdef
to enclose it.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174230515367.2909896.8132122175220657625.stgit@mhiramat.tok.corp.google.com
Fixes: dca91c1c5468 ("tracing: Have persistent trace instances save module addresses")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google)
fb6d03238e tracing: Freeable reserved ring buffer
Make the ring buffer on reserved memory to be freeable. This allows us
to free the trace instance on the reserved memory without changing
cmdline and rebooting. Even if we can not change the kernel cmdline
for security reason, we can release the reserved memory for the ring
buffer as free (available) memory.

For example, boot kernel with reserved memory;
"reserve_mem=20M:2M:trace trace_instance=boot_mapped^traceoff@trace"

~ # free
              total        used        free      shared  buff/cache   available
Mem:        1995548       50544     1927568       14964       17436     1911480
Swap:             0           0           0
~ # rmdir /sys/kernel/tracing/instances/boot_mapped/
[   23.704023] Freeing reserve_mem:trace memory: 20476K
~ # free
              total        used        free      shared  buff/cache   available
Mem:        2016024       41844     1956740       14968       17440     1940572
Swap:             0           0           0

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/173989134814.230693.18199312930337815629.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:28 -04:00
Steven Rostedt
5f3719f697 tracing: Update modules to persistent instances when loaded
When a module is loaded and a persistent buffer is actively tracing, add
it to the list of modules in the persistent memory.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.469844721@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
1bd25a6f71 tracing: Show module names and addresses of last boot
Add the last boot module's names and addresses to the last_boot_info file.
This only shows the module information from a previous boot. If the buffer
is started and is recording the current boot, this file still will only
show "current".

  ~# cat instances/boot_mapped/last_boot_info
  10c00000		[kernel]
  ffffffffc00ca000	usb_serial_simple
  ffffffffc00ae000	usbserial
  ffffffffc008b000	bfq

  ~# echo function > instances/boot_mapped/current_tracer
  ~# cat instances/boot_mapped/last_boot_info
  # Current

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.299186021@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
fd39e48fe8 tracing: Have persistent trace instances save module addresses
For trace instances that are mapped to persistent memory, have them use
the scratch area to save the currently loaded modules. This will allow
where the modules have been loaded on the next boot so that their
addresses can be deciphered by using where they were loaded previously.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.129741650@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
966b7d0e52 module: Add module_for_each_mod() function
The tracing system needs a way to save all the currently loaded modules
and their addresses into persistent memory so that it can evaluate the
addresses on a reboot from a crash. When the persistent memory trace
starts, it will load the module addresses and names into the persistent
memory. To do so, it will call the module_for_each_mod() function and pass
it a function and data structure to get called on each loaded module. Then
it can record the memory.

This only implements that function.

Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: linux-modules@vger.kernel.org
Link: https://lore.kernel.org/20250305164608.962615966@goodmis.org
Acked-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
b65334825f tracing: Have persistent trace instances save KASLR offset
There's no reason to save the KASLR offset for the ring buffer itself.
That is used by the tracer. Now that the tracer has a way to save data in
the persistent memory of the ring buffer, have the tracing infrastructure
take care of the saving of the KASLR offset.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.792722274@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
4af0a9c518 ring-buffer: Add ring_buffer_meta_scratch()
Now that there's one meta data at the start of the persistent memory used by
the ring buffer, allow the caller to request some memory right after that
data that it can use as its own persistent memory.

Also fix some white space issues with ring_buffer_alloc().

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.619631731@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt
4009cc31e7 ring-buffer: Add buffer meta data for persistent ring buffer
Instead of just having a meta data at the first page of each sub buffer
that has duplicate data, add a new meta page to the entire block of memory
that holds the duplicate data and remove it from the sub buffer meta data.

This will open up the extra memory in this first page to be used by the
tracer for its own persistent data.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.446351513@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:26 -04:00
Steven Rostedt
bcba8d4dbe ring-buffer: Use kaslr address instead of text delta
Instead of saving off the text and data pointers and using them to compare
with the current boot's text and data pointers, just save off the KASLR
offset. Then that can be used to figure out how to read the previous boots
buffer.

The last_boot_info will now show this offset, but only if it is for a
previous boot:

  ~# cat instances/boot_mapped/last_boot_info
  39000000	[kernel]

  ~# echo function > instances/boot_mapped/current_tracer
  ~# cat instances/boot_mapped/last_boot_info
  # Current

If the KASLR offset saved is for the current boot, the last_boot_info will
show the value of "current".

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.274956504@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:26 -04:00