mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
47836 Commits

Author SHA1 Message Date
Linus Torvalds
59c9ab3e8c tracing updates for v6.15
- Fix read out of bounds bug in tracing_splice_read_pipe()
 
   The size of the sub page being read can now be greater than a page. But
   the buffer used in tracing_splice_read_pipe() only allocates a page size.
   The data copied to the buffer is the amount in sub buffer which can
   overflow the buffer. Use min((size_t)trace_seq_used(&iter->seq), PAGE_SIZE)
   to limit the amount copied to the buffer to a max of PAGE_SIZE.
 
 - Fix the test for NULL from "!filter_hash" to "!*filter_hash"
 
   The add_next_hash() function checked for NULL at the wrong pointer level.
 
 - Do not use the array in trace_adjust_address() if there are no elements
 
   The trace_adjust_address() finds the offset of a module that was stored in
   the persistent buffer when reading the previous boot buffer to see if the
   address belongs to a module that was loaded in the previous boot. An array
   is created that matches currently loaded modules with previously loaded
   modules. The trace_adjust_address() uses that array to find the new offset
   of the address that's in the previous buffer.  But if no module was
   loaded, it ends up reading the last element in an array that was never
   allocated. Check if nr_entries is zero and exit out early if it is.
 
 - Remove nested lock of trace_event_sem in print_event_fields()
 
   The print_event_fields() function iterates over the ftrace_events list and
   requires the trace_event_sem semaphore held for read. But this function is
   always called with that semaphore held for read. Remove the taking of the
   semaphore and replace it with lockdep_assert_held_read(&trace_event_sem);

Merge tag 'trace-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix read out of bounds bug in tracing_splice_read_pipe()

   The size of the sub page being read can now be greater than a page.
   But the buffer used in tracing_splice_read_pipe() only allocates a
   page size. The data copied to the buffer is the amount in sub buffer
   which can overflow the buffer.

   Use min((size_t)trace_seq_used(&iter->seq), PAGE_SIZE) to limit the
   amount copied to the buffer to a max of PAGE_SIZE.

 - Fix the test for NULL from "!filter_hash" to "!*filter_hash"

   The add_next_hash() function checked for NULL at the wrong pointer
   level.

 - Do not use the array in trace_adjust_address() if there are no
   elements

   The trace_adjust_address() finds the offset of a module that was
   stored in the persistent buffer when reading the previous boot buffer
   to see if the address belongs to a module that was loaded in the
   previous boot. An array is created that matches currently loaded
   modules with previously loaded modules. The trace_adjust_address()
   uses that array to find the new offset of the address that's in the
   previous buffer. But if no module was loaded, it ends up reading the
   last element in an array that was never allocated.

   Check if nr_entries is zero and exit out early if it is.

 - Remove nested lock of trace_event_sem in print_event_fields()

   The print_event_fields() function iterates over the ftrace_events
   list and requires the trace_event_sem semaphore held for read. But
   this function is always called with that semaphore held for read.

   Remove the taking of the semaphore and replace it with
   lockdep_assert_held_read(&trace_event_sem)

* tag 'trace-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Do not take trace_event_sem in print_event_fields()
  tracing: Fix trace_adjust_address() when there is no modules in scratch area
  ftrace: Fix NULL memory allocation check
  tracing: Fix oob write in trace_seq_to_buffer()
2025-05-04 10:15:42 -07:00
Linus Torvalds
5aac99c6b5 Two fixes:
- Prevent NULL pointer dereference in msi_domain_debug_show()
 
  - Fix crash in the qcom-mpm irqchip driver when configuring
    interrupts for non-wake GPIOs
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'irq-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Prevent NULL pointer dereference in msi_domain_debug_show()

 - Fix crash in the qcom-mpm irqchip driver when configuring
   interrupts for non-wake GPIOs

* tag 'irq-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOs
  genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show()
2025-05-04 07:58:53 -07:00
Steven Rostedt
0a8f11f856 tracing: Do not take trace_event_sem in print_event_fields()
On some paths in print_event_fields() it takes the trace_event_sem for
read, even though it should always be held when the function is called.

Remove the taking of that semaphore and add a lockdep_assert_held_read() to
make sure the trace_event_sem is held when print_event_fields() is called.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250501224128.0b1f0571@batman.local.home
Fixes: 80a76994b2 ("tracing: Add "fields" option to show raw trace event fields")
Reported-by: syzbot+441582c1592938fccf09@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6813ff5e.050a0220.14dd7d.001b.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 22:44:52 -04:00
Steven Rostedt
1be8e54a1e tracing: Fix trace_adjust_address() when there is no modules in scratch area
The function trace_adjust_address() maps addresses of modules that were
stored in the persistent memory and are also loaded in the current boot to
the current address of the module.

If there's only one module entry, it will simply use that, otherwise it
performs a bsearch of the entry array to find the modules to offset with.

The issue is if there are no modules in the array. The code does not
account for that and ends up referencing the first element in the array
which does not exist and causes a crash.

If nr_entries is zero, exit out early as if this was a core kernel
address.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250501151909.65910359@gandalf.local.home
Fixes: 35a380ddbc ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 16:06:55 -04:00
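The early exit described in 1be8e54a1e can be illustrated with a small userspace sketch. It is not the kernel code: struct mod_entry, cmp_addr() and adjust_address() are hypothetical names, the standard bsearch() stands in for the kernel's lookup, and an address that belongs to no module is simply returned unchanged, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: a module's address range in the previous boot
 * and the offset that maps it to its address in the current boot. */
struct mod_entry {
	uint64_t start;	/* first address of the module (previous boot) */
	uint64_t end;	/* one past the last address */
	int64_t delta;	/* current address minus previous address */
};

/* bsearch() comparator: the key is an address, the element is a range. */
static int cmp_addr(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct mod_entry *e = elem;

	if (addr < e->start)
		return -1;
	if (addr >= e->end)
		return 1;
	return 0;
}

/* Return addr adjusted for the current boot, or addr unchanged when it
 * does not fall inside any recorded module. entries must be sorted by
 * start address and must not overlap. */
uint64_t adjust_address(const struct mod_entry *entries, size_t nr_entries,
			uint64_t addr)
{
	const struct mod_entry *e;

	/* The guard the commit adds: with no entries there is nothing to
	 * index, so handle the address as if it were not a module address. */
	if (nr_entries == 0)
		return addr;

	assert(entries != NULL);
	e = bsearch(&addr, entries, nr_entries, sizeof(*entries), cmp_addr);
	if (!e)
		return addr;

	/* Unsigned addition wraps, so a negative delta subtracts. */
	return addr + (uint64_t)e->delta;
}
```

Putting the nr_entries test first means the array pointer is never touched when it was never allocated, which is the crash the commit message describes.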
Colin Ian King
3c1d9cfa84 ftrace: Fix NULL memory allocation check
The check for a failed memory allocation is incorrectly checking
the wrong level of pointer indirection by checking !filter_hash
rather than !*filter_hash.  Fix this.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250422221335.89896-1-colin.i.king@gmail.com
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 15:46:19 -04:00
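The pointer-level mistake fixed by 3c1d9cfa84 is easy to reproduce outside the kernel. The sketch below is a minimal illustration under assumed names: struct hash, alloc_hash() and failing_alloc() are hypothetical and do not appear in ftrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hash object. */
struct hash {
	int count;
};

/* Allocator that always fails, used to exercise the error path. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Allocate a hash into *out using the given allocator.
 * Returns 0 on success and -1 if the allocation failed. */
int alloc_hash(struct hash **out, void *(*alloc)(size_t))
{
	/* out is the address of the caller's pointer and is never NULL
	 * here, so testing "!out" can not detect a failed allocation. */
	assert(out != NULL);

	*out = alloc(sizeof(**out));

	/* The allocation result lives one level down, in *out. */
	if (!*out)
		return -1;

	(*out)->count = 0;
	return 0;
}
```

With the check written as "!out", the failing allocator would fall through to the dereference of a NULL *out.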
Jeongjun Park
f5178c41bb tracing: Fix oob write in trace_seq_to_buffer()
syzbot reported this bug:
==================================================================
BUG: KASAN: slab-out-of-bounds in trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
BUG: KASAN: slab-out-of-bounds in tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
Write of size 4507 at addr ffff888032b6b000 by task syz.2.320/7260

CPU: 1 UID: 0 PID: 7260 Comm: syz.2.320 Not tainted 6.15.0-rc1-syzkaller-00301-g3bde70a2c827 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc3/0x670 mm/kasan/report.c:521
 kasan_report+0xe0/0x110 mm/kasan/report.c:634
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189
 __asan_memcpy+0x3c/0x60 mm/kasan/shadow.c:106
 trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
 tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
 ....
==================================================================

It has been reported that trace_seq_to_buffer() tries to copy more data
than PAGE_SIZE to buf. Therefore, to prevent this, we should use the
smaller of trace_seq_used(&iter->seq) and PAGE_SIZE as an argument.

Link: https://lore.kernel.org/20250422113026.13308-1-aha310510@gmail.com
Reported-by: syzbot+c8cd2d2c412b868263fb@syzkaller.appspotmail.com
Fixes: 3c56819b14 ("tracing: splice support for tracing_pipe")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 15:24:15 -04:00
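The fix in f5178c41bb amounts to clamping the copy length to the size of the destination buffer. A userspace sketch of that idea follows; BUF_SIZE stands in for PAGE_SIZE and copy_clamped() is a hypothetical helper, not the kernel's trace_seq_to_buffer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096	/* assumed stand-in for PAGE_SIZE */

/* Copy at most BUF_SIZE bytes from src to dst and return the number of
 * bytes copied. dst must have room for BUF_SIZE bytes; used is the
 * number of valid bytes in src. */
size_t copy_clamped(char *dst, const char *src, size_t used)
{
	/* Take the smaller of the amount available and the buffer size. */
	size_t len = used < BUF_SIZE ? used : BUF_SIZE;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);
	return len;
}
```

With a request of 4507 bytes, the size in the KASAN report above, the sketch copies 4096 bytes and leaves the memory after the buffer untouched.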
Andrew Jones
e6a3fc4f10 genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show()
irq_domain_debug_show_one() calls msi_domain_debug_show() with a non-NULL
domain pointer and a NULL irq_data pointer. irq_debug_show_data() calls it
with a NULL domain pointer.

The domain pointer is not used, but the irq_data pointer is required to be
non-NULL and lacks a NULL pointer check.

Add the missing NULL pointer check to ensure there is a non-NULL irq_data
pointer in msi_domain_debug_show() before dereferencing it.

[ tglx: Massaged change log ]

Fixes: 01499ae673 ("genirq/msi: Expose MSI message data in debugfs")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250430124836.49964-2-ajones@ventanamicro.com
2025-04-30 23:25:10 +02:00
Linus Torvalds
3929527918 Modules fixes for v6.15-rc5
A single series is present to properly handle the module_kobject creation.
 It fixes a problem with missing /sys/module/<module>/drivers for built-in
 modules.
 
 The fix has been on linux-next for two weeks with no reported issues.

Merge tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull modules fixes from Petr Pavlu:
 "A single series to properly handle the module_kobject creation.

  This fixes a problem with missing /sys/module/<module>/drivers for
  built-in modules"

* tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  drivers: base: handle module_kobject creation
  kernel: globalize lookup_or_create_module_kobject()
  kernel: refactor lookup_or_create_module_kobject()
  kernel: param: rename locate_module_kobject
2025-04-30 08:37:52 -07:00
Linus Torvalds
3d23ef05c3 Fix sporadic crashes in dequeue_entities() due to ... bad math.
[ Arguably if pick_eevdf()/pick_next_entity() was less trusting
   of complex math being correct it could have de-escalated a
   crash into a warning, but that's for a different patch. ]
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix sporadic crashes in dequeue_entities() due to ... bad math.

  [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
    complex math being correct it could have de-escalated a crash into
    a warning, but that's for a different patch ]"

* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
2025-04-26 09:23:20 -07:00
Linus Torvalds
86baa5499c Misc perf events fixes:
- Use POLLERR for events in error state, instead of
    the ambiguous POLLHUP error value
 
  - Fix non-sampling (counting) events on certain x86 platforms
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc perf events fixes from Ingo Molnar:

 - Use POLLERR for events in error state, instead of the ambiguous
   POLLHUP error value

 - Fix non-sampling (counting) events on certain x86 platforms

* tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix non-sampling (counting) events on certain x86 platforms
  perf/core: Change to POLLERR for pinned events with error
2025-04-26 09:13:09 -07:00
Omar Sandoval
bbce3de72b sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.

The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:

1. In the if (entity_is_task(se)) else block at the beginning of
   dequeue_entities(), slice is set to
   cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
   it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
   tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
   first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
   the saved slice, which is still U64_MAX.

This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:

6. In update_entity_lag(), se->slice is used to calculate limit, which
   ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
   is negative, vlag > limit, so se->vlag is set to the same huge
   negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
   another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
   decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
    incorrectly returns false because the vruntime is so far from the
    other vruntimes on the queue, causing the
    (vruntime - cfs_rq->min_vruntime) * load calculation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
    pick_eevdf() and crashes.

Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).

Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.

Fixes: aef6987d89 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
2025-04-26 10:44:36 +02:00
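Steps 6 and 7 of bbce3de72b hinge on what a clamp does when its bounds are inverted. The sketch below is a userspace illustration, not the scheduler code: clamp_s64() and clamp_vlag() are hypothetical helpers, and the clamp is written to test the upper bound first, which is the behaviour the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp val to [lo, hi], testing the upper bound first. When lo > hi
 * the result is no longer guaranteed to lie between the two bounds. */
int64_t clamp_s64(int64_t val, int64_t lo, int64_t hi)
{
	if (val >= hi)
		return hi;
	if (val <= lo)
		return lo;
	return val;
}

/* Clamp vlag to [-limit, limit], as in step 7 of the commit message. */
int64_t clamp_vlag(int64_t vlag, int64_t limit)
{
	/* Negating INT64_MIN would overflow, so rule it out. */
	assert(limit != INT64_MIN);
	return clamp_s64(vlag, -limit, limit);
}
```

With a sane positive limit the lag stays inside the window. With a huge negative limit any ordinary vlag is greater than the upper bound, so the function returns limit itself, and that is how the bogus value ends up in se->vlag.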
Linus Torvalds
f1a3944c86 bpf-fixes

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Add namespace to BPF internal symbols (Alexei Starovoitov)

 - Fix possible endless loop in BPF map iteration (Brandon Kammerdiener)

 - Fix compilation failure for samples/bpf on LoongArch (Haoran Jiang)

 - Disable a part of sockmap_ktls test (Ihor Solodrai)

 - Correct typo in __clang_major__ macro (Peilin Ye)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Correct typo in __clang_major__ macro
  samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora
  bpf: Add namespace to BPF internal symbols
  selftests/bpf: add test for softlock when modifying hashmap while iterating
  bpf: fix possible endless loop in BPF map iteration
  selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure
2025-04-25 17:53:09 -07:00
Linus Torvalds
882cd65288 dma-mapping fixes for Linux 6.15
- avoid unused variable warnings (Arnd Bergmann, Marek Szyprowski)
 - add runtime warnings and debug messages for devices with limited DMA
   capabilities (Balbir Singh, Chen-Yu Tsai)

Merge tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:

 - avoid unused variable warnings (Arnd Bergmann, Marek Szyprowski)

 - add runtime warnings and debug messages for devices with limited DMA
   capabilities (Balbir Singh, Chen-Yu Tsai)

* tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask
  dma-mapping: Fix warning reported for missing prototype
  dma-mapping: avoid potential unused data compilation warning
  dma/mapping.c: dev_dbg support for dma_addressing_limited
  dma/contiguous: avoid warning about unused size_bytes
2025-04-25 09:44:53 -07:00
Alexei Starovoitov
f88886de09 bpf: Add namespace to BPF internal symbols
Add namespace to BPF internal symbols used by light skeleton
to prevent abuse and document with the code their allowed usage.

Fixes: b1d18a7574 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20250425014542.62385-1-alexei.starovoitov@gmail.com
2025-04-25 09:21:23 -07:00
Brandon Kammerdiener
75673fda0c bpf: fix possible endless loop in BPF map iteration
The _safe variant used here gets the next element before running the callback,
avoiding the endless loop condition.

Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Link: https://lore.kernel.org/r/20250424153246.141677-2-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
2025-04-25 08:36:59 -07:00
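The "_safe" pattern mentioned in 75673fda0c reads the successor before the callback runs, so the callback may free or relink the current element without derailing the walk. A minimal sketch on a plain singly linked list follows; struct node, push(), for_each_safe() and free_node() are hypothetical names and this is not the BPF hash map code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

/* Push a new node onto the front of the list. Returns the new head, or
 * the old head unchanged if the allocation fails. */
struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->value = value;
	n->next = head;
	return n;
}

/* Visit every node, fetching the successor before the callback runs.
 * Returns the number of nodes visited. */
size_t for_each_safe(struct node *head, void (*cb)(struct node *))
{
	struct node *n, *next;
	size_t visited = 0;

	assert(cb != NULL);
	for (n = head; n; n = next) {
		next = n->next;	/* read before cb() can invalidate n */
		cb(n);
		visited++;
	}
	return visited;
}

/* Example callback that destroys the node it is handed. */
void free_node(struct node *n)
{
	free(n);
}
```

A loop that read n->next only after the callback returned would follow whatever the callback left behind, which can be freed memory or a link that leads back into the list.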
Linus Torvalds
0251ddbffb virtio, vhost: fixes
A small number of fixes.
 
 virtgpu is exempt from reset shutdown for now -
 	 a more complete fix is in the works
 spec compliance fixes in:
 	virtio-pci cap commands
 	vhost_scsi_send_bad_target
 	virtio console resize
 missing locking fix in vhost-scsi
 virtio ring - a KCSAN false positive fix
 VHOST_*_OWNER documentation fix
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "A small number of fixes:

   - virtgpu is exempt from reset shutdown for now - a more complete fix
     is in the works

   - spec compliance fixes in:
       - virtio-pci cap commands
       - vhost_scsi_send_bad_target
       - virtio console resize

   - missing locking fix in vhost-scsi

   - virtio ring - a KCSAN false positive fix

   - VHOST_*_OWNER documentation fix"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: Fix vhost_scsi_send_status()
  vhost-scsi: Fix vhost_scsi_send_bad_target()
  vhost-scsi: protect vq->log_used with vq->mutex
  vhost_task: fix vhost_task_create() documentation
  virtio_console: fix order of fields cols and rows
  virtio_console: fix missing byte order handling for cols and rows
  virtgpu: don't reset on shutdown
  virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN
  vhost: fix VHOST_*_OWNER documentation
  virtio_pci: Use self group type for cap commands
2025-04-23 08:25:56 -07:00
Namhyung Kim
0db61388b3 perf/core: Change to POLLERR for pinned events with error
Commit:

  f4b07fd62d ("perf/core: Use POLLHUP for pinned events in error")

started to emit POLLHUP for pinned events in an error state.

But POLLHUP is also used to signal that the task an event is attached to
has terminated.  To distinguish pinned per-task events in the error state,
a consumer would need to check whether the task is still alive.

Change it to POLLERR to make it clear.

Suggested-by: Gabriel Marin <gmx@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250422223318.180343-1-namhyung@kernel.org
2025-04-23 09:39:06 +02:00
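From the consumer's side, the change in 0db61388b3 means the two conditions can be told apart from revents alone. The sketch below shows one way a userspace program might interpret the bits returned by poll(); enum event_state and classify_revents() are hypothetical, and the mapping of POLLHUP to a terminated task is taken from the commit message.

```c
#include <poll.h>

/* Hypothetical classification of a perf event file descriptor. */
enum event_state {
	EVENT_OK,		/* no error or hang-up reported */
	EVENT_ERROR,		/* event is in an error state (POLLERR) */
	EVENT_TASK_EXITED	/* attached task has terminated (POLLHUP) */
};

/* Interpret the revents field filled in by poll(). POLLERR is tested
 * first so that an error is reported even if both bits are set. */
enum event_state classify_revents(short revents)
{
	if (revents & POLLERR)
		return EVENT_ERROR;
	if (revents & POLLHUP)
		return EVENT_TASK_EXITED;
	return EVENT_OK;
}
```

Before the change, both cases would have arrived as POLLHUP and the caller would have needed a separate liveness check on the task.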
Chen-Yu Tsai
89461db349 dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask
When a reserved memory region described in the device tree is attached
to a device, it is expected that the device's limitations are correctly
included in that description.

However, if the device driver failed to implement DMA address masking
or addressing beyond the default 32 bits (on arm64), then bad things
could happen because the DMA address was truncated, such as playing
back audio with no actual audio coming out, or DMA overwriting random
blocks of kernel memory.

Check against the coherent DMA mask when the memory regions are attached
to the device. Give a warning when the memory region can not be covered
by the mask.

A warning instead of a hard error was chosen, because it is possible
that existing drivers could be working fine even if they forgot to
extend the coherent DMA mask.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250421083930.374173-1-wenst@chromium.org
2025-04-22 17:44:09 +02:00
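The check added by 89461db349 compares the end of a reserved region against the device's coherent DMA mask. A userspace sketch of that comparison follows; mask_for_bits() and region_covered() are hypothetical helpers and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low `bits` address bits; 64 or more yields all ones. */
uint64_t mask_for_bits(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* Return true if every address in [base, base + size) is reachable with
 * the given mask. size must be non-zero. */
bool region_covered(uint64_t base, uint64_t size, uint64_t mask)
{
	uint64_t last;

	assert(size > 0);
	last = base + size - 1;

	/* A region that wraps around the address space is never covered. */
	if (last < base)
		return false;

	return last <= mask;
}
```

A 1 MiB region ending at 0xffffffff is covered by a 32-bit mask, while a region starting at 0x100000000 is not; in the second case the commit emits a warning and does not fail, for the reason given in the message.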
Balbir Singh
cae5572ec9 dma-mapping: Fix warning reported for missing prototype
lkp reported a warning about missing prototype for a recent patch.

The kernel-doc style comments are out of sync, move them to the right
function.

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Christoph Hellwig <hch@lst.de>

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504190615.g9fANxHw-lkp@intel.com/

Signed-off-by: Balbir Singh <balbirs@nvidia.com>
[mszyprow: reformatted subject]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250422114034.3535515-1-balbirs@nvidia.com
2025-04-22 15:06:33 +02:00
Linus Torvalds
a33b5a08cb sched_ext: Fixes for v6.15-rc3
- Use kvzalloc() so that large exit_dump buffer allocations don't fail
   easily.
 
 - Remove cpu.weight / cpu.idle unimplemented warnings which are more
   annoying than helpful. This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary.
   Mark it for deprecation.

Merge tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Use kvzalloc() so that large exit_dump buffer allocations don't fail
   easily

 - Remove cpu.weight / cpu.idle unimplemented warnings which are more
   annoying than helpful.

   This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary. Mark it for
   deprecation

* tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation
  sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings
  sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-21 19:16:29 -07:00
Linus Torvalds
a22509a4ee cgroup: Fixes for v6.15-rc3
- Fix compilation in CONFIG_LOCKDEP && !CONFIG_PROVE_RCU configurations.
 
 - Allow "cpuset_v2_mode" mount option for "cpuset" filesystem type to make
   life easier for android.

Merge tag 'cgroup-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix compilation in CONFIG_LOCKDEP && !CONFIG_PROVE_RCU configurations

 - Allow "cpuset_v2_mode" mount option for "cpuset" filesystem type to
   make life easier for android

* tag 'cgroup-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset-v1: Add missing support for cpuset_v2_mode
  cgroup: Fix compilation issue due to cgroup_mutex not being exported
2025-04-21 19:13:25 -07:00
Linus Torvalds
119009db26 vfs-6.15-rc3.fixes.2

Merge tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Revert the hfs{plus} deprecation warning that's also included in this
   pull request. The commit introducing the deprecation warning resides
   rather early in this branch. So simply dropping it would've rebased
   all other commits which I decided to avoid. Hence the revert in the
   same branch

   [ Background - the deprecation warning discussion resulted in people
     stepping up, and so hfs{plus} will have a maintainer taking care of
     it after all..   - Linus ]

 - Switch CONFIG_SYSFS_SYSCALL default to n and decouple from
   CONFIG_EXPERT

 - Fix an audit bug caused by changes to our kernel path lookup helpers
   this cycle. Audit needs the parent path even if the dentry it tried
   to look up is negative

 - Ensure that the kernel path lookup helpers leave the passed in path
   argument clean when they return an error. This is consistent with all
   our other helpers

 - Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant
   information is available to kernel consumers as well

 - Don't set a timer and call schedule() if the timer will expire
   immediately in epoll

 - Mark netfs lookup tables with __nonstring

* tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "hfs{plus}: add deprecation warning"
  fs: move the bdex_statx call to vfs_getattr_nosec
  netfs: Mark __nonstring lookup tables
  eventpoll: Set epoll timeout if it's in the future
  fs: ensure that *path_locked*() helpers leave passed path pristine
  fs: add kern_path_locked_negative()
  hfs{plus}: add deprecation warning
  Kconfig: switch CONFIG_SYSFS_SYCALL default to n
2025-04-19 14:31:08 -07:00
Linus Torvalds
fa6ad96dca tracing fixes for v6.15
 - Initialize hash variables in ftrace subops logic
 
   The fix that simplified the ftrace subops logic opened a path where some
   variables could be used without being initialized, in a way subtle enough
   that the compiler did not catch it. Initialize those variables to the
   EMPTY_HASH, which is the default hash.
 
 - Reinitialize the hash pointers after they are freed
 
   Some of the hash pointers in the subop logic were freed but may still be
   referenced later. To prevent use-after-free bugs, initialize them back to
   the EMPTY_HASH.
 
 - Free the ftrace hashes when they are replaced
 
   The fix that simplified the subops logic updated some hash pointers, but
   did not free the original hashes that they had pointed to, even though
   those hashes were no longer used. This caused a memory leak. Free the
   hashes that are pointed to by the pointers when they are replaced.
 
 - Fix size initialization of ftrace direct function hash
 
   The ftrace direct function hash used by BPF initialized the hash size
   incorrectly. It capped the number of items at a hard coded 32, which in
   turn capped the number of hash bits at fls(32). The hash size is supposed
   to be limited by FTRACE_HASH_MAX_BITS, which allows more bits than that.
   Rework the size check to first pass the number of elements to fls() and
   then compare that to FTRACE_HASH_MAX_BITS before allocating the hash.
 
 - Fix format output of ftrace_graph_ent_entry event
 
   The field depth of the ftrace_graph_ent_entry event is of size 4 but the
   output showed it as unsigned long and used "%lu". Change it to unsigned int
   and use "%u" in the print format that is displayed to user space.
 
 - Fix the trace event filter on strings
 
   Events can be filtered on numbers or string values. The return value
   checked from strncpy_from_kernel_nofault() and strncpy_from_user_nofault()
   was used to determine if reading the strings would fault or not. It would
   return fault if the value was non zero, which basically meant that it was
   always considering the read as a fault.
 
 - Add selftest to test trace event string filtering
 
   In order to catch the breakage of the string filtering, add a self test to
   make sure that it continues to work.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaAPqNRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qv5nAP4mqIgne7tzMhHIH/nQGM/7Dj98n+Vt
 BXm6VifVdVJvtAD+KCDipZ2MspGEeZX3SDSnvBuj0S+OX9T9CTWPv+rFUwE=
 =AWY4
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Initialize hash variables in ftrace subops logic

   The fix that simplified the ftrace subops logic opened a path where
   some variables could be used without being initialized, in a way
   subtle enough that the compiler did not catch it. Initialize those
   variables to the EMPTY_HASH, which is the default hash.

 - Reinitialize the hash pointers after they are freed

   Some of the hash pointers in the subop logic were freed but may still
   be referenced later. To prevent use-after-free bugs, initialize them
   back to the EMPTY_HASH.

 - Free the ftrace hashes when they are replaced

   The fix that simplified the subops logic updated some hash pointers,
   but did not free the original hashes that they had pointed to, even
   though those hashes were no longer used. This caused a memory leak.
   Free the hashes that are pointed to by the pointers when they are
   replaced.

 - Fix size initialization of ftrace direct function hash

   The ftrace direct function hash used by BPF initialized the hash size
   incorrectly. It capped the number of items at a hard coded 32, which
   in turn capped the number of hash bits at fls(32). The hash size is
   supposed to be limited by FTRACE_HASH_MAX_BITS, which allows more
   bits than that. Rework the size check to first pass the number of
   elements to fls() and then compare that to FTRACE_HASH_MAX_BITS
   before allocating the hash.

 - Fix format output of ftrace_graph_ent_entry event

   The field depth of the ftrace_graph_ent_entry event is of size 4 but
   the output showed it as unsigned long and used "%lu". Change it to
   unsigned int and use "%u" in the print format that is displayed to
   user space.

 - Fix the trace event filter on strings

   Events can be filtered on numbers or string values. The return value
   checked from strncpy_from_kernel_nofault() and
   strncpy_from_user_nofault() was used to determine if reading the
   strings would fault or not. It would return fault if the value was
   non zero, which basically meant that it was always considering the
   read as a fault.

 - Add selftest to test trace event string filtering

   In order to catch the breakage of the string filtering, add a self
   test to make sure that it continues to work.

* tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: selftests: Add testing a user string to filters
  tracing: Fix filter string testing
  ftrace: Fix type of ftrace_graph_ent_entry.depth
  ftrace: fix incorrect hash size in register_ftrace_direct()
  ftrace: Free ftrace hashes after they are replaced in the subops code
  ftrace: Reinitialize hash to EMPTY_HASH after freeing
  ftrace: Initialize variables for ftrace_startup/shutdown_subops()
2025-04-19 11:57:36 -07:00
Stefano Garzarella
fec0abf526 vhost_task: fix vhost_task_create() documentation
Commit cb380909ae ("vhost: return task creation error instead of NULL")
changed the return value of vhost_task_create(), but did not update the
documentation.

Reflect the change in the documentation: on an error, vhost_task_create()
returns an ERR_PTR() and no longer NULL.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250327124435.142831-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18 10:08:11 -04:00
Steven Rostedt
a8c5b0ed89 tracing: Fix filter string testing
The filter string testing uses strncpy_from_kernel/user_nofault() to
retrieve the string to test the filter against. The if() statement was
incorrect as it considered 0 as a fault, when only a negative return
value means that the read faulted.

Running the following commands:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

Would produce nothing, but with the fix it will produce something like:

      ls-1192    [007] .....  8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)
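
The return-value convention behind this fix can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: copy_nofault_model() and test_string_model() are hypothetical stand-ins, the buffer size is illustrative, and a NULL source is used to simulate a faulting read. Only a negative return value is treated as a fault; zero and positive values are valid results.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define USTRING_BUF_SIZE 512	/* illustrative buffer size (assumption) */

/*
 * Hypothetical stand-in for strncpy_from_kernel_nofault(): returns the
 * number of bytes copied including the trailing NUL, 0 if count is 0,
 * or -EFAULT when the read "faults" (modelled here as src == NULL).
 */
static long copy_nofault_model(char *dst, const char *src, long count)
{
	long n = 0;

	assert(dst != NULL);
	if (count <= 0)
		return 0;
	if (!src)
		return -EFAULT;

	while (n < count - 1 && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';
	return n + 1;
}

/* Return the copied string, or NULL only when the copy faulted. */
static char *test_string_model(char *buf, const char *str)
{
	if (copy_nofault_model(buf, str, USTRING_BUF_SIZE) < 0)
		return NULL;
	return buf;
}
```

With this check, a successful copy of "/proc" hands the buffer to the filter, while the simulated fault yields NULL and the filter is skipped.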

Link: https://lore.kernel.org/all/CAEf4BzbVPQ=BjWztmEwBPRKHUwNfKBkS3kce-Rzka6zvbQeVpg@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250417183003.505835fb@gandalf.local.home
Fixes: 77360f9bbc ("tracing: Add test for user space strings when filtering on string pointers")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 22:16:56 -04:00
Ilya Leoshkevich
3b4e87e6a5 ftrace: Fix type of ftrace_graph_ent_entry.depth
ftrace_graph_ent.depth is int, but ftrace_graph_ent_entry.depth is
unsigned long. This confuses trace-cmd on 64-bit big-endian systems and
makes it print a huge amount of spaces. Fix this by using unsigned int,
which has a matching size, instead.
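
One way to picture the problem, as a sketch rather than a description of trace-cmd internals: the record stores depth in 4 bytes, and a reader that trusts the unsigned long type on a 64-bit big-endian machine pulls in 8 bytes. The helpers and the record layout below are hypothetical; the bytes are decoded by hand so the result does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode `size` bytes (at most 8) as a big-endian unsigned integer. */
static uint64_t read_be(const unsigned char *p, size_t size)
{
	uint64_t v = 0;
	size_t i;

	assert(size <= 8);
	for (i = 0; i < size; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical record layout: a 4-byte big-endian depth of 2, followed
 * by 4 bytes that belong to whatever comes next (zeroes here).
 */
static const unsigned char record[8] = { 0, 0, 0, 2, 0, 0, 0, 0 };

/* Reading the field with its real size gives the intended depth. */
static uint64_t depth_as_u32(void)
{
	return read_be(record, 4);
}

/* Reading 8 bytes, as "unsigned long" suggests, gives 2 << 32. */
static uint64_t depth_as_u64(void)
{
	return read_be(record, 8);
}
```

An indentation computed from the second value would be consistent with the huge run of spaces described above.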

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250412221847.17310-2-iii@linux.ibm.com
Fixes: ff5c9c576e ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:19:15 -04:00
Menglong Dong
92f1d3b401 ftrace: fix incorrect hash size in register_ftrace_direct()
The maximum number of ftrace hash bits is capped at fls(32) in
register_ftrace_direct(), which seems illogical. Fix it by capping the
hash bits at FTRACE_HASH_MAX_BITS instead.
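
The difference between the two sizing strategies can be sketched as follows. This is a user-space model, not the code in register_ftrace_direct(): fls_model() is a portable stand-in for the kernel's fls(), the value given to FTRACE_HASH_MAX_BITS is an assumption for illustration, and both helper functions are hypothetical.

```c
#include <assert.h>

#define FTRACE_HASH_MAX_BITS 12	/* assumed value, for illustration only */

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int fls_model(unsigned int x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Old behaviour as described: clamp the element count to 32, then fls(). */
static int hash_bits_old(unsigned int count)
{
	if (count > 32)
		count = 32;
	return fls_model(count);
}

/* New behaviour as described: fls() first, then clamp the bit count. */
static int hash_bits_new(unsigned int count)
{
	int bits = fls_model(count);

	assert(bits >= 0);
	if (bits > FTRACE_HASH_MAX_BITS)
		bits = FTRACE_HASH_MAX_BITS;
	return bits;
}
```

In this model, 1000 entries are squeezed into fls(32) bits under the old scheme, while the new scheme lets the hash grow until it reaches the configured maximum.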

Link: https://lore.kernel.org/20250413014444.36724-1-dongml2@chinatelecom.cn
Fixes: d05cb47066 ("ftrace: Fix modification of direct_function hash while in use")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:16:51 -04:00
Steven Rostedt
c45c585dde ftrace: Free ftrace hashes after they are replaced in the subops code
The subops processing creates new hashes when adding and removing subops.
In some places the old hashes that were replaced were not freed, which
caused memory leaks.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417135939.245b128d@gandalf.local.home
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:16:07 -04:00
Steven Rostedt
08275e59a7 ftrace: Reinitialize hash to EMPTY_HASH after freeing
There are several locations that free an ftrace hash pointer that may be
referenced again. Reset them to EMPTY_HASH so that a use-after-free bug
doesn't happen.
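
The sentinel pattern described here, together with the neighbouring change that frees hashes when they are replaced, can be sketched in user space. Everything below is a hypothetical model: struct hash_model, EMPTY_HASH_MODEL and the helper names are invented for illustration and do not mirror the kernel's ftrace_hash API.

```c
#include <assert.h>
#include <stdlib.h>

struct hash_model {
	unsigned long count;
};

/* Shared, statically allocated sentinel that is never freed. */
static struct hash_model empty_hash_model;
#define EMPTY_HASH_MODEL (&empty_hash_model)

/* Free a hash unless it is NULL or the shared sentinel. */
static void free_hash_model(struct hash_model *hash)
{
	if (!hash || hash == EMPTY_HASH_MODEL)
		return;
	free(hash);
}

/* Free *slot and reset it so later readers never see a dangling pointer. */
static void release_hash_model(struct hash_model **slot)
{
	assert(slot != NULL);
	free_hash_model(*slot);
	*slot = EMPTY_HASH_MODEL;
}

/* Install a new hash in *slot, freeing the one it replaces. */
static void replace_hash_model(struct hash_model **slot,
			       struct hash_model *new_hash)
{
	assert(slot != NULL);
	free_hash_model(*slot);
	*slot = new_hash ? new_hash : EMPTY_HASH_MODEL;
}
```

Because the sentinel is skipped by free_hash_model(), releasing the same slot twice is harmless in this model, which is the property that guards against the use-after-free.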

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417110933.20ab718b@gandalf.local.home
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:15:28 -04:00
Steven Rostedt
31d1139956 ftrace: Initialize variables for ftrace_startup/shutdown_subops()
The reworking to fix and simplify the ftrace_startup_subops() and the
ftrace_shutdown_subops() made it possible for the filter_hash and
notrace_hash variables to be used uninitialized in a way that the compiler
did not catch.

Initialize both filter_hash and notrace_hash to the EMPTY_HASH as that is
what they should be if they never are used.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417104017.3aea66c2@gandalf.local.home
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Closes: https://lore.kernel.org/all/1db64a42-626d-4b3a-be08-c65e47333ce2@linux.ibm.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:15:05 -04:00
T.J. Mercier
1bf67c8fdb cgroup/cpuset-v1: Add missing support for cpuset_v2_mode
Android has mounted the v1 cpuset controller using filesystem type
"cpuset" (not "cgroup") since 2015 [1], and depends on the resulting
behavior where the controller name is not added as a prefix for cgroupfs
files. [2]

Later, a problem was discovered where cpu hotplug onlining did not
affect the cpuset/cpus files, which Android carried an out-of-tree patch
to address for a while. An attempt was made to upstream this patch, but
the recommendation was to use the "cpuset_v2_mode" mount option
instead. [3]

An effort was made to do so, but this fails with "cgroup: Unknown
parameter 'cpuset_v2_mode'" because commit e1cba4b85d ("cgroup: Add
mount flag to enable cpuset to use v2 behavior in v1 cgroup") did not
update the special cased cpuset_mount(), and only the cgroup (v1)
filesystem type was updated.

Add parameter parsing to the cpuset filesystem type so that
cpuset_v2_mode works like the cgroup filesystem type:

$ mkdir /dev/cpuset
$ mount -t cpuset -ocpuset_v2_mode none /dev/cpuset
$ mount|grep cpuset
none on /dev/cpuset type cgroup (rw,relatime,cpuset,noprefix,cpuset_v2_mode,release_agent=/sbin/cpuset_release_agent)

[1] b769c8d24f
[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libprocessgroup/setup/cgroup_map_write.cpp;drc=2dac5d89a0f024a2d0cc46a80ba4ee13472f1681;l=192
[3] https://lore.kernel.org/lkml/f795f8be-a184-408a-0b5a-553d26061385@redhat.com/T/

Fixes: e1cba4b85d ("cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-17 07:32:53 -10:00
gaoxu
87c259a7a3 cgroup: Fix compilation issue due to cgroup_mutex not being exported
When adding folio_memcg function call in the zram module for
Android16-6.12, the following error occurs during compilation:
ERROR: modpost: "cgroup_mutex" [../soc-repo/zram.ko] undefined!

This error is caused by the indirect call to lockdep_is_held(&cgroup_mutex)
within folio_memcg. The export setting for cgroup_mutex is controlled by
the CONFIG_PROVE_RCU macro. If CONFIG_LOCKDEP is enabled while
CONFIG_PROVE_RCU is not, this compilation error will occur.

To resolve this issue, also export cgroup_mutex when CONFIG_LOCKDEP is
enabled, in parallel with the existing CONFIG_PROVE_RCU control.

Signed-off-by: gao xu <gaoxu2@honor.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-17 06:27:31 -10:00
Rafael J. Wysocki
75da043d8f cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
Notice that ignore_dl_rate_limit() need not piggy back on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).

Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net
2025-04-17 17:54:44 +02:00
Rafael J. Wysocki
79443a7e9d cpufreq/sched: Explicitly synchronize limits_changed flag handling
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.

Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.

Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.

To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing with
that code.

Fixes: 600f5badb7 ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net
2025-04-17 17:54:44 +02:00
Rafael J. Wysocki
cfde542df7 cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Commit 8e461a1cb4 ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.

However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).

Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.

Fixes: 8e461a1cb4 ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net
2025-04-17 17:54:43 +02:00
Shyam Saini
7c76c813cf kernel: globalize lookup_or_create_module_kobject()
lookup_or_create_module_kobject() is marked as static and __init. To make
it global, drop the static keyword.
Since this function can be called from non-init code, use __modinit
instead of __init. The __modinit marker will make it __init if
CONFIG_MODULES is not defined.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250227184930.34163-4-shyamsaini@linux.microsoft.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16 14:54:53 +02:00
Shyam Saini
1c7777feb0 kernel: refactor lookup_or_create_module_kobject()
In the unlikely event of the allocation failing, it is better to let
the machine boot with a not fully populated sysfs than to kill it with
this BUG_ON(). All callers are already prepared for
lookup_or_create_module_kobject() returning NULL.

This is also preparation for calling this function from non __init
code, where using BUG_ON for allocation failure handling is not
acceptable.

While we are here, also start using IS_ENABLED() instead of the #ifdef
construct.

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250227184930.34163-3-shyamsaini@linux.microsoft.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16 14:54:35 +02:00
Shyam Saini
bbc9462f0c kernel: param: rename locate_module_kobject
The locate_module_kobject() function looks up an existing
module_kobject for a given module name. If it cannot find the
corresponding module_kobject, it creates one for the given name.

This commit renames locate_module_kobject() to
lookup_or_create_module_kobject() to better describe its operations.

This doesn't change anything functionality wise.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250227184930.34163-2-shyamsaini@linux.microsoft.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16 14:35:46 +02:00
Christian Brauner
c86b300b1e fs: add kern_path_locked_negative()
The audit code relies on the fact that kern_path_locked() returned a
path even for a negative dentry. If it doesn't find a valid dentry it
immediately calls:

    audit_find_parent(d_backing_inode(parent_path.dentry));

which assumes that parent_path.dentry is still valid. But it isn't since
kern_path_locked() has been changed to path_put() also for a negative
dentry.

Fix this by adding a helper that implements the required audit semantics
and allows us to fix the immediate bleeding. We can find a unified
solution for this afterwards.

Link: https://lore.kernel.org/20250414-rennt-wimmeln-f186c3a780f1@brauner
Fixes: 1c3cb50b58 ("VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry")
Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-15 11:32:34 +02:00
Balbir Singh
2042c352e2 dma/mapping.c: dev_dbg support for dma_addressing_limited
While debugging and resolving an issue involving the forced use of bounce
buffers, 7170130e4c ("x86/mm/init: Handle the special case of device
private pages in add_pages(), to not increase max_pfn and trigger
dma_addressing_limited() bounce buffers"), it would have been easier
to debug the issue if dma_addressing_limited() had debug information
about the device not being able to address all of memory and thus forcing
all accesses through a bounce buffer. Please see [2].

Implement dev_dbg to debug the potential use of bounce buffers
when we hit the condition. When swiotlb is used,
dma_addressing_limited() is used to determine the maximum DMA
buffer size in dma_direct_max_mapping_size(). The debug prints could be
triggered in that check as well (when enabled).

Link: https://lore.kernel.org/lkml/20250401000752.249348-1-balbirs@nvidia.com/ [1]
Link: https://lore.kernel.org/lkml/20250310112206.4168-1-spasswolf@web.de/ [2]

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Christoph Hellwig <hch@infradead.org>

Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250414113752.3298276-1-balbirs@nvidia.com
2025-04-14 16:10:50 +02:00
Linus Torvalds
7cdabafc00 tracing fixes for v6.15
 - Hide get_vm_area() from MMUless builds
 
   The function get_vm_area() is not defined when CONFIG_MMU is not defined.
   Hide that function within #ifdef CONFIG_MMU.
 
 - Fix output of synthetic events when they have dynamic strings
 
   The print fmt of the synthetic event's format file used to have "%.*s" for
   dynamic size strings even though the user space exported arguments had
   only __get_str() macro that provided just a nul terminated string. This
   was fixed so that user space could parse this properly. But the reason
   that it had "%.*s" was because internally it provided the maximum size of
   the string as one of the arguments. The fix that replaced "%.*s" with "%s"
   caused the trace output (when the kernel reads the event) to write
   "(efault)" as it would now read the length of the string as "%s".
 
   As the string provided is always nul terminated, there's no reason for the
   internal code to use "%.*s" anyway. Just remove the length argument to
   match the "%s" that is now in the format.
 
 - Fix the ftrace subops hash logic of the manager ops hash
 
   The function_graph uses the ftrace subops code. The subops code is a way
   to have a single ftrace_ops registered with ftrace to determine what
   functions will call the ftrace_ops callback. More than one user of
   function graph can register a ftrace_ops with it. The function graph
   infrastructure will then add this ftrace_ops as a subops with the main
   ftrace_ops it registers with ftrace. This is because the functions will
   always call the function graph callback which in turn calls the subops
   ftrace_ops callbacks.
 
   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will update
   the main ftrace_ops hash to include the functions it wants. This is the
   logic that was broken.
 
   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where all the
   functions in the filter_hash but not in the notrace_hash are attached by
   ftrace. The original logic would have the main ftrace_ops filter_hash be a
   union of all the subops filter_hashes and the main notrace_hash would be an
   intersect of all the subops filter hashes. But this was incorrect because
   the notrace hash depends on the filter_hash it is associated to and not
   the union of all filter_hashes.
 
   Instead, when a subops is added, just include all the functions of the
   subops hash that are in its filter_hash but not in its notrace_hash. The
   main subops hash should not use its notrace hash, unless all of its subops
   hashes have an empty filter_hash (which means to attach to all functions),
   and then, and only then, the main ftrace_ops notrace hash can be the
   intersect of all the subops hashes.
 
   This not only fixes the bug, but also simplifies the code.
 
 - Add a selftest to better test the subops filtering
 
   Add a selftest that would catch the bug fixed by the above change.
 
 - Fix extra newline printed in function tracing with retval
 
   The function parameter code changed the output logic slightly and called
   print_graph_retval() and also printed a newline. The print_graph_retval()
   also prints a newline which caused blank lines to be printed in the
   function graph tracer when retval was added. This caused one of the
   selftests to fail if retvals were enabled. Instead remove the new line
   output from print_graph_retval() and have the callers always print the
   new line so that it doesn't have to do special logic if it calls
   print_graph_retval() or not.
 
 - Fix out-of-bound memory access in the runtime verifier
 
   When rv_is_container_monitor() is called on the last entry in the linked
   list, it references the next entry, which is the list head and causes an
   out-of-bound memory access.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ/rXQxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoj7AQC0C2awpJSUIRj91qjPtMYuNUE3AVpB
 EEZEkt19LfE//gEA1fOx3Cors/LrY9dthn/3LMKL23vo9c4i0ffhs2X+1gE=
 =XJL5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Hide get_vm_area() from MMUless builds

   The function get_vm_area() is not defined when CONFIG_MMU is not
   defined. Hide that function within #ifdef CONFIG_MMU.

 - Fix output of synthetic events when they have dynamic strings

   The print fmt of the synthetic event's format file used to have "%.*s"
   for dynamic size strings even though the user space exported
   arguments had only __get_str() macro that provided just a nul
   terminated string. This was fixed so that user space could parse this
   properly.

   But the reason that it had "%.*s" was because internally it provided
   the maximum size of the string as one of the arguments. The fix that
   replaced "%.*s" with "%s" caused the trace output (when the kernel
   reads the event) to write "(efault)" as it would now read the length
   of the string as "%s".

   As the string provided is always nul terminated, there's no reason
   for the internal code to use "%.*s" anyway. Just remove the length
   argument to match the "%s" that is now in the format.

 - Fix the ftrace subops hash logic of the manager ops hash

   The function_graph uses the ftrace subops code. The subops code is a
   way to have a single ftrace_ops registered with ftrace to determine
   what functions will call the ftrace_ops callback. More than one user
   of function graph can register a ftrace_ops with it. The function
   graph infrastructure will then add this ftrace_ops as a subops with
   the main ftrace_ops it registers with ftrace. This is because the
   functions will always call the function graph callback which in turn
   calls the subops ftrace_ops callbacks.

   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will
   update the main ftrace_ops hash to include the functions it wants.
   This is the logic that was broken.

   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
   all the functions in the filter_hash but not in the notrace_hash are
   attached by ftrace. The original logic would have the main ftrace_ops
   filter_hash be a union of all the subops filter_hashes and the main
   notrace_hash would be an intersect of all the subops filter hashes.
   But this was incorrect because the notrace hash depends on the
   filter_hash it is associated to and not the union of all
   filter_hashes.

   Instead, when a subops is added, just include all the functions of
   the subops hash that are in its filter_hash but not in its
   notrace_hash. The main subops hash should not use its notrace hash,
   unless all of its subops hashes have an empty filter_hash (which
   means to attach to all functions), and then, and only then, the main
   ftrace_ops notrace hash can be the intersect of all the subops
   hashes.

   This not only fixes the bug, but also simplifies the code.

 - Add a selftest to better test the subops filtering

   Add a selftest that would catch the bug fixed by the above change.

 - Fix extra newline printed in function tracing with retval

   The function parameter code changed the output logic slightly and
   called print_graph_retval() and also printed a newline. The
   print_graph_retval() also prints a newline which caused blank lines
   to be printed in the function graph tracer when retval was added.
   This caused one of the selftests to fail if retvals were enabled.
   Instead remove the new line output from print_graph_retval() and have
   the callers always print the new line so that it doesn't have to do
   special logic if it calls print_graph_retval() or not.

 - Fix out-of-bound memory access in the runtime verifier

   When rv_is_container_monitor() is called on the last entry in the
   linked list, it references the next entry, which is the list head and
   causes an out-of-bound memory access.
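
The synthetic-event item in the list above turns on how a format string pairs with its arguments. As a minimal sketch using standard C snprintf() rather than the kernel's trace output code, the two hypothetical helpers below show the two consistent pairings: "%.*s" consumes an int length followed by the string, while "%s" consumes the string alone, so the length argument has to go when the format drops the precision.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Old pairing: "%.*s" takes an int precision and then the string. */
static int format_with_len(char *out, size_t size, int max_len,
			   const char *str)
{
	assert(out != NULL && str != NULL);
	return snprintf(out, size, "%.*s", max_len, str);
}

/* New pairing: "%s" takes only the string, so no length is passed. */
static int format_without_len(char *out, size_t size, const char *str)
{
	assert(out != NULL && str != NULL);
	return snprintf(out, size, "%s", str);
}
```

For a NUL-terminated string that is shorter than the maximum length, both helpers produce the same text, which is why dropping the length argument is safe here.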

* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix out-of-bound memory access in rv_is_container_monitor()
  ftrace: Do not have print_graph_retval() add a newline
  tracing/selftest: Add test to better test subops filtering of function graph
  ftrace: Fix accounting of subop hashes
  ftrace: Properly merge notrace hashes
  tracing: Do not add length to print format in synthetic events
  tracing: Hide get_vm_area() from MMUless builds
2025-04-12 15:37:40 -07:00
Linus Torvalds
b676ac484f bpf-fixes

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Followup fixes for resilient spinlock (Kumar Kartikeya Dwivedi):
     - Make res_spin_lock test less verbose, since it was spamming BPF
       CI on failure, and make the check for AA deadlock stronger
     - Fix rebasing mistake and use architecture provided
       res_smp_cond_load_acquire
     - Convert BPF maps (queue_stack and ringbuf) to resilient spinlock
       to address long standing syzbot reports

 - Make sure that classic BPF load instruction from SKF_[NET|LL]_OFF
   offsets works when skb is fragmented (Willem de Bruijn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Convert ringbuf map to rqspinlock
  bpf: Convert queue_stack map to rqspinlock
  bpf: Use architecture provided res_smp_cond_load_acquire
  selftests/bpf: Make res_spin_lock AA test condition stronger
  selftests/net: test sk_filter support for SKF_NET_OFF on frags
  bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  selftests/bpf: Make res_spin_lock test less verbose
2025-04-12 12:48:10 -07:00
Nam Cao
8d7861ac50 rv: Fix out-of-bound memory access in rv_is_container_monitor()
When rv_is_container_monitor() is called on the last monitor in
rv_monitors_list, KASAN yells:

  BUG: KASAN: global-out-of-bounds in rv_is_container_monitor+0x101/0x110
  Read of size 8 at addr ffffffff97c7c798 by task setup/221

  The buggy address belongs to the variable:
   rv_monitors_list+0x18/0x40

This is because list_next_entry() is called on the last entry in the list.
It wraps around to the first list_head, and the first list_head is not
embedded in struct rv_monitor_def.

Fix it by checking if the monitor is last in the list.
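
The wrap-around described above can be sketched in userspace C. This is
only an illustration under assumptions: demo_monitor, demo_entry(),
demo_list_add_tail() and demo_next_or_null() are hypothetical names, not
the kernel's code, and the list is a minimal imitation of a circular
doubly linked list whose head is a bare list_head.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node, modeled on the kernel idea. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical monitor type: the list node is embedded in it. */
struct demo_monitor {
	const char *name;
	struct list_head list;
};

/* Recover the containing demo_monitor from a pointer to its list node. */
#define demo_entry(ptr) \
	((struct demo_monitor *)((char *)(ptr) - offsetof(struct demo_monitor, list)))

static void demo_list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Return the monitor after @mon, or NULL if @mon is the last one.
 * Without the check, the "next entry" of the last monitor would be
 * computed from the bare list head, which is not embedded in any
 * demo_monitor, so reading its fields would go out of bounds.
 */
static const struct demo_monitor *
demo_next_or_null(const struct demo_monitor *mon, const struct list_head *head)
{
	if (mon->list.next == head)
		return NULL;
	return demo_entry(mon->list.next);
}
```

A caller that wants to look at the following monitor would test the
returned pointer for NULL before dereferencing it.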

Cc: stable@vger.kernel.org
Cc: Gabriele Monaco <gmonaco@redhat.com>
Fixes: cb85c660fc ("rv: Add option for nested monitors and include sched")
Link: https://lore.kernel.org/e85b5eeb7228bfc23b8d7d4ab5411472c54ae91b.1744355018.git.namcao@linutronix.de
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-12 12:13:30 -04:00
Steven Rostedt
485acd207d ftrace: Do not have print_graph_retval() add a newline
The retval and retaddr options for function_graph tracer will add a
comment at the end of a function for both leaf and non leaf functions that
looks like:

               __wake_up_common(); /* ret=0x1 */

               } /* pick_next_task_fair ret=0x0 */

The function print_graph_retval() adds a newline after the "*/". But if
that's not called, the caller function needs to make sure there's a
newline added.

This is confusing and when the function parameters code was added, it
added a newline even when calling print_graph_retval() as the fact that
the print_graph_retval() function prints a newline isn't obvious.

This caused an extra newline to be printed and that made it fail the
selftests when the retval option was set, as the selftests were not
expecting blank lines being injected into the trace.

Instead of having print_graph_retval() print a newline, just have the
caller always print the newline regardless if it calls print_graph_retval()
or not. This not only fixes this bug, but it also simplifies the code.
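
The convention described above can be sketched as follows. This is a
userspace illustration only, with hypothetical helpers
demo_print_retval() and demo_print_leaf() writing into a plain buffer;
it is not the tracer's actual code.

```c
#include <stddef.h>
#include <stdio.h>

/* Append the return value comment. Deliberately does NOT add a newline. */
static int demo_print_retval(char *buf, size_t size, unsigned long retval)
{
	return snprintf(buf, size, " /* ret=0x%lx */", retval);
}

/*
 * Format one leaf function line. The newline is added here, exactly
 * once, whether or not the retval helper was called.
 * Returns the number of characters written, or -1 if @buf is too small.
 */
static int demo_print_leaf(char *buf, size_t size, const char *func,
			   int show_retval, unsigned long retval)
{
	int len, n;

	len = snprintf(buf, size, "%s();", func);
	if (len < 0 || (size_t)len >= size)
		return -1;

	if (show_retval) {
		n = demo_print_retval(buf + len, size - len, retval);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += n;
	}

	/* The caller owns the newline, so no special casing is needed. */
	n = snprintf(buf + len, size - len, "\n");
	if (n < 0 || (size_t)n >= size - len)
		return -1;
	return len + n;
}
```

With the newline owned by a single place, enabling or disabling the
return value comment cannot change the number of lines emitted.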

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250411133015.015ca393@gandalf.local.home
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/ccc40f2b-4b9e-4abd-8daf-d22fce2a86f0@sirena.org.uk/
Fixes: ff5c9c576e ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-12 12:13:30 -04:00
Steven Rostedt
0ae6b8ce20 ftrace: Fix accounting of subop hashes
The function graph infrastructure uses ftrace to hook to functions. It has
a single ftrace_ops to manage all the users of function graph. Each
individual user (tracing, bpf, fprobes, etc) has its own ftrace_ops to
track the functions it will have its callback called from. These
ftrace_ops are "subops" to the main ftrace_ops of the function graph
infrastructure.

Each ftrace_ops has a filter_hash and a notrace_hash that is defined as:

  Only trace functions that are in the filter_hash but not in the
  notrace_hash.

If the filter_hash is empty, it means to trace all functions.
If the notrace_hash is empty, it means do not disable any function.

The function graph main ftrace_ops needs to be a superset containing all
the functions to be traced by all the subops it has. The algorithm to
perform this merge was incorrect.

When the first subops was added to the main ops, it simply made the main
ops a copy of the subops (same filter_hash and notrace_hash).

When a second ops was added, it joined the new subops filter_hash with the
main ops filter_hash as a union of the two sets. The intersect between the
new subops notrace_hash and the main ops notrace_hash was created as the
new notrace_hash of the main ops.

The issue here is that it would then start tracing functions that no
subops were tracing. For example if you had two subops that had:

subops 1:

  filter_hash = '*sched*' # trace all functions with "sched" in it
  notrace_hash = '*time*' # except do not trace functions with "time"

subops 2:

  filter_hash = '*lock*' # trace all functions with "lock" in it
  notrace_hash = '*clock*' # except do not trace functions with "clock"

The intersect of '*time*' functions with '*clock*' functions could be the
empty set. That means the main ops will be tracing all functions with
'*time*' and all '*clock*' in it!

Instead, modify the algorithm to be a bit simpler and correct.

First, when adding a new subops, even if it's the first one, do not add
the notrace_hash if the filter_hash is not empty. Instead, just add the
functions that are in the filter_hash of the subops but not in the
notrace_hash of the subops into the main ops filter_hash. There's no
reason to add anything to the main ops notrace_hash.

The notrace_hash of the main ops should only be non empty iff all subops
filter_hashes are empty (meaning to trace all functions) and all subops
notrace_hashes include the same functions.

That is, the main ops notrace_hash is empty if any subops filter_hash is
non empty.

The main ops notrace_hash only has content in it if all subops
filter_hashes are empty, and the content are only functions that intersect
all the subops notrace_hashes. If any subops notrace_hash is empty, then
so is the main ops notrace_hash.
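
The rules above can be sketched with bitmasks standing in for the
hashes. This is an illustration under assumptions, not the kernel
implementation: demo_hash, demo_ops and demo_merge_subops() are
hypothetical names, each bit represents one traceable function, and
every non-empty filter is assumed to keep at least one function after
its own notrace bits are removed (otherwise the result would collapse
to the "trace all" encoding).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ftrace hash: one bit per function. */
typedef uint32_t demo_hash;

struct demo_ops {
	demo_hash filter;	/* 0 means "trace all functions" */
	demo_hash notrace;	/* 0 means "do not exclude anything" */
};

/* Build the main ops hashes as a superset of what all subops trace. */
static struct demo_ops demo_merge_subops(const struct demo_ops *subs, size_t n)
{
	struct demo_ops main_ops = { 0, 0 };
	demo_hash filter_union = 0;
	demo_hash notrace_common = UINT32_MAX;
	int any_filter_empty = 0;
	int any_filter_set = 0;
	size_t i;

	if (n == 0)
		return main_ops;

	for (i = 0; i < n; i++) {
		if (subs[i].filter == 0) {
			any_filter_empty = 1;
		} else {
			any_filter_set = 1;
			/* Only add what this subops really traces. */
			filter_union |= subs[i].filter & ~subs[i].notrace;
		}
		/* Functions excluded by every subops seen so far. */
		notrace_common &= subs[i].notrace;
	}

	/* If any subops traces everything, the main ops must too. */
	if (!any_filter_empty)
		main_ops.filter = filter_union;

	/* notrace only has content when all filters are empty. */
	if (!any_filter_set)
		main_ops.notrace = notrace_common;

	return main_ops;
}
```

For the '*sched*' and '*lock*' example, the main ops filter ends up as
the sched functions without "time" plus the lock functions without
"clock", and the main ops notrace stays empty.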

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/20250409152720.216356767@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-11 16:02:08 -04:00
Andy Chiu
04a80a34c2 ftrace: Properly merge notrace hashes
The global notrace hash should be jointly decided by the intersection of
each subops's notrace hash, but not the filter hash.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250408160258.48563-1-andybnac@gmail.com
Fixes: 5fccc7552c ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Andy Chiu <andybnac@gmail.com>
[ fixed removing of freeing of filter_hash ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-11 15:14:54 -04:00
Kumar Kartikeya Dwivedi
a650d38915 bpf: Convert ringbuf map to rqspinlock
Convert the raw spinlock used by BPF ringbuf to rqspinlock. Currently,
we have an open syzbot report of a potential deadlock. In addition, the
ringbuf can fail to reserve spuriously under contention from NMI
context.

It is potentially attractive to enable unconstrained usage (incl. NMIs)
while ensuring no deadlocks manifest at runtime, so perform the
conversion to rqspinlock to achieve this.

This change was benchmarked for BPF ringbuf's multi-producer contention
case on an Intel Sapphire Rapids server, with hyperthreading disabled
and performance governor turned on. 5 warm up runs were done for each
case before obtaining the results.

Before (raw_spinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.440 ± 0.019M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  2.706 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  3.130 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  2.472 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  2.352 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.813 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.988 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.245 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.148 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.190 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.490 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.180 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.201 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.226 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.164 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.874 ± 0.001M/s (drops 0.000 ± 0.000M/s)

After (rqspinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.078 ± 0.019M/s (drops 0.000 ± 0.000M/s) (-3.16%)
rb-libbpf nr_prod 2  2.801 ± 0.014M/s (drops 0.000 ± 0.000M/s) (3.51%)
rb-libbpf nr_prod 3  3.454 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.35%)
rb-libbpf nr_prod 4  2.567 ± 0.002M/s (drops 0.000 ± 0.000M/s) (3.84%)
rb-libbpf nr_prod 8  2.468 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.93%)
rb-libbpf nr_prod 12 2.510 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-10.77%)
rb-libbpf nr_prod 16 2.075 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.38%)
rb-libbpf nr_prod 20 2.640 ± 0.001M/s (drops 0.000 ± 0.000M/s) (17.59%)
rb-libbpf nr_prod 24 2.092 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-2.61%)
rb-libbpf nr_prod 28 2.426 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.78%)
rb-libbpf nr_prod 32 2.331 ± 0.004M/s (drops 0.000 ± 0.000M/s) (-6.39%)
rb-libbpf nr_prod 36 2.306 ± 0.003M/s (drops 0.000 ± 0.000M/s) (5.78%)
rb-libbpf nr_prod 40 2.178 ± 0.002M/s (drops 0.000 ± 0.000M/s) (-1.04%)
rb-libbpf nr_prod 44 2.293 ± 0.001M/s (drops 0.000 ± 0.000M/s) (3.01%)
rb-libbpf nr_prod 48 2.022 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-6.56%)
rb-libbpf nr_prod 52 1.809 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-3.47%)

There's a fair amount of noise in the benchmark, with numbers on reruns
going up and down by 10%, so all changes are in the range of this
disturbance, and we see no major regressions.
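
The percentages in the "After" table appear to be the relative change
in throughput against the "Before" row with the same nr_prod. A minimal
sketch of that arithmetic, assuming this reading is correct
(demo_pct_change() is a hypothetical helper, not part of the benchmark):

```c
/*
 * Relative change from @before to @after, in percent.
 * Negative values mean throughput went down.
 */
static double demo_pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For nr_prod 1, the inputs would be 11.440 and 11.078 (M/s), and for
nr_prod 12 they would be 2.813 and 2.510.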

Reported-by: syzbot+850aaf14624dc0c6d366@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004aa700061379547e@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250411101759.4061366-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-11 10:28:26 -07:00
Linus Torvalds
34833819d2 Miscellaneous timer fixes:
- Fix missing ACCESS_PRIVATE() that triggered a Sparse warning
 
  - Fix lockdep false positive in tick_freeze() on CONFIG_PREEMPT_RT=y
 
  - Avoid <vdso/unaligned.h> macro's variable shadowing to address build
    warning that triggers under W=2 builds
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'timers-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc timer fixes from Ingo Molnar:

 - Fix missing ACCESS_PRIVATE() that triggered a Sparse warning

 - Fix lockdep false positive in tick_freeze() on CONFIG_PREEMPT_RT=y

 - Avoid <vdso/unaligned.h> macro's variable shadowing to address build
   warning that triggers under W=2 builds

* tag 'timers-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vdso: Address variable shadowing in macros
  timekeeping: Add a lockdep override in tick_freeze()
  hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::function
2025-04-10 15:39:39 -07:00
Linus Torvalds
ac253a537d Miscellaneous perf events fixes:
- Fix __free_event() corner case splat
 
  - Fix false-positive uprobes related lockdep
    splat on CONFIG_PREEMPT_RT=y kernels
 
  - Fix a complicated perf sigtrap race that may
    result in hangs
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc perf events fixes from Ingo Molnar:

 - Fix __free_event() corner case splat

 - Fix false-positive uprobes related lockdep splat on
   CONFIG_PREEMPT_RT=y kernels

 - Fix a complicated perf sigtrap race that may result in hangs

* tag 'perf-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix hang while freeing sigtrap event
  uprobes: Avoid false-positive lockdep splat on CONFIG_PREEMPT_RT=y in the ri_timer() uprobe timer callback, use raw_write_seqcount_*()
  perf/core: Fix WARN_ON(!ctx) in __free_event() for partial init
2025-04-10 14:47:36 -07:00
Kumar Kartikeya Dwivedi
2f41503d64 bpf: Convert queue_stack map to rqspinlock
Replace all usage of raw_spinlock_t in queue_stack_maps.c with
rqspinlock. This is a map type with a set of open syzbot reports
reproducing possible deadlocks. A prior attempt to fix the issues
was at [0], but was dropped in favor of this approach.

Make sure we return the -EBUSY error in case of possible deadlocks or
timeouts, just to make sure user space or BPF programs relying on the
error code to detect problems do not break.

With these changes, the map should be safe to access in any context,
including NMIs.
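
The error handling described above can be sketched in userspace C11.
This is only an illustration under assumptions: demo_lock,
demo_lock_acquire() and demo_queue_push() are hypothetical, and the
bounded spin merely imitates a lock that gives up instead of waiting
forever. It is not the rqspinlock API.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock whose acquire can fail instead of spinning forever. */
struct demo_lock {
	atomic_flag held;
};

static int demo_lock_acquire(struct demo_lock *l, unsigned int max_spins)
{
	unsigned int i;

	for (i = 0; i < max_spins; i++) {
		if (!atomic_flag_test_and_set_explicit(&l->held,
						       memory_order_acquire))
			return 0;
	}
	return -ETIMEDOUT;	/* gave up: possible deadlock or contention */
}

static void demo_lock_release(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define DEMO_QUEUE_SIZE 4

struct demo_queue {
	struct demo_lock lock;
	int items[DEMO_QUEUE_SIZE];
	unsigned int count;
};

/* Push @value; any lock failure is reported to the caller as -EBUSY. */
static int demo_queue_push(struct demo_queue *q, int value)
{
	int err = 0;

	if (demo_lock_acquire(&q->lock, 1000))
		return -EBUSY;

	if (q->count == DEMO_QUEUE_SIZE)
		err = -E2BIG;
	else
		q->items[q->count++] = value;

	demo_lock_release(&q->lock);
	return err;
}
```

Mapping every acquisition failure to a single error code keeps callers
that already test for -EBUSY working, whatever the underlying reason
for the failure was.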

  [0]: https://lore.kernel.org/all/20240429165658.1305969-1-sidchintamaneni@gmail.com

Reported-by: syzbot+8bdfc2c53fb2b63e1871@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004c3fc90615f37756@google.com
Reported-by: syzbot+252bc5c744d0bba917e1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c80abd0616517df9@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410153142.2064340-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-10 12:51:10 -07:00