Commit Graph

19201 Commits

Author SHA1 Message Date
Slava Imameev
f8b19aeca1 selftests/bpf: Add test for bpftool access to read-only protected maps
Add selftest cases that validate bpftool's expected behavior when
accessing maps protected from modification via security_bpf_map.

The test includes a BPF program attached to security_bpf_map with two maps:
- A protected map that only allows read-only access
- An unprotected map that allows full access

The test script attaches the BPF program to security_bpf_map and
verifies that for the bpftool map command:
- Read access works on both maps
- Write access fails on the protected map
- Write access succeeds on the unprotected map
- These behaviors remain consistent when the maps are pinned

Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250620151812.13952-2-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-20 11:13:03 -07:00
Eduard Zingerman
cd7312a78f selftests/bpf: include limits.h needed for PATH_MAX directly
Constant PATH_MAX is used in function unpriv_helpers.c:open_config().
This constant is provided by include file <limits.h>.
The dependency was added by commit [1], which does not include
<limits.h> directly, relying instead on <limits.h> being included from
zlib.h -> zconf.h.
As it turns out, this is not the case for all systems, e.g. on
Fedora 41 zlib 1.3.1 is used, and there <limits.h> is not included
from zconf.h. Hence, there is a compilation error on Fedora 41.

[1] commit fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")

Fixes: fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20250618093134.3078870-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 06:52:07 -07:00
James Bottomley
bd07bd12f2 bpf: Fix key serial argument of bpf_lookup_user_key()
The underlying lookup_user_key() function uses a signed 32 bit integer
for key serial numbers because legitimate serial numbers are positive
(and > 3) and keyrings are negative.  Using a u32 for the keyring in
the bpf function doesn't currently cause any conversion problems but
will start to trip the signed to unsigned conversion warnings when the
kernel enables them, so convert the argument to signed (and update the
tests accordingly) before it acquires more users.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/84cdb0775254d297d75e21f577089f64abdfbd28.camel@HansenPartnership.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 18:15:27 -07:00
Mykyta Yatsenko
66ab68c9de selftests/bpf: Fix unintentional switch case fall through
Break from switch expression after parsing -n CLI argument in veristat,
instead of falling through and enabling comparison mode.

Fixes: a5c57f81eb ("veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250617121536.1320074-1-mykyta.yatsenko5@gmail.com
2025-06-17 13:24:31 -07:00
Eduard Zingerman
fc2915bb8b selftests/bpf: More precise cpu_mitigations state detection
test_progs and test_verifier binaries execute unpriv tests under the
following conditions:
- unpriv BPF is enabled;
- CPU mitigations are enabled (see [1] for details).

The detection of the "mitigations enabled" state is performed by
unpriv_helpers.c:get_mitigations_off() via inspecting kernel boot
command line, looking for a parameter "mitigations=off".

Such detection scheme won't work for certain configurations,
e.g. when CONFIG_CPU_MITIGATIONS is disabled and boot parameter is
not supplied.

Miss-detection leads to test_progs executing tests meant to be run
only with mitigations enabled, e.g.
verifier_and.c:known_subreg_with_unknown_reg(), and reporting false
failures.

Internally, verifier sets bpf_verifier_env->bypass_spec_{v1,v4}
basing on the value returned by kernel/cpu.c:cpu_mitigations_off().
This function is backed by a variable kernel/cpu.c:cpu_mitigations.

This state is not fully introspect-able via sysfs. The closest proxy
is /sys/devices/system/cpu/vulnerabilities/spectre_v1, but it reports
"vulnerable" state only if mitigations are disabled *and* current cpu
is vulnerable, while verifier does not check cpu state.

There are only two ways the kernel/cpu.c:cpu_mitigations can be set:
- via boot parameter;
- via CONFIG_CPU_MITIGATIONS option.

This commit updates unpriv_helpers.c:get_mitigations_off() to scan
/boot/config-$(uname -r) and /proc/config.gz for
CONFIG_CPU_MITIGATIONS value in addition to boot command line check.

Tested using the following configurations:
- mitigations enabled (unpriv tests are enabled)
- mitigations disabled via boot cmdline (unpriv tests skipped)
- mitigations disabled via CONFIG_CPU_MITIGATIONS
  (unpriv tests skipped)

[1] https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com/

Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250617005710.1066165-2-eddyz87@gmail.com
2025-06-17 13:23:49 -07:00
Yonghong Song
a633dab4b4 selftests/bpf: Fix RELEASE build failure with gcc14
With gcc14, when building with RELEASE=1, I hit four below compilation
failure:

Error 1:
  In file included from test_loader.c:6:
  test_loader.c: In function ‘run_subtest’: test_progs.h:194:17:
      error: ‘retval’ may be used uninitialized in this function
   [-Werror=maybe-uninitialized]
    194 |                 fprintf(stdout, ##format);           \
        |                 ^~~~~~~
  test_loader.c:958:13: note: ‘retval’ was declared here
    958 |         int retval, err, i;
        |             ^~~~~~

  The uninitialized var 'retval' actually could cause incorrect result.

Error 2:
  In function ‘test_fd_array_cnt’:
  prog_tests/fd_array.c:71:14: error: ‘btf_id’ may be used uninitialized in this
      function [-Werror=maybe-uninitialized]
     71 |         fd = bpf_btf_get_fd_by_id(id);
        |              ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/fd_array.c:302:15: note: ‘btf_id’ was declared here
    302 |         __u32 btf_id;
        |               ^~~~~~

  Changing ASSERT_GE to ASSERT_EQ can fix the compilation error. Otherwise,
  there is no functionality change.

Error 3:
  prog_tests/tailcalls.c: In function ‘test_tailcall_hierarchy_count’:
  prog_tests/tailcalls.c:1402:23: error: ‘fentry_data_fd’ may be used uninitialized
      in this function [-Werror=maybe-uninitialized]
     1402 |                 err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
          |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  The code is correct. The change intends to silence gcc errors.

Error 4: (this error only happens on arm64)
  In file included from prog_tests/log_buf.c:4:
  prog_tests/log_buf.c: In function ‘bpf_prog_load_log_buf’:
  ./test_progs.h:390:22: error: ‘log_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
    390 |         int ___err = libbpf_get_error(___res);             \
        |                      ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/log_buf.c:158:14: note: in expansion of macro ‘ASSERT_OK_PTR’
    158 |         if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc"))
        |              ^~~~~~~~~~~~~
  In file included from selftests/bpf/tools/include/bpf/bpf.h:32,
                 from ./test_progs.h:36:
  selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17:
    note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
    113 | LIBBPF_API long libbpf_get_error(const void *ptr);
        |                 ^~~~~~~~~~~~~~~~

  Adding a pragma to disable maybe-uninitialized fixed the issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250617044956.2686668-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 10:18:30 -07:00
Eduard Zingerman
4a4b84ba9e selftests/bpf: verify jset handling in CFG computation
A test case to check if both branches of jset are explored when
computing program CFG.

At 'if r1 & 0x7 ...':
- register 'r2' is computed alive only if jump branch of jset
  instruction is followed;
- register 'r0' is computed alive only if fallthrough branch of jset
  instruction is followed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250613175331.3238739-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-13 11:51:19 -07:00
Eduard Zingerman
67cdcc405b veristat: Memory accounting for bpf programs
This commit adds a new field mem_peak / "Peak memory (MiB)" field to a
set of gathered statistics. The field is intended as an estimate for
peak verifier memory consumption for processing of a given program.

Mechanically stat is collected as follows:
- At the beginning of handle_verif_mode() a new cgroup is created
  and veristat process is moved into this cgroup.
- At each program load:
  - bpf_object__load() is split into bpf_object__prepare() and
    bpf_object__load() to avoid accounting for memory allocated for
    maps;
  - before bpf_object__load():
    - a write to "memory.peak" file of the new cgroup is used to reset
      cgroup statistics;
    - updated value is read from "memory.peak" file and stashed;
  - after bpf_object__load() "memory.peak" is read again and
    difference between new and stashed values is used as a metric.

If any of the above steps fails veristat proceeds w/o collecting
mem_peak information for a program, reporting mem_peak as -1.

While memcg provides data in bytes (converted from pages), veristat
converts it to megabytes to avoid jitter when comparing results of
different executions.

The change has no measurable impact on veristat running time.

A correlation between "Peak states" and "Peak memory" fields provides
a sanity check for gathered statistics, e.g. a sample of data for
sched_ext programs:

Program                   Peak states  Peak memory (MiB)
------------------------  -----------  -----------------
lavd_select_cpu                  2153                 44
lavd_enqueue                     1982                 41
lavd_dispatch                    3480                 28
layered_dispatch                 1417                 17
layered_enqueue                   760                 11
lavd_cpu_offline                  349                  6
lavd_cpu_online                   349                  6
lavd_init                         394                  6
rusty_init                        350                  5
layered_select_cpu                391                  4
...
rusty_stopping                    134                  1
arena_topology_node_init          170                  0

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613072147.3938139-3-eddyz87@gmail.com
2025-06-13 10:44:12 -07:00
Song Liu
ccefa19335 bpf/veristat: Fix veristat for map type BPF_MAP_TYPE_CGRP_STORAGE
BPF_MAP_TYPE_CGRP_STORAGE doesn't allow non-zero max_entries. So veristat
should not set it to 1.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613050001.1058733-1-song@kernel.org
2025-06-13 10:08:22 -07:00
Yonghong Song
44df9e0d4e selftests/bpf: Fix xdp_do_redirect failure with 64KB page size
On arm64 with 64KB page size, the selftest xdp_do_redirect failed like
below:
  ...
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec
  test_max_pkt_size:PASS:prog_run_max_size 0 nsec
  test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22

With 64KB page size, the xdp frame size will be much bigger so
the existing test will fail.

Adjust various parameters so the test can also work on 64K page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
96fcf7e7a7 selftests/bpf: Fix two net related test failures with 64K page size
When running BPF selftests on arm64 with a 64K page size, I encountered
the following two test failures:
  sockmap_basic/sockmap skb_verdict change tail:FAIL
  tc_change_tail:FAIL

With further debugging, I identified the root cause in the following
kernel code within __bpf_skb_change_tail():

    u32 max_len = BPF_SKB_MAX_LEN;
    u32 min_len = __bpf_skb_min_len(skb);
    int ret;

    if (unlikely(flags || new_len > max_len || new_len < min_len))
        return -EINVAL;

With a 4K page size, new_len = 65535 and max_len = 16064, the function
returns -EINVAL. However, With a 64K page size, max_len increases to
261824, allowing execution to proceed further in the function. This is
because BPF_SKB_MAX_LEN scales with the page size and larger page sizes
result in higher max_len values.

Updating the new_len parameter in both tests based on actual kernel
page size resolved both failures.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
4fc012daf9 bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
arm64 with 64KB page:
   xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL

In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
when constructing frags, with 64K page size, the frag data_len could
be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().

To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel
can test different page size properly. With the kernel change, the user
space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default
value is 17, so for 4K page, the maximum packet size will be less than 68K.
To test 64K page, a bigger maximum packet size than 68K is desired. So two
different functions are implemented for subtest xdp_adjust_frags_tail_grow.
Depending on different page size, different data input/output sizes are used
to adapt with different page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Fushuai Wang
6a4bd31f68 selftests/bpf: fix signedness bug in redir_partial()
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly
treats it as success due to unsigned promotion. Explicitly check for -1
first.

Fixes: a4b7193d8e ("selftests/bpf: Add sockmap test for redirecting partial skb data")
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
5159482fdb selftests/bpf: tests with a loop state missing read/precision mark
The test case absent_mark_in_the_middle_state is equivalent of the
following C program:

   1: r8 = bpf_get_prandom_u32();
   2: r6 = -32;
   3: bpf_iter_num_new(&fp[-8], 0, 10);
   4: if (unlikely(bpf_get_prandom_u32()))
   5:   r6 = -31;
   6: for (;;) {
   7:   if (!bpf_iter_num_next(&fp[-8]))
   8:     break;
   9:   if (unlikely(bpf_get_prandom_u32()))
  10:     *(u64 *)(fp + r6) = 7;
  11: }
  12: bpf_iter_num_destroy(&fp[-8]);
  13: return 0;

W/o a fix that instructs verifier to ignore branches count for loop
entries verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
     push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision propagated for r6 to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
      these marks are propagated to checkpoints A and B (but not C, as
      it is not the parent of current state;
- 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not
     marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
  stack access at offset -31.

The test case absent_mark_in_the_middle_state2 is similar except the
following change:

       r8 = bpf_get_prandom_u32();
       r6 = -32;
       bpf_iter_num_new(&fp[-8], 0, 10);
       if (unlikely(bpf_get_prandom_u32())) {
         r6 = -31;
 + jump_into_loop:
 +       goto +0;
 +       goto loop;
 +     }
 +     if (unlikely(bpf_get_prandom_u32()))
 +       goto jump_into_loop;
 + loop:
       for (;;) {
         if (!bpf_iter_num_next(&fp[-8]))
           break;
         if (unlikely(bpf_get_prandom_u32()))
           *(u64 *)(fp + r6) = 7;
       }
       bpf_iter_num_destroy(&fp[-8])
       return 0

The goal is to check that read/precision marks are propagated to
checkpoint created at 'goto +0' that resides outside of the loop.

The test case absent_mark_in_the_middle_state3 is a bit different and
is equivalent to the C program below:

    int absent_mark_in_the_middle_state3(void)
    {
      bpf_iter_num_new(&fp[-8], 0, 10)
      loop1(-32, &fp[-8])
      loop1_wrapper(&fp[-8])
      bpf_iter_num_destroy(&fp[-8])
    }

    int loop1(num, iter)
    {
      while (bpf_iter_num_next(iter)) {
        if (unlikely(bpf_get_prandom_u32()))
          *(fp + num) = 7;
      }
      return 0
    }

    int loop1_wrapper(iter)
    {
      r6 = -32;
      if (unlikely(bpf_get_prandom_u32()))
        r6 = -31;
      loop1(r6, iter);
      return 0;
    }

The unsafe state is reached in a similar manner, but the loop is
located inside a subprogram that is called from two locations in the
main subprogram. This detail is important for exercising
bpf_scc_visit->backedges memory management.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Yonghong Song
517b088a84 selftests/bpf: Fix cgroup_mprog_ordering failure due to uninitialized variable
On arm64, the cgroup_mprog_ordering selftest failed with test_progs run
when building with clang compiler. The reason is due to socklen_t optlen
not initialized.

In kernel function do_ip_getsockopt(), we have

        if (copy_from_sockptr(&len, optlen, sizeof(int)))
                return -EFAULT;
        if (len < 0)
                return -EINVAL;

The above 'len' variable is a negative value and hence the test failed.

But the test is okay on x86_64. I checked the x86_64 asm code and I didn't
see explicit initialization of 'optlen' but its value is 0 so kernel
didn't return error. This should be a pure luck.

Fix the bug by initializing 'oplen' var properly.

Fixes: e422d5f118 ("selftests/bpf: Add two selftests for mprog API based cgroup progs")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250611162103.1623692-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-11 12:32:50 -07:00
Luis Gerhorst
4a8765d9a5 selftests/bpf: Add test for Spectre v1 mitigation
This is based on the gadget from the description of commit 9183671af6db
("bpf: Fix leakage under speculation on mispredicted branches").

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250603212814.338867-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Luis Gerhorst
d6f1c85f22 bpf: Fall back to nospec for Spectre v1
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].

If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.

A minimal example program would look as follows:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:

	A = true
	B = true
	if A goto e
	nospec
	f()
	if B goto e
	unsafe()
e:	exit

Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.

In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct:

* On Intel x86_64, lfence acts as full speculation barrier, not only as
  a load fence [3]:

    An LFENCE instruction or a serializing instruction will ensure that
    no later instructions execute, even speculatively, until all prior
    instructions complete locally. [...] Inserting an LFENCE instruction
    after a bounds check prevents later operations from executing before
    the bound check completes.

  This was experimentally confirmed in [4].

* On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR
  C001_1029[1] to be set if the MSR is supported, this happens in
  init_amd()). AMD further specifies "A dispatch serializing instruction
  forces the processor to retire the serializing instruction and all
  previous instructions before the next instruction is executed" [8]. As
  dispatch is not specific to memory loads or branches, lfence therefore
  also affects all instructions there. Also, if retiring a branch means
  it's PC change becomes architectural (should be), this means any
  "wrong" speculation is aborted as required for this series.

* ARM's SB speculation barrier instruction also affects "any instruction
  that appears later in the program order than the barrier" [6].

* PowerPC's barrier also affects all subsequent instructions [7]:

    [...] executing an ori R31,R31,0 instruction ensures that all
    instructions preceding the ori R31,R31,0 instruction have completed
    before the ori R31,R31,0 instruction completes, and that no
    subsequent instructions are initiated, even out-of-order, until
    after the ori R31,R31,0 instruction completes. The ori R31,R31,0
    instruction may complete before storage accesses associated with
    instructions preceding the ori R31,R31,0 instruction have been
    performed

Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.

This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's
security to a limited extent:

* The regression is limited to systems vulnerable to Spectre v1, have
  unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The
  latter is not the case for x86 64- and 32-bit, arm64, and powerpc
  64-bit and they are therefore not affected by the regression.
  According to commit a6f6a95f25 ("LoongArch, bpf: Fix jit to skip
  speculation barrier opcode"), LoongArch is not vulnerable to Spectre
  v1 and therefore also not affected by the regression.

* To the best of my knowledge this regression may therefore only affect
  MIPS. This is deemed acceptable because unpriv BPF is still disabled
  there by default. As stated in a previous commit, BPF_NOSPEC could be
  implemented for MIPS based on GCC's speculation_barrier
  implementation.

* It is unclear which other architectures (besides x86 64- and 32-bit,
  ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
  are vulnerable to Spectre v1. Also, it is not clear if barriers are
  available on these architectures. Implementing BPF_NOSPEC on these
  architectures therefore is non-trivial. Searching GCC and the kernel
  for speculation barrier implementations for these architectures
  yielded no result.

* If any of those regressed systems is also vulnerable to Spectre v4,
  the system was already vulnerable to Spectre v4 attacks based on
  unpriv BPF before this patch and the impact is therefore further
  limited.

As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.

In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).

Define SPEC_V1 to avoid duplicating this ifdef whenever we check for
nospec insns using __xlated_unpriv, define it here once. This also
improves readability. PowerPC can probably also be added here. However,
omit it for now because the BPF CI currently does not include a test.

Limit it to EPERM, EACCES, and EINVAL (and not everything except for
EFAULT and ENOMEM) as it already has the desired effect for most
real-world programs. Briefly went through all the occurrences of EPERM,
EINVAL, and EACCESS in verifier.c to validate that catching them like
this makes sense.

Thanks to Dustin for their help in checking the vendor documentation.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
    tool to analyze speculative execution attacks and mitigations" -
    Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf
    ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD
    PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier-
    ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set
    Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf
    ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book
    III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
    ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08
    - April 2024 - 7.6.4 Serializing Instructions")

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Tao Chen
d77efc0ef5 selftests/bpf: Add cookies check for tracing fill_link_info test
Adding tests for getting cookie with fill_link_info for tracing.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Ihor Solodrai
260b862918 selftests/bpf: Add test cases with CONST_PTR_TO_MAP null checks
A test requires the following to happen:
  * CONST_PTR_TO_MAP value is checked for null
  * the code in the null branch fails verification

Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms
  map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]

[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
eb6c992784 selftests/bpf: Add cmp_map_pointer_with_const test
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A
BPF program with this code must not pass verification in unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
5534e58f2e bpf: Make reg_not_null() true for CONST_PTR_TO_MAP
When reg->type is CONST_PTR_TO_MAP, it can not be null. However the
verifier explores the branches under rX == 0 in check_cond_jmp_op()
even if reg->type is CONST_PTR_TO_MAP, because it was not checked for
in reg_not_null().

Fix this by adding CONST_PTR_TO_MAP to the set of types that are
considered non nullable in reg_not_null().

An old "unpriv: cmp map pointer with zero" selftest fails with this
change, because now early out correctly triggers in
check_cond_jmp_op(), making the verification to pass.

In practice verifier may allow pointer to null comparison in unpriv,
since in many cases the relevant branch and comparison op are removed
as dead code. So change the expected test result to __success_unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Yonghong Song
e422d5f118 selftests/bpf: Add two selftests for mprog API based cgroup progs
Two tests are added:
  - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
    attach are tested. Some negative tests are also included.
  - cgroup_mprog_ordering, which actually runs the program with some mprog
    API flags.

  [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
2025-06-09 16:28:31 -07:00
Yonghong Song
c1bb68656b selftests/bpf: Move some tc_helpers.h functions to test_progs.h
Move static inline functions id_from_prog_fd() and id_from_link_fd()
from prog_tests/tc_helpers.h to test_progs.h so these two functions
can be reused for later cgroup mprog selftests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
2025-06-09 16:28:30 -07:00
Yonghong Song
bbc7bd658d selftests/bpf: Fix a user_ringbuf failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013626.1553001-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
8c8c5e3c85 selftests/bpf: Fix ringbuf/ringbuf_write test failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
and other related metrics properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013621.1552332-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
377d371590 selftests/bpf: Fix bpf_mod_race test failure with arm64 64KB page size
Currently, uffd_register.range.len is set to 4096 for command
'ioctl(uffd, UFFDIO_REGISTER, &uffd_register)'. For arm64 64KB page size,
the len must be 64KB size aligned as page size alignment is required.
See fs/userfaultfd.c:validate_unaligned_range().

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013615.1551783-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
ae8824037a selftests/bpf: Reduce test_xdp_adjust_frags_tail_grow logs
For selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow, if tested failure,
I see a long list of log output like

    ...
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    ...

There are total 7374 lines of the above which is too much. Let us
only issue such logs when it is an assert failure.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013610.1551399-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Rong Tao
64a064ce33 selftests/bpf: rbtree: Fix incorrect global variable usage
Within __add_three() function, should use function parameters instead of
global variables. So that the variables groot_nested.inner.root and
groot_nested.inner.glock in rbtree_add_nodes_nested() are tested
correctly.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_3DD7405C0839EBE2724AC5FA357B5402B105@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-05 13:55:26 -07:00
Blake Jones
a570f386f3 Tests for the ".emit_strings" functionality in the BTF dumper.
When this mode is turned on, "emit_zeroes" and "compact" have no effect,
and embedded NUL characters always terminate printing of an array.

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-2-blakejones@google.com
2025-06-05 13:45:16 -07:00
Tao Chen
25a0d04d38 selftests/bpf: Add cookies check for raw_tp fill_link_info test
Adding tests for getting cookie with fill_link_info for raw_tp.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-2-chen.dylane@linux.dev
2025-06-05 11:44:52 -07:00
Linus Torvalds
64980441d2 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
 "Two small fixes to selftests"

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure
  selftests/bpf: Fix bpf selftest build error
2025-06-04 19:46:22 -07:00
Linus Torvalds
29e9359005 Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:

 - Remove always true condition in cxl features code

 - Add verification of CHBS length for CXL 2.0

 - Ignore interleave granularity when interleave ways is 1

 - Add update addressing mising MODULE_DESCRIPTION for cxl_test

 - A series of cleanups/refactor to prep for AMD Zen5 translate code

 - Clean %pa debug printk in core/hdm.c

 - Documentation updates:
     - Update to CXL Maturity Map
     - Fixes to source linking in CXL documentation
     - CXL documentation fixes, spelling corrections
     - A large collection of CXL documentation for the entire CXL
       subsystem, including documentation on CXL related platform and
       firmware notes

 - Remove redundant code of cxlctl_get_supported_features()

 - Series to support CXL RAS Features
     - Including "Patrol Scrub Control", "Error Check Scrub",
       "Performance Maitenance" and "Memory Sparing". The series
       connects CXL to EDAC.

* tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits)
  cxl/edac: Add CXL memory device soft PPR control feature
  cxl/edac: Add CXL memory device memory sparing control feature
  cxl/edac: Support for finding memory operation attributes from the current boot
  cxl/edac: Add support for PERFORM_MAINTENANCE command
  cxl/edac: Add CXL memory device ECS control feature
  cxl/edac: Add CXL memory device patrol scrub control feature
  cxl: Update prototype of function get_support_feature_info()
  EDAC: Update documentation for the CXL memory patrol scrub control feature
  cxl/features: Remove the inline specifier from to_cxlfs()
  cxl/feature: Remove redundant code of get supported features
  docs: ABI: Fix "firwmare" to "firmware"
  cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path
  cxl: doc/linux/access-coordinates Update access coordinates calculation methods
  cxl: docs/platform/acpi/srat Add generic target documentation
  cxl: docs/platform/cdat reference documentation
  Documentation: Update the CXL Maturity Map
  cxl: Sync up the driver-api/cxl documentation
  cxl: docs - add self-referencing cross-links
  cxl: docs/allocation/hugepages
  cxl: docs/allocation/reclaim
  ...
2025-06-03 13:24:14 -07:00
Linus Torvalds
fd1f847350 Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging nmi-safe, which is required by BFP, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
2025-06-02 16:00:26 -07:00
Linus Torvalds
7f9039c524 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
  Generic:

   - Clean up locking of all vCPUs for a VM by using the *_nest_lock()
     family of functions, and move duplicated code to virt/kvm/. kernel/
     patches acked by Peter Zijlstra

   - Add MGLRU support to the access tracking perf test

  ARM fixes:

   - Make the irqbypass hooks resilient to changes in the GSI<->MSI
     routing, avoiding behind stale vLPI mappings being left behind. The
     fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
     and nuking the vLPI mapping upon a routing change

   - Close another VGIC race where vCPU creation races with VGIC
     creation, leading to in-flight vCPUs entering the kernel w/o
     private IRQs allocated

   - Fix a build issue triggered by the recently added workaround for
     Ampere's AC04_CPU_23 erratum

   - Correctly sign-extend the VA when emulating a TLBI instruction
     potentially targeting a VNCR mapping

   - Avoid dereferencing a NULL pointer in the VGIC debug code, which
     can happen if the device doesn't have any mapping yet

  s390:

   - Fix interaction between some filesystems and Secure Execution

   - Some cleanups and refactorings, preparing for an upcoming big
     series

  x86:

   - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
     to fix a race between AP destroy and VMRUN

   - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
     the VM

   - Refine and harden handling of spurious faults

   - Add support for ALLOWED_SEV_FEATURES

   - Add #VMGEXIT to the set of handlers special cased for
     CONFIG_RETPOLINE=y

   - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
     features that utilize those bits

   - Don't account temporary allocations in sev_send_update_data()

   - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
     Threshold

   - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
     IBPB, between SVM and VMX

   - Advertise support to userspace for WRMSRNS and PREFETCHI

   - Rescan I/O APIC routes after handling EOI that needed to be
     intercepted due to the old/previous routing, but not the
     new/current routing

   - Add a module param to control and enumerate support for device
     posted interrupts

   - Fix a potential overflow with nested virt on Intel systems running
     32-bit kernels

   - Flush shadow VMCSes on emergency reboot

   - Add support for SNP to the various SEV selftests

   - Add a selftest to verify fastops instructions via forced emulation

   - Refine and optimize KVM's software processing of the posted
     interrupt bitmap, and share the harvesting code between KVM and the
     kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  rtmutex_api: provide correct extern functions
  KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
  KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
  KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
  KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
  KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
  KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
  KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
  arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
  KVM: s390: Simplify and move pv code
  KVM: s390: Refactor and split some gmap helpers
  KVM: s390: Remove unneeded srcu lock
  s390: Remove unneeded includes
  s390/uv: Improve splitting of large folios that cannot be split while dirty
  s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
  s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
  rust: add helper for mutex_trylock
  RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
  KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
  x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
  ...
2025-06-02 12:24:58 -07:00
Yonghong Song
baa39c169d selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure
Ihor Solodrai reported selftest 'btf_tag/btf_type_tag_percpu_vmlinux_helper'
failure ([1]) during 6.16 merge window. The failure log:

  ...
  7: (15) if r0 == 0x0 goto pc+1        ; R0=ptr_css_rstat_cpu()
  ; *(volatile int *)rstat; @ btf_type_tag_percpu.c:68
  8: (61) r1 = *(u32 *)(r0 +0)
  cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4

Two changes are needed. First, 'struct cgroup_rstat_cpu' needs to be
replaced with 'struct css_rstat_cpu' to be consistent with new data
structure. Second, layout of 'css_rstat_cpu' is changed compared
to 'cgroup_rstat_cpu'. The first member becomes a pointer so
the bpf prog needs to do 8-byte load instead of 4-byte load.

  [1] https://lore.kernel.org/bpf/6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev/

Cc: Ihor Solodrai <ihor.solodrai@linux.dev>
Cc: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20250529201151.1787575-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-01 13:07:47 -07:00
Saket Kumar Bhaskar
4b65d5ae97 selftests/bpf: Fix bpf selftest build error
On linux-next, build for bpf selftest displays an error due to
mismatch in the expected function signature of bpf_testmod_test_read
and bpf_testmod_test_write.

Commit 97d06802d1 ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
changed the required type for struct bin_attribute to const struct bin_attribute.

To resolve the error, update corresponding signature for the callback.

Fixes: 97d06802d1 ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/e915da49-2b9a-4c4c-a34f-877f378129f6@linux.ibm.com/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250512091108.2015615-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-01 12:57:41 -07:00
Mark Brown
66bce7afba selftests/mm: fix test result reporting in gup_longterm
The kselftest framework uses the string logged when a test result is
reported as the unique identifier for a test, using it to track test
results between runs.  The gup_longterm test fails to follow this pattern,
it runs a single test function repeatedly with various parameters but each
result report is a string logging an error message which is fixed between
runs.

Since the code already logs each test uniquely before it starts refactor
to also print this to a buffer, then use that name as the test result. 
This isn't especially pretty but is relatively straightforward and is a
great help to tooling.

Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-4-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:16 -07:00
Mark Brown
3f2d9a9ac5 selftests/mm: report unique test names for each cow test
The kselftest framework uses the string logged when a test result is
reported as the unique identifier for a test, using it to track test
results between runs.  The cow test completely fails to follow this
pattern, it runs test functions repeatedly with various parameters with
each result report from those functions being a string logging an error
message which is fixed between runs.

Since the code already logs each test uniquely before it starts refactor
to also print this to a buffer, then use that name as the test result. 
This isn't especially pretty but is relatively straightforward and is a
great help to tooling.

Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-3-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:16 -07:00
Mark Brown
3f192afbed selftests/mm: add helper for logging test start and results
Several of the MM tests have a pattern of printing a description of the
test to be run then reporting the actual TAP result using a generic string
not connected to the specific test, often in a shared function used by
many tests.  The name reported typically varies depending on the specific
result rather than the test too.  This causes problems for tooling that
works with test results, the names reported with the results are used to
deduplicate tests and track them between runs so both duplicated names and
changing names cause trouble for things like UIs and automated bisection.

As a first step towards matching these tests better with the expectations
of kselftest provide helpers which record the test name as part of the
initial print and then use that as part of reporting a result.

This is not added as a generic kselftest helper partly because the use of
a variable to store the test name doesn't fit well with the header only
implementation of kselftest.h and partly because it's not really an
intended pattern.  Ideally at some point the mm tests that use it will be
updated to not need it.

Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-2-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:16 -07:00
Mark Brown
109364fce5 selftests/mm: use standard ksft_finished() in cow and gup_longterm
Patch series "selftests/mm: cow and gup_longterm cleanups", v2.

The bulk of these changes modify the cow and gup_longterm tests to report
unique and stable names for each test, bringing them into line with the
expectations of tooling that works with kselftest.  The string reported as
a test result is used by tooling to both deduplicate tests and track tests
between test runs, using the same string for multiple tests or changing
the string depending on test result causes problems for user interfaces
and automation such as bisection.

It was suggested that converting to use kselftest_harness.h would be a
good way of addressing this, however that really wants the set of tests to
run to be known at compile time but both test programs dynamically
enumarate the set of huge page sizes the system supports and test each. 
Refactoring to handle this would be even more invasive than these changes
which are large but straightforward and repetitive.

A version of the main gup_longterm cleanup was previously sent separately,
this version factors out the helpers for logging the start of the test
since the cow test looks very similar.


This patch (of 4):

The cow and gup_longterm test programs open code something that looks a
lot like the standard ksft_finished() helper to summarise the test results
and provide an exit code, convert to use ksft_finished().

Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-0-ff198df8e38e@kernel.org
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-1-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:15 -07:00
Enze Li
79509ec1d2 selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
When CONFIG_DAMON_SYSFS is disabled, the selftests fail with the following
outputs,

not ok 2 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py # exit=1
not ok 3 selftests: damon: damos_quota.py # exit=1
not ok 4 selftests: damon: damos_quota_goal.py # exit=1
not ok 5 selftests: damon: damos_apply_interval.py # exit=1
not ok 6 selftests: damon: damos_tried_regions.py # exit=1
not ok 7 selftests: damon: damon_nr_regions.py # exit=1
not ok 11 selftests: damon: sysfs_update_schemes_tried_regions_hang.py # exit=1

The root cause of this issue is that all the testcases above do not check
the sysfs interface of DAMON whether it exists or not.  With this patch
applied, all the testcases above now pass successfully.

Link: https://lkml.kernel.org/r/20250531093937.1555159-1-lienze@kylinos.cn
Signed-off-by: Enze Li <lienze@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:15 -07:00
Dan Carpenter
83da212b7f tools/testing: check correct variable in open_procmap()
Check if "procmap_out->fd" is negative instead of "procmap_out" (which is
a pointer).

Link: https://lkml.kernel.org/r/aDbFuUTlJTBqziVd@stanley.mountain
Fixes: bd23f293a0 ("tools/testing: add PROCMAP_QUERY helper functions in mm self tests")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: levi.yun <yeoreum.yun@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:14 -07:00
Lorenzo Stoakes
918850c136 tools/testing/vma: add missing function stub
The hugetlb fix introduced in commit ee40c9920a ("mm: fix copy_vma()
error handling for hugetlb mappings") mistakenly did not provide a stub
for the VMA userland testing, which results in a compile error when trying
to build this.

Provide this stub to resolve the issue.

Link: https://lkml.kernel.org/r/20250528-fix-vma-test-v1-1-c8a5f533b38f@oracle.com
Fixes: ee40c9920a ("mm: fix copy_vma() error handling for hugetlb mappings")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by:  Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:14 -07:00
David Hildenbrand
bb084994d3 selftests/mm: two fixes for the pfnmap test
When unregistering the signal handler, we have to pass SIG_DFL, and
blindly reading from PFN 0 and PFN 1 seems to be problematic on !x86
systems.  In particularly, on arm64 tx2 machines where noting resides at
these physical memory locations, we can generate RAS errors.

Let's fix it by scanning /proc/iomem for actual "System RAM".

Link: https://lkml.kernel.org/r/20250528195244.1182810-1-david@redhat.com
Fixes: 2616b37032 ("selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Closes: https://lore.kernel.org/all/232960c2-81db-47ca-a337-38c4bce5f997@arm.com/T/#u
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:14 -07:00
Mark Brown
cfc695109a selftests/mm: deduplicate test names in madv_populate
The madv_populate selftest has some repetitive code for several different
cases that it covers, included repeated test names used in
ksft_test_result() reports.  This causes problems for automation, the test
name is used to both track the test between runs and distinguish between
multiple tests within the same run.  Fix this by tweaking the messages
with duplication to be more specific about the contexts they're in.

Link: https://lkml.kernel.org/r/20250522-selftests-mm-madv-populate-dedupe-v1-1-fd1dedd79b4b@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:13 -07:00
Zi Yan
115155901d selftests/mm: skip hugevm test if kernel config file is not present
When running hugevm tests in a machine without kernel config present,
e.g., a VM running a kernel without CONFIG_IKCONFIG_PROC nor
/boot/config-*, skip hugevm tests, which reads kernel config to get page
table level information.

Link: https://lkml.kernel.org/r/20250516132938.356627-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adam Sindelar <adam@wowsignal.io>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:11 -07:00
Zi Yan
6d21130312 selftests/mm: skip guard_regions.uffd tests when uffd is not present
Patch series "Skip mm selftests instead when kernel features are not
present", v2.

Two guard_regions tests on userfaultfd fail when userfaultfd is not
present.  Skip them instead.

hugevm test reads kernel config to get page table level information and
fails when neither /proc/config.gz nor /boot/config-* is present.  Skip it
instead.


This patch (of 2):

When userfaultfd is not compiled into kernel, userfaultfd() returns -1,
causing guard_regions.uffd tests to fail.  Skip the tests instead.

Link: https://lkml.kernel.org/r/20250516132938.356627-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20250516132938.356627-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Adam Sindelar <adam@wowsignal.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:11 -07:00
Mark Brown
9abb8c208f selftests/mm: deduplicate default page size test results in thuge-gen
The thuge-gen test program runs mmap() and shmget() tests for both every
available page size and the default page size, resulting in two tests for
the default size.  These tests are distinct since the flags in the default
case do not specify an explicit size, add the flags to the test name that
is logged to deduplicate.

Link: https://lkml.kernel.org/r/20250515-selfests-mm-thuge-gen-dup-v1-1-057d2836553f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dev Jain <dev.jain@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:08 -07:00
Mark Brown
62973e3867 selftests/mm: deduplicate test logging in test_mlock_lock()
The mlock2-tests test_mlock_lock() test reports two test results with an
identical string, one reporitng if it successfully locked a block of
memory and another reporting if the lock is still present after doing an
unlock (following a similar pattern to other tests in the same program). 
This confuses test automation since the test string is used to deduplicate
tests, change the post unlock test to report "Unlocked" instead like the
other tests to fix this.

Link: https://lkml.kernel.org/r/20250515-selftest-mm-mlock2-dup-v1-1-963d5d7d243a@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dev Jain <dev.jain@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-31 22:46:08 -07:00
Linus Torvalds
7d4e49a77d Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "hung_task: extend blocking task stacktrace dump to semaphore" from
   Lance Yang enhances the hung task detector.

   The detector presently dumps the blocking tasks's stack when it is
   blocked on a mutex. Lance's series extends this to semaphores

 - "nilfs2: improve sanity checks in dirty state propagation" from
   Wentao Liang addresses a couple of minor flaws in nilfs2

 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
   fixes a couple of issues in the gdb scripts

 - "Support kdump with LUKS encryption by reusing LUKS volume keys" from
   Coiby Xu addresses a usability problem with kdump.

   When the dump device is LUKS-encrypted, the kdump kernel may not have
   the keys to the encrypted filesystem. A full writeup of this is in
   the series [0/N] cover letter

 - "sysfs: add counters for lockups and stalls" from Max Kellermann adds
   /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
   /sys/kernel/rcu_stall_count

 - "fork: Page operation cleanups in the fork code" from Pasha Tatashin
   implements a number of code cleanups in fork.c

 - "scripts/gdb/symbols: determine KASLR offset on s390 during early
   boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
   scripts

* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
  llist: make llist_add_batch() a static inline
  delayacct: remove redundant code and adjust indentation
  squashfs: add optional full compressed block caching
  crash_dump, nvme: select CONFIGFS_FS as built-in
  scripts/gdb/symbols: determine KASLR offset on s390 during early boot
  scripts/gdb/symbols: factor out pagination_off()
  scripts/gdb/symbols: factor out get_vmlinux()
  kernel/panic.c: format kernel-doc comments
  mailmap: update and consolidate Casey Connolly's name and email
  nilfs2: remove wbc->for_reclaim handling
  fork: define a local GFP_VMAP_STACK
  fork: check charging success before zeroing stack
  fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
  fork: clean-up ifdef logic around stack allocation
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
  x86/crash: make the page that stores the dm crypt keys inaccessible
  x86/crash: pass dm crypt keys to kdump kernel
  Revert "x86/mm: Remove unused __set_memory_prot()"
  crash_dump: retrieve dm crypt keys in kdump kernel
  ...
2025-05-31 19:12:53 -07:00