Commit Graph

51557 Commits

Author SHA1 Message Date
Linus Torvalds
c7decec2f2 Merge tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:

 - Introduce 'perf sched stats' tool with record/report/diff workflows
   using schedstat counters

 - Add a faster libdw based addr2line implementation and allow selecting
   it or its alternatives via 'perf config addr2line.style='

 - Data-type profiling fixes and improvements including the ability to
   select fields using 'perf report''s -F/-fields, e.g.:

     'perf report --fields overhead,type'

 - Add 'perf test' regression tests for Data-type profiling with C and
   Rust workloads

 - Fix srcline printing with inlines in callchains, make sure this has
   coverage in 'perf test'

 - Fix printing of leaf IP in LBR callchains

 - Fix display of metrics without sufficient permission in 'perf stat'

 - Print all machines in 'perf kvm report -vvv', not just the host

 - Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1
   code

 - Fix 'perf report's histogram entry collapsing with '-F' option

 - Use system's cacheline size instead of a hardcoded value in 'perf
   report'

 - Allow filtering conversion by time range in 'perf data'

 - Cover conversion to CTF using 'perf data' in 'perf test'

 - Address newer glibc const-correctness (-Werror=discarded-qualifiers)
   issues

 - Fixes and improvements for ARM's CoreSight support, simplify ARM SPE
   event config in 'perf mem', update docs for 'perf c2c' including the
   ARM events it can be used with

 - Build support for generating metrics from arch specific python
   script, add extra AMD, Intel, ARM64 metrics using it

 - Add AMD Zen 6 events and metrics

 - Add JSON file with OpenHW Risc-V CVA6 hardware counters

 - Add 'perf kvm' stats live testing

 - Add more 'perf stat' tests to 'perf test'

 - Fix segfault in `perf lock contention -b/--use-bpf`

 - Fix various 'perf test' cases for s390

 - Build system cleanups, bump minimum shellcheck version to 0.7.2

 - Support building the capstone based annotation routines as a plugin

 - Allow passing extra Clang flags via EXTRA_BPF_FLAGS

* tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits)
  perf test script: Add python script testing support
  perf test script: Add perl script testing support
  perf script: Allow the generated script to be a path
  perf test: perf data --to-ctf testing
  perf test: Test pipe mode with data conversion --to-json
  perf json: Pipe mode --to-ctf support
  perf json: Pipe mode --to-json support
  perf check: Add libbabeltrace to the listed features
  perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS
  perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing
  tools build: Fix feature test for rust compiler
  perf libunwind: Fix calls to thread__e_machine()
  perf stat: Add no-affinity flag
  perf evlist: Reduce affinity use and move into iterator, fix no affinity
  perf evlist: Missing TPEBS close in evlist__close()
  perf evlist: Special map propagation for tool events that read on 1 CPU
  perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel
  Revert "perf tool_pmu: More accurately set the cpus for tool events"
  tools build: Emit dependencies file for test-rust.bin
  tools build: Make test-rust.bin be removed by the 'clean' target
  ...
2026-02-21 10:51:08 -08:00
Linus Torvalds
4cf4465788 Merge tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Various bug fixes for the example schedulers and selftests

* tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: fix getopt not re-parsed on restart
  tools/sched_ext: scx_userland: fix data races on shared counters
  tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
  tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
  tools/sched_ext: scx_userland: fix stale data on restart
  tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
  selftests/sched_ext: Fix rt_stall flaky failure
  tools/sched_ext: scx_userland: fix restart and stats thread lifecycle bugs
  tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size
  tools/sched_ext: scx_flatcg: zero-initialize stats counter array
2026-02-21 09:38:59 -08:00
David Carlier
640c9dc72f tools/sched_ext: fix getopt not re-parsed on restart
After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.

Reset optind to 1 at the restart label so options are re-parsed.

Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-20 17:17:38 -10:00
David Carlier
f892f9f994 tools/sched_ext: scx_userland: fix data races on shared counters
The stats thread reads nr_vruntime_enqueues, nr_vruntime_dispatches,
nr_vruntime_failed, and nr_curr_enqueued concurrently with the main
thread writing them, with no synchronization.

Use __atomic builtins with relaxed ordering for all accesses to these
counters to eliminate the data races.

Only display accuracy is affected, not scheduling correctness.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-20 17:17:31 -10:00
Linus Torvalds
8bf22c33e7 Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter.

  Current release - new code bugs:

   - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT

   - eth: mlx5e: XSK, Fix unintended ICOSQ change

   - phy_port: correctly recompute the port's linkmodes

   - vsock: prevent child netns mode switch from local to global

   - couple of kconfig fixes for new symbols

  Previous releases - regressions:

   - nfc: nci: fix false-positive parameter validation for packet data

   - net: do not delay zero-copy skbs in skb_attempt_defer_free()

  Previous releases - always broken:

   - mctp: ensure our nlmsg responses to user space are zero-initialised

   - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()

   - fixes for ICMP rate limiting

  Misc:

   - intel: fix PCI device ID conflict between i40e and ipw2200"

* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: nfc: nci: Fix parameter validation for packet data
  net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
  net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
  net/mlx5: Fix misidentification of write combining CQE during poll loop
  net/mlx5e: Fix misidentification of ASO CQE during poll loop
  net/mlx5: Fix multiport device check over light SFs
  bonding: alb: fix UAF in rlb_arp_recv during bond up/down
  bnge: fix reserving resources from FW
  eth: fbnic: Advertise supported XDP features.
  rds: tcp: fix uninit-value in __inet_bind
  net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
  octeontx2-af: Fix default entries mcam entry action
  net/mlx5e: XSK, Fix unintended ICOSQ change
  ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
  ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
  ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
  inet: move icmp_global_{credit,stamp} to a separate cache line
  icmp: prevent possible overflow in icmp_global_allow()
  selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
  ...
2026-02-19 10:39:08 -08:00
Linus Torvalds
4f13d0dabc Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
   Hung)

 - Fix a potential use-after-free of BTF object (Anton Protopopov)

 - Add feature detection to libbpf and avoid moving arena global
   variables on older kernels (Emil Tsalapatis)

 - Remove extern declaration of bpf_stream_vprintk() from libbpf headers
   (Ihor Solodrai)

 - Fix truncated netlink dumps in bpftool (Jakub Kicinski)

 - Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
   Dwivedi)

 - Remove hexdump dependency while building bpf selftests (Matthieu
   Baerts)

 - Complete fsession support in BPF trampolines on riscv (Menglong Dong)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Remove hexdump dependency
  libbpf: Remove extern declaration of bpf_stream_vprintk()
  selftests/bpf: Use vmlinux.h in test_xdp_meta
  bpftool: Fix truncated netlink dumps
  libbpf: Delay feature gate check until object prepare time
  libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
  bpf: Add a map/btf from a fd array more consistently
  selftests/bpf: Fix map_kptr grace period wait
  selftests/bpf: enable fsession_test on riscv64
  selftests/bpf: Adjust selftest due to function rename
  bpf, riscv: add fsession support for trampolines
  bpf: Fix a potential use-after-free of BTF object
  bpf, riscv: introduce emit_store_stack_imm64() for trampoline
  libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
  libbpf: Add gating for arena globals relocation feature
2026-02-19 10:36:54 -08:00
Linus Torvalds
2b7a25df82 Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:

 - "two fixes in kho_populate()" fixes a couple of not-major issues in
   the kexec handover code (Ran Xiaokai)

 - misc singletons

* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/group_cpus: handle const qualifier from clusters allocation type
  kho: remove unnecessary WARN_ON(err) in kho_populate()
  kho: fix missing early_memunmap() call in kho_populate()
  scripts/gdb: implement x86_page_ops in mm.py
  objpool: fix the overestimation of object pooling metadata size
  selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
  delayacct: fix build regression on accounting tool
2026-02-18 21:40:16 -08:00
Linus Torvalds
eeccf287a2 Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM  updates from Andrew Morton:

 - "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

 - "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

 - "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

 - "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

 - "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page->private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...
2026-02-18 20:50:32 -08:00
Eric Dumazet
570e4549f6 selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-18 16:45:30 -08:00
Matthieu Baerts (NGI0)
1e5c009126 selftests/bpf: Remove hexdump dependency
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").

hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:

  /bin/sh: 2: hexdump: not found

Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.

Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-18 15:11:20 -08:00
Linus Torvalds
956b9cbd7f Merge tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:

 - Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'

 - Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
   made fatal by -Werror

 - Drop explicit LZMA parallel compression in scripts/make_fit.py

 - Several fixes for commit 62089b8048 ("kbuild: rpm-pkg: Generate
   debuginfo package manually")

* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
  kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
  kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
  kbuild: rpm-pkg: Restrict manual debug package creation
  scripts/make_fit.py: Drop explicit LZMA parallel compression
  kbuild: Fix CC_CAN_LINK detection
  kbuild: Add objtool to top-level clean target
2026-02-18 14:59:55 -08:00
Ihor Solodrai
0cecd492f5 libbpf: Remove extern declaration of bpf_stream_vprintk()
An issue was reported that building BPF program which includes both
vmlinux.h and bpf_helpers.h from libbpf fails due to conflicting
declarations of bpf_stream_vprintk().

Remove the extern declaration from bpf_helpers.h to address this.

In order to use bpf_stream_printk() macro, BPF programs are expected
to either include vmlinux.h of the kernel they are targeting, or add
their own extern declaration.

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Closes: https://github.com/libbpf/libbpf/issues/947
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-18 14:58:47 -08:00
Ihor Solodrai
b3dfa128f7 selftests/bpf: Use vmlinux.h in test_xdp_meta
- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-18 14:58:47 -08:00
Linus Torvalds
7ad54bbbc9 Merge tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat fix from Len Brown:
 "Fix a recent AMD regression due to errant code cleanup"

* tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: Fix AMD RAPL regression
2026-02-18 09:52:38 -08:00
David Carlier
625be3456b tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
nr_cpu_ids / 2 produces stride 0 on a single-CPU system, which later
causes SCX_BUG_ON(i == j) to fire. Validate stride after option
parsing to also catch invalid user-supplied values via -S.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-18 07:03:57 -10:00
David Carlier
55a24d9203 tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
Use CPU_SET_S() instead of CPU_SET() on the dynamically allocated
cpuset to avoid a potential out-of-bounds write when nr_cpu_ids
exceeds CPU_SETSIZE.

Also destroy the skeleton before returning on invalid central CPU ID
to prevent a resource leak.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-18 07:03:50 -10:00
Len Brown
ef0e60083f tools/power turbostat: Fix AMD RAPL regression
turbostat.c:8688: rapl_perf_init: Assertion `next_domain < num_domains' failed.

Two recent cleanup patches that were not supposed to change anything
broke the core_id code needed for AMD RAPL initialization:

commit 070e92361e ("tools/power turbostat: Enhance HT enumeration")
commit ddf60e38ca ("tools/power turbostat: Simplify global core_id calculation")

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-17 23:27:52 -06:00
Jakub Kicinski
32b70e6203 selftests: tc_actions: don't dump 2MB of \0 to stdout
Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.

I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:

  dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
                                ^^^^^^^^^

the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).

Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.

Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-17 17:13:07 -08:00
Jakub Kicinski
3b39d73cc3 bpftool: Fix truncated netlink dumps
Netlink requires that the recv buffer used during dumps is at least
min(PAGE_SIZE, 8k) (see the man page). Otherwise the messages will
get truncated. Make sure bpftool follows this requirement, avoid
missing information on systems with large pages.

Acked-by: Quentin Monnet <qmo@kernel.org>
Fixes: 7084566a23 ("tools/bpftool: Remove libbpf_internal.h usage in bpftool")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20260217194150.734701-1-kuba@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-17 16:54:03 -08:00
Linus Torvalds
2961f841b0 Merge tag 'turbostat-2026.02.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:

 - Add L2 statistics columns for recent Intel processors:
	L2MRPS = L2 Cache M-References Per Second
	L2%hit = L2 Cache Hit %

 - Sort work and output by cpu# rather than core#

 - Minor features and fixes

* tag 'turbostat-2026.02.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (23 commits)
  tools/power turbostat: version 2026.02.14
  tools/power turbostat: Fix and document --header_iterations
  tools/power turbostat: Use strtoul() for iteration parsing
  tools/power turbostat: Favor cpu# over core#
  tools/power turbostat: Expunge logical_cpu_id
  tools/power turbostat: Enhance HT enumeration
  tools/power turbostat: Simplify global core_id calculation
  tools/power turbostat: Unify even/odd/average counter referencing
  tools/power turbostat: Allocate average counters dynamically
  tools/power turbostat: Delete core_data.core_id
  tools/power turbostat: Rename physical_core_id to core_id
  tools/power turbostat: Cleanup package_id
  tools/power turbostat: Cleanup internal use of "base_cpu"
  tools/power turbostat: Add L2 cache statistics
  tools/power turbostat: Remove redundant newlines from err(3) strings
  tools/power turbostat: Allow more use of is_hybrid flag
  tools/power turbostat: Rename "LLCkRPS" column to "LLCMRPS"
  tools/power turbostat.8: Document the "--force" option
  tools/power turbostat: Harden against unexpected values
  tools/power turbostat: Dump hypervisor name
  ...
2026-02-17 15:51:14 -08:00
Emil Tsalapatis
d7988720ef libbpf: Delay feature gate check until object prepare time
Commit 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
adds a feature gate check that loads a map and BPF program to
test the running kernel supports large direct offsets for LDIMM64
instructions. This check is currently used to calculate arena symbol
offsets during bpf_object__collect_relos, itself called by
bpf_object_open.

However, the program calling bpf_object_open may not have the permissions to
load maps and programs. This is the case with the BPF selftests, where
bpftool is invoked at compilation time during skeleton generation. This
causes errors as the feature gate unexpectedly fails with -EPERM.

Avoid this by moving all the use of the FEAT_LDIMM64_FULL_RANGE_OFF feature gate
to BPF object preparation time instead.

Fixes: 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-3-emil@etsalapatis.com
2026-02-17 14:20:24 -08:00
Emil Tsalapatis
7cefbb47cc libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
Commit 728ff16791 uses a PROG_TYPE_TRACEPOINT BPF test program to
check whether the running kernel supports large LDIMM64 offsets. The
feature gate incorrectly assumes that the program will fail at
verification time with one of two messages, depending on whether the
feature is supported by the running kernel. However,
PROG_TYPE_TRACEPOINT programs may fail to load before verification even
starts, e.g., if the shell does not have the appropriate capabilities.
Use a BPF_PROG_TYPE_SOCKET_FILTER program for the feature gate instead.

Also fix two minor issues. First, ensure the log buffer for the test is
initialized: Failing program load before verification led to libbpf dumping
uninitialized data to stdout. Also, ensure that close() is only called
for program_fd in the probe if the program load actually succeeded. The
call was currently failing silently with -EBADF most of the time.

Fixes: 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-2-emil@etsalapatis.com
2026-02-17 14:17:06 -08:00
Linus Torvalds
17f8d20093 Merge tag 'usb-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
 "Here is the "big" set of USB and Thunderbolt driver updates for
  7.0-rc1. Overall more lines were removed than added, thanks to
  dropping the obsolete isp1362 USB host controller driver, always a
  nice change.

  Other than that, nothing major happening here, highlights are:

   - lots of dwc3 driver updates and new hardware support added

   - usb gadget function driver updates

   - usb phy driver updates

   - typec driver updates and additions

   - USB rust binding updates for syntax and formatting changes

   - more usb serial device ids added

   - other smaller USB core and driver updates and additions

  All of these have been in linux-next for a long time, with no reported
  problems"

* tag 'usb-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (77 commits)
  usb: typec: ucsi: Add Thunderbolt alternate mode support
  usb: typec: hd3ss3220: Check if regulator needs to be switched
  usb: phy: tegra: parametrize PORTSC1 register offset
  usb: phy: tegra: parametrize HSIC PTS value
  usb: phy: tegra: return error value from utmi_wait_register
  usb: phy: tegra: cosmetic fixes
  dt-bindings: usb: renesas,usbhs: Add RZ/G3E SoC support
  usb: dwc2: fix resume failure if dr_mode is host
  usb: cdns3: fix role switching during resume
  usb: dwc3: gadget: Move vbus draw to workqueue context
  USB: serial: option: add Telit FN920C04 RNDIS compositions
  usb: dwc3: Log dwc3 address in traces
  usb: gadget: tegra-xudc: Add handling for BLCG_COREPLL_PWRDN
  usb: phy: tegra: add HSIC support
  usb: phy: tegra: use phy type directly
  usb: typec: ucsi: Enforce mode selection for cros_ec_ucsi
  usb: typec: ucsi: Support mode selection to activate altmodes
  usb: typec: Introduce mode_selection bit
  usb: typec: Implement mode selection
  usb: typec: Expose alternate mode priority via sysfs
  ...
2026-02-17 09:36:43 -08:00
Aleksei Oladko
a8c198d16c selftests: forwarding: fix pedit tests failure with br_netfilter enabled
The tests use the tc pedit action to modify the IPv4 source address
("pedit ex munge ip src set"), but the IP header checksum is not
recalculated after the modification. As a result, the modified packet
fails sanity checks in br_netfilter after bridging and is dropped,
which causes the test to fail.

Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0
during the test execution. This prevents the bridge from passing
L2 traffic to netfilter, bypassing the checksum validation that
causes the test failure.

Fixes: 92ad382894 ("selftests: forwarding: Add a test for pedit munge SIP and DIP")
Fixes: 226657ba23 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17 13:34:18 +01:00
Aleksei Oladko
ce9f6aec0f selftests: forwarding: vxlan_bridge_1d_ipv6: fix test failure with br_netfilter enabled
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv6 packet has an incorrect payload length set in the IPv6 header.
After VXLAN decapsulation, such packets do not pass sanity checks in
br_netfilter and are dropped, which causes the test to fail.

Fix this by setting the correct IPv6 payload length for the encapsulated
packet generated by mausezahn, so that the packet is accepted
by br_netfilter.

tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh
lines 698-706

              )"00:03:"$(           : Payload length
              )"3a:"$(              : Next header
              )"04:"$(              : Hop limit
              )"$saddr:"$(          : IP saddr
              )"$daddr:"$(          : IP daddr
              )"80:"$(              : ICMPv6.type
              )"00:"$(              : ICMPv6.code
              )"00:"$(              : ICMPv6.checksum
              )

Data after IPv6 header:
• 80: — 1 byte (ICMPv6 type)
• 00: — 1 byte (ICMPv6 code)
• 00: — 1 byte (ICMPv6 checksum, truncated)

Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match
the actual payload size.

Fixes: b07e9957f2 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17 13:34:11 +01:00
Aleksei Oladko
02cb2e6bac selftests: forwarding: vxlan_bridge_1d: fix test failure with br_netfilter enabled
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv4 packet contains a zero IP header checksum. After VXLAN
decapsulation, such packets do not pass sanity checks in br_netfilter
and are dropped, which causes the test to fail.

Fix this by calculating and setting a valid IPv4 header checksum for the
encapsulated packet generated by mausezahn, so that the packet is accepted
by br_netfilter. Fixed by using the payload_template_calc_checksum() /
payload_template_expand_checksum() helpers that are only available
in v6.3 and newer kernels.

Fixes: a0b61f3d8e ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17 13:34:11 +01:00
Nikolay Aleksandrov
a8470953b4 selftests: forwarding: bridge_mdb_max: add tests for mdb_n_entries warning
Recently we were able to trigger a warning in the mdb_n_entries counting
code. Add tests that exercise different ways which used to trigger that
warning.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://patch.msgid.link/20260213070031.1400003-3-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17 13:00:14 +01:00
Bobby Eshleman
e7a3c1adc1 selftests: drv-net: add HDS payload sweep test for devmem TCP
Add check_rx_hds test that verifies header/data split works across
payload sizes. The test sweeps payload sizes from 1 byte to 8KB, if any
data propagates up to userspace as SCM_DEVMEM_LINEAR, then the test
fails. This shows that regardless of payload size, ncdevmem's
configuration of hds-thresh to 0 is respected.

Add -L (--fail-on-linear) flag to ncdevmem that causes the receiver to
fail if any SCM_DEVMEM_LINEAR cmsg is received.

Use socat option for fixed block sizing and tcp nodelay to disable
nagle's algo to avoid buffering.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17 10:40:14 +01:00
David Carlier
0767684613 tools/sched_ext: scx_userland: fix stale data on restart
Reset all counters, tasks and vruntime_head list on restart.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-16 21:02:16 -10:00
David Carlier
cabd76bbc0 tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
fcg_read_stats() had a VLA allocating 21 * nr_cpus * 8 bytes on the
stack, risking stack overflow on large CPU counts (nr_cpus can be up
to 512).

Fix by using a single heap allocation with the correct size, reusing
it across all stat indices, and freeing it at the end.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-16 21:01:18 -10:00
Linus Torvalds
26a4cfaff8 Merge tag 'docs-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux
Pull documentation fixes from Jonathan Corbet:
 "A handful of small, late-arriving documentation fixes"

* tag 'docs-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux:
  docs: toshiba_haps: fix grammar error in SSD warning
  Docs/mm: fix typos and grammar in page_tables.rst
  Docs/core-api: fix typos in rbtree.rst
  docs: clarify wording in programming-language.rst
  docs: process: maintainer-pgp-guide: update kernel.org docs link
  docs: kdoc_parser: allow __exit in function prototypes
2026-02-15 10:47:59 -08:00
Linus Torvalds
64275e9fda Merge tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
 - Select HAVE_CMPXCHG_{LOCAL,DOUBLE}
 - Add 128-bit atomic cmpxchg support
 - Add HOTPLUG_SMT implementation
 - Wire up memfd_secret system call
 - Fix boot errors and unwind errors for KASAN
 - Use BPF prog pack allocator and add BPF arena support
 - Update dts files to add nand controllers
 - Some bug fixes and other small changes

* tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: dts: loongson-2k1000: Add nand controller support
  LoongArch: dts: loongson-2k0500: Add nand controller support
  LoongArch: BPF: Implement bpf_addr_space_cast instruction
  LoongArch: BPF: Implement PROBE_MEM32 pseudo instructions
  LoongArch: BPF: Use BPF prog pack allocator
  LoongArch: Use IS_ERR_PCPU() macro for KGDB
  LoongArch: Rework KASAN initialization for PTW-enabled systems
  LoongArch: Disable instrumentation for setup_ptwalker()
  LoongArch: Remove some extern variables in source files
  LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RT
  LoongArch: Handle percpu handler address for ORC unwinder
  LoongArch: Use %px to print unmodified unwinding address
  LoongArch: Prefer top-down allocation after arch_mem_init()
  LoongArch: Add HOTPLUG_SMT implementation
  LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODE
  LoongArch: Wire up memfd_secret system call
  LoongArch: Replace seq_printf() with seq_puts() for simple strings
  LoongArch: Add 128-bit atomic cmpxchg support
  LoongArch: Add detection for SC.Q support
  LoongArch: Select HAVE_CMPXCHG_LOCAL in Kconfig
2026-02-14 12:47:15 -08:00
Linus Torvalds
787fe1d43a Merge tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:

 - update tools/include/linux/mm.h to fix memblock tests compilation

 - drop redundant struct page* parameter from memblock_free_pages() and
   get struct page from the pfn

 - add underflow detection for size calculation in memtest and warn
   about underflow when VM_DEBUG is enabled

* tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm/memtest: add underflow detection for size calculation
  memblock: drop redundant 'struct page *' argument from memblock_free_pages()
  memblock test: include <linux/sizes.h> from tools mm.h stub
2026-02-14 12:39:34 -08:00
Linus Torvalds
770aaedb46 Merge tag 'bootconfig-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:

 - Update the bootconfig parser to stop searching for a value when it
   encounters a newline character

 - Update the tests for bootconfig parser to ensure the good examples to
   be parsed correctly by comparing the expected results

* tag 'bootconfig-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  bootconfig: Check the parsed output of the good examples
  bootconfig: Terminate value search if it hits a newline
2026-02-13 19:33:39 -08:00
Linus Torvalds
f50822fd86 Merge tag 'platform-drivers-x86-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
 "Highlights:

   - amd/pmf:
      - Avoid overwriting BIOS input values when events occur rapidly
      - Fix PMF driver issues related to S4 (in part on crypto/ccp side)
      - Add NPU metrics API (for accel side consumers)
      - Allow disabling Smart PC function through a module parameter

   - asus-wmi & HID/asus:
      - Unification of backlight control (replaces quirks)
      - Support multiple interfaces for controlling keyboard/RGB brightness
      - Simplify init sequence

   - hp-wmi:
      - Add manual fan control for Victus S models
      - Add fan mode keep-alive
      - Fix platform profile values for Omen 16-wf1xxx
      - Add EC offset to get the thermal profile

   - intel/pmc: Show substate residencies also for non-primary PMCs

   - intel/ISST:
      - Store and restore data for all domains
      - Write interface improvements

   - lenovo-wmi:
      - Support multiple Capability Data
      - Add HWMON reporting and tuning support

   - mellanox/mlx-platform: Add HI173 & HI174 support

   - surface/aggregator_registry: Add Surface Pro 11 (QCOM)

   - thinkpad_acpi: Add support for HW damage detection capability

   - uniwill: Implement cTGP setting

   - wmi:
      - Introduce marshalling support
      - Convert a few drivers to use the new buffer-based WMI API

   - tools/power/x86/intel-speed-select: Allow read operations for non-root

   - Miscellaneous cleanups / refactoring / improvements"

* tag 'platform-drivers-x86-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (68 commits)
  platform/x86: lenovo-wmi-{capdata,other}: Fix HWMON channel visibility
  platform/x86: hp-wmi: Add EC offsets to read Victus S thermal profile
  platform: mellanox: mlx-platform: Add support DGX flavor of next-generation 800GB/s ethernet switch.
  platform: mellanox: mlx-platform: Add support for new Nvidia DGX system based on class VMOD0010
  HID: asus: add support for the asus-wmi brightness handler
  platform/x86: asus-wmi: add keyboard brightness event handler
  platform/x86: asus-wmi: remove unused keyboard backlight quirk
  HID: asus: listen to the asus-wmi brightness device instead of creating one
  platform/x86: asus-wmi: Add support for multiple kbd led handlers
  HID: asus: early return for ROG devices
  HID: asus: move vendor initialization to probe
  HID: asus: fortify keyboard handshake
  HID: asus: use same report_id in response
  HID: asus: initialize additional endpoints only for certain devices
  HID: asus: simplify RGB init sequence
  platform/wmi: string-kunit: Add missing oversized string test case
  platform/x86/amd/pmf: Added a module parameter to disable the Smart PC function
  platform/x86/uniwill: Implement cTGP setting
  platform/x86: uniwill-laptop: Introduce device descriptor system
  platform/x86/amd: Use scope-based cleanup for wbrf_record()
  ...
2026-02-13 15:39:15 -08:00
Menglong Dong
0265c1fd91 selftests/bpf: enable fsession_test on riscv64
Now that the RISC-V trampoline JIT supports BPF_TRACE_FSESSION, run
the fsession selftest on riscv64 as well as x86_64.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20260208053311.698352-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Kumar Kartikeya Dwivedi
2669dde7a8 selftests/bpf: Fix map_kptr grace period wait
Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.

Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Ihor Solodrai
48f624c3dc selftests/bpf: Adjust selftest due to function rename
do_filp_open() was renamed in commit
541003b576 ("rename do_filp_open() to do_file_open()")

This broke test_profiler, because it uses a kretprobe on that
function. Fix it by renaming accordingly.

Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Closes: https://lore.kernel.org/bpf/djwjf2vfb7gro3rfag666bojod6ytcectahnb5z6hx2hawimtj@sx47ghzjg4lw/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260210235855.215679-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Amery Hung
04999b99e8 libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
Fix bpf_linker__add_buf()'s logic of copying data from memory buffer into
memfd. In the event of short write not writing entire buf_sz bytes into memfd
file, we'll append bytes from the beginning of buf *again* (corrupting ELF
file contents) instead of correctly appending the rest of not-yet-read buf
contents.

Closes: https://github.com/libbpf/libbpf/issues/945
Fixes: 6d5e5e5d7c ("libbpf: Extend linker API to support in-memory ELF files")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260209230134.3530521-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Emil Tsalapatis
728ff16791 libbpf: Add gating for arena globals relocation feature
Add feature gating for the arena globals relocation introduced in
commit c1f61171d4. The commit depends on a previous commit in the
same patchset that is absent from older kernels
(12a1fe6e12 "bpf/verifier: Do not limit maximum direct offset into arena map").

Without this commit, arena globals relocation with arenas >= 512MiB
fails to load and breaks libbpf's backwards compatibility.

Introduce a libbpf feature to check whether the running kernel allows for
full range ldimm64 offset, and only relocate arena globals if it does.

Fixes: c1f61171d4 ("libbpf: Move arena globals to the end of the arena")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260210184532.255475-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Pin-yen Lin
a68a9bd086 selftests: netconsole: Increase port listening timeout
wait_for_port() can wait up to 2 seconds with the sleep and the polling
in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat
process could die before the test writes to the netconsole.

Increase the timeout to 3 seconds to make netcons_basic.sh pass
consistently.

Fixes: 3dc6c76391 ("selftests: net: Add IPv6 support to netconsole basic tests")
Signed-off-by: Pin-yen Lin <treapking@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-13 12:22:36 -08:00
Len Brown
51496091dd tools/power turbostat: version 2026.02.14
Since release 2025.12.02:

Add L2 statistics columns for recent Intel processors:
	L2MRPS = L2 Cache M-References Per Second
	L2%hit = L2 Cache Hit %

Sort work and output by cpu# rather than core#

This commit:
	Version number and white space (indent -l160)
	No functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 14:08:33 -06:00
Len Brown
96718ad296 tools/power turbostat: Fix and document --header_iterations
The "header_iterations" option is commonly used to de-clutter
the screen of redundant header label rows in an interactive session:
Eg. every 10 rows:

$ sudo turbostat --header_iterations 10 -S -q -i 1

But --header_iterations was missing from turbostat.8

Also turbostat help advertised the "-N" short option
that did not actually work:

$ turbostat --help
  -N, --header_iterations num
		print header every num iterations

Repair "-N"
Document "--header_iterations" on turbostat.8

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 14:03:17 -06:00
Kaushlendra Kumar
8e5c0cc326 tools/power turbostat: Use strtoul() for iteration parsing
Replace strtod() with strtoul() and check errno for -n/-N options, since
num_iterations and header_iterations are unsigned long counters. Reject
zero and conversion errors; negative inputs wrap to large positive values
per standard unsigned semantics.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 14:03:17 -06:00
Len Brown
a2b4d0f8bf tools/power turbostat: Favor cpu# over core#
Turbostat collects statistics and outputs results in "topology order",
which means it prioritizes the core# over the cpu#.
The strategy is to minimize wakesups to a core -- which is
important when measuring an idle system.

But core order is problematic, because Linux core#'s are physical
(within each package), and thus subject to APIC-id scrambling
that may be done by the hardware or the BIOS.

As a result users may be are faced with rows in a confusing order:

sudo turbostat -q --show topology,Busy%,CPU%c6,UncMHz sleep 1
Core	CPU	Busy%	CPU%c6	UncMHz
-	-	1.25	72.18	3400
0	4	7.74	0.00
1	5	1.77	88.59
2	6	0.48	96.73
3	7	0.21	98.34
4	8	0.14	96.85
5	9	0.26	97.55
6	10	0.44	97.24
7	11	0.12	96.18
8	0	5.41	0.31	3400
8	1	0.19
12	2	0.41	0.22
12	3	0.08
32	12	0.04	99.21
33	13	0.25	94.92

Abandon the legacy "core# topology order" in favor of simply
ordering by cpu#, with a special case to handle HT siblings
that may not have adjacent cpu#'s.

sudo ./turbostat -q --show topology,Busy%,CPU%c6,UncMHz sleep 1
1.003001 sec
Core	CPU	Busy%	CPU%c6	UncMHz
-	-	1.38	80.55	1600
8	0	10.94	0.00	1600
8	1	0.53
12	2	2.90	0.45
12	3	0.11
0	4	1.96	91.20
1	5	0.97	96.40
2	6	0.24	94.72
3	7	0.31	98.01
4	8	0.20	98.20
5	9	0.62	96.00
6	10	0.06	98.15
7	11	0.12	99.31
32	12	0.04	99.07
33	13	0.27	95.09

The result is that cpu#'s now take precedence over core#'s.

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 14:03:10 -06:00
Linus Torvalds
cb5573868e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "Loongarch:

   - Add more CPUCFG mask bits

   - Improve feature detection

   - Add lazy load support for FPU and binary translation (LBT) register
     state

   - Fix return value for memory reads from and writes to in-kernel
     devices

   - Add support for detecting preemption from within a guest

   - Add KVM steal time test case to tools/selftests

  ARM:

   - Add support for FEAT_IDST, allowing ID registers that are not
     implemented to be reported as a normal trap rather than as an UNDEF
     exception

   - Add sanitisation of the VTCR_EL2 register, fixing a number of
     UXN/PXN/XN bugs in the process

   - Full handling of RESx bits, instead of only RES0, and resulting in
     SCTLR_EL2 being added to the list of sanitised registers

   - More pKVM fixes for features that are not supposed to be exposed to
     guests

   - Make sure that MTE being disabled on the pKVM host doesn't give it
     the ability to attack the hypervisor

   - Allow pKVM's host stage-2 mappings to use the Force Write Back
     version of the memory attributes by using the "pass-through'
     encoding

   - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
     guest

   - Preliminary work for guest GICv5 support

   - A bunch of debugfs fixes, removing pointless custom iterators
     stored in guest data structures

   - A small set of FPSIMD cleanups

   - Selftest fixes addressing the incorrect alignment of page
     allocation

   - Other assorted low-impact fixes and spelling fixes

  RISC-V:

   - Fixes for issues discoverd by KVM API fuzzing in
     kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
     kvm_riscv_vcpu_aia_imsic_update()

   - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM

   - Transparent huge page support for hypervisor page tables

   - Adjust the number of available guest irq files based on MMIO
     register sizes found in the device tree or the ACPI tables

   - Add RISC-V specific paging modes to KVM selftests

   - Detect paging mode at runtime for selftests

  s390:

   - Performance improvement for vSIE (aka nested virtualization)

   - Completely new memory management. s390 was a special snowflake that
     enlisted help from the architecture's page table management to
     build hypervisor page tables, in particular enabling sharing the
     last level of page tables. This however was a lot of code (~3K
     lines) in order to support KVM, and also blocked several features.
     The biggest advantages is that the page size of userspace is
     completely independent of the page size used by the guest:
     userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
     and in fact transparent hugepages were not possible before. It's
     also now possible to have nested guests and guests with huge pages
     running on the same host

   - Maintainership change for s390 vfio-pci

   - Small quality of life improvement for protected guests

  x86:

   - Add support for giving the guest full ownership of PMU hardware
     (contexted switched around the fastpath run loop) and allowing
     direct access to data MSRs and PMCs (restricted by the vPMU model).

     KVM still intercepts access to control registers, e.g. to enforce
     event filtering and to prevent the guest from profiling sensitive
     host state. This is more accurate, since it has no risk of
     contention and thus dropped events, and also has significantly less
     overhead.

     For more information, see the commit message for merge commit
     bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")

   - Disallow changing the virtual CPU model if L2 is active, for all
     the same reasons KVM disallows change the model after the first
     KVM_RUN

   - Fix a bug where KVM would incorrectly reject host accesses to PV
     MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
     even if those were advertised as supported to userspace,

   - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
     where KVM would attempt to read CR3 configuring an async #PF entry

   - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
     (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
     Only a few exports that are intended for external usage, and those
     are allowed explicitly

   - When checking nested events after a vCPU is unblocked, ignore
     -EBUSY instead of WARNing. Userspace can sometimes put the vCPU
     into what should be an impossible state, and spurious exit to
     userspace on -EBUSY does not really do anything to solve the issue

   - Also throw in the towel and drop the WARN on INIT/SIPI being
     blocked when vCPU is in Wait-For-SIPI, which also resulted in
     playing whack-a-mole with syzkaller stuffing architecturally
     impossible states into KVM

   - Add support for new Intel instructions that don't require anything
     beyond enumerating feature flags to userspace

   - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2

   - Add WARNs to guard against modifying KVM's CPU caps outside of the
     intended setup flow, as nested VMX in particular is sensitive to
     unexpected changes in KVM's golden configuration

   - Add a quirk to allow userspace to opt-in to actually suppress EOI
     broadcasts when the suppression feature is enabled by the guest
     (currently limited to split IRQCHIP, i.e. userspace I/O APIC).
     Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
     option as some userspaces have come to rely on KVM's buggy behavior
     (KVM advertises Supress EOI Broadcast irrespective of whether or
     not userspace I/O APIC supports Directed EOIs)

   - Clean up KVM's handling of marking mapped vCPU pages dirty

   - Drop a pile of *ancient* sanity checks hidden behind in KVM's
     unused ASSERT() macro, most of which could be trivially triggered
     by the guest and/or user, and all of which were useless

   - Fold "struct dest_map" into its sole user, "struct rtc_status", to
     make it more obvious what the weird parameter is used for, and to
     allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n

   - Bury all of ioapic.h, i8254.h and related ioctls (including
     KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y

   - Add a regression test for recent APICv update fixes

   - Handle "hardware APIC ISR", a.k.a. SVI, updates in
     kvm_apic_update_apicv() to consolidate the updates, and to
     co-locate SVI updates with the updates for KVM's own cache of ISR
     information

   - Drop a dead function declaration

   - Minor cleanups

  x86 (Intel):

   - Rework KVM's handling of VMCS updates while L2 is active to
     temporarily switch to vmcs01 instead of deferring the update until
     the next nested VM-Exit.

     The deferred updates approach directly contributed to several bugs,
     was proving to be a maintenance burden due to the difficulty in
     auditing the correctness of deferred updates, and was polluting
     "struct nested_vmx" with a growing pile of booleans

   - Fix an SGX bug where KVM would incorrectly try to handle EPCM page
     faults, and instead always reflect them into the guest. Since KVM
     doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
     interference and can't be resolved by KVM

   - Fix a bug where KVM would register its posted interrupt wakeup
     handler even if loading kvm-intel.ko ultimately failed

   - Disallow access to vmcb12 fields that aren't fully supported,
     mostly to avoid weirdness and complexity for FRED and other
     features, where KVM wants enable VMCS shadowing for fields that
     conditionally exist

   - Print out the "bad" offsets and values if kvm-intel.ko refuses to
     load (or refuses to online a CPU) due to a VMCS config mismatch

  x86 (AMD):

   - Drop a user-triggerable WARN on nested_svm_load_cr3() failure

   - Add support for virtualizing ERAPS. Note, correct virtualization of
     ERAPS relies on an upcoming, publicly announced change in the APM
     to reduce the set of conditions where hardware (i.e. KVM) *must*
     flush the RAP

   - Ignore nSVM intercepts for instructions that are not supported
     according to L1's virtual CPU model

   - Add support for expedited writes to the fast MMIO bus, a la VMX's
     fastpath for EPT Misconfig

   - Don't set GIF when clearing EFER.SVME, as GIF exists independently
     of SVM, and allow userspace to restore nested state with GIF=0

   - Treat exit_code as an unsigned 64-bit value through all of KVM

   - Add support for fetching SNP certificates from userspace

   - Fix a bug where KVM would use vmcb02 instead of vmcb01 when
     emulating VMLOAD or VMSAVE on behalf of L2

   - Misc fixes and cleanups

  x86 selftests:

   - Add a regression test for TPR<=>CR8 synchronization and IRQ masking

   - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
     support, and extend x86's infrastructure to support EPT and NPT
     (for L2 guests)

   - Extend several nested VMX tests to also cover nested SVM

   - Add a selftest for nested VMLOAD/VMSAVE

   - Rework the nested dirty log test, originally added as a regression
     test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
     improve test coverage and to hopefully make the test easier to
     understand and maintain

  guest_memfd:

   - Remove kvm_gmem_populate()'s preparation tracking and half-baked
     hugepage handling. SEV/SNP was the only user of the tracking and it
     can do it via the RMP

   - Retroactively document and enforce (for SNP) that
     KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
     source page to be 4KiB aligned, to avoid non-trivial complexity for
     something that no known VMM seems to be doing and to avoid an API
     special case for in-place conversion, which simply can't support
     unaligned sources

   - When populating guest_memfd memory, GUP the source page in common
     code and pass the refcounted page to the vendor callback, instead
     of letting vendor code do the heavy lifting. Doing so avoids a
     looming deadlock bug with in-place due an AB-BA conflict betwee
     mmap_lock and guest_memfd's filemap invalidate lock

  Generic:

   - Fix a bug where KVM would ignore the vCPU's selected address space
     when creating a vCPU-specific mapping of guest memory. Actually
     this bug could not be hit even on x86, the only architecture with
     multiple address spaces, but it's a bug nevertheless"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
  KVM: s390: Increase permitted SE header size to 1 MiB
  MAINTAINERS: Replace backup for s390 vfio-pci
  KVM: s390: vsie: Fix race in acquire_gmap_shadow()
  KVM: s390: vsie: Fix race in walk_guest_tables()
  KVM: s390: Use guest address to mark guest page dirty
  irqchip/riscv-imsic: Adjust the number of available guest irq files
  RISC-V: KVM: Transparent huge page support
  RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add riscv vm satp modes
  KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
  riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
  RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
  RISC-V: KVM: Remove unnecessary 'ret' assignment
  KVM: s390: Add explicit padding to struct kvm_s390_keyop
  KVM: LoongArch: selftests: Add steal time test case
  LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
  LoongArch: KVM: Add paravirt preempt feature in hypervisor side
  ...
2026-02-13 11:31:15 -08:00
Ihor Solodrai
0b82cc331d selftests/sched_ext: Fix rt_stall flaky failure
The rt_stall test measures the runtime ratio between an EXT and an RT
task pinned to the same CPU, verifying that the deadline server prevents
RT tasks from starving SCHED_EXT tasks. It expects the EXT task to get
at least 4% of CPU time.

The test is flaky because sched_stress_test() calls sleep(RUN_TIME)
immediately after fork(), without waiting for the RT child to complete
its setup (set_affinity + set_sched). If the RT child experiences
scheduling latency before completing setup, that delay eats into the
measurement window: the RT child runs for less than RUN_TIME seconds,
and the EXT task's measured ratio drops below the 4% threshold.

For example, in the failing CI run [1]:
  EXT=0.140s RT=4.750s total=4.890s (expected ~5.0s)
  ratio=2.86% < 4% → FAIL

The 110ms gap (5.0 - 4.89) corresponds to the RT child's setup time
being counted inside the measurement window, during which fewer
deadline server ticks fire for the EXT task.

Fix by using pipes to synchronize: each child signals the parent after
completing its setup, and the parent waits for both signals before
starting sleep(RUN_TIME). This ensures the measurement window only
counts time when both tasks are fully configured and competing.

[1] https://github.com/kernel-patches/bpf/actions/runs/21961895809/job/63442490449

Fixes: be621a7634 ("selftests/sched_ext: Add test for sched_ext dl_server")
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-13 09:09:39 -10:00
Len Brown
6be5c151eb tools/power turbostat: Expunge logical_cpu_id
There is only once cpu_id name space -- cpu_id.
Expunge the term logical_cpu_id.

No functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 08:53:17 -06:00
Len Brown
070e92361e tools/power turbostat: Enhance HT enumeration
Record the cpu_id of each CPU HT sibling -- will need this later.

Rename "thread_id" to "ht_id" to disambiguate that the scope
of this id is within a Core -- it is not a global cpu_id.

No functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 08:53:17 -06:00
Len Brown
ddf60e38ca tools/power turbostat: Simplify global core_id calculation
Standardize the generation of globally unique core_id's
in a macro, and simplify the related code.

No functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2026-02-13 08:53:17 -06:00