2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

2229 Commits

Author SHA1 Message Date
Namhyung Kim
568901e709 tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources
And arch syscall tables to pick up changes from:

  b1e31c134a powerpc: restore some missing spu syscalls
  d3882564a7 syscalls: fix compat_sys_io_pgetevents_time64 usage
  54233a4254 uretprobe: change syscall number, again
  63ded11097 uprobe: Change uretprobe syscall scope and number
  9142be9e64 x86/syscall: Mark exit[_group] syscall handlers __noreturn
  9aae1baa1c x86, arm: Add missing license tag to syscall tables files
  5c28424e9a syscalls: Fix to add sys_uretprobe to syscall.tbl
  190fec72df uprobe: Wire up uretprobe system call

This should be used to beautify syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-07 10:58:51 -07:00
Namhyung Kim
8ec9497d3e tools/include: Sync uapi/linux/perf.h with the kernel sources
To pick up changes from:

  608f6976c3 perf/x86/intel: Support new data source for Lunar Lake

This should be used to beautify perf syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06 14:34:29 -07:00
Namhyung Kim
a625df3995 tools/include: Sync uapi/linux/kvm.h with the kernel sources
And other arch-specific UAPI headers to pick up changes from:

  4b23e0c199 KVM: Ensure new code that references immediate_exit gets extra scrutiny
  85542adb65 KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
  6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
  34ff659017 x86/sev: Use kernel provided SVSM Calling Areas
  5dcc1e7614 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
  9a0d2f4995 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
  e9eb790b25 KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
  1a1e6865f5 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register

This should be used to beautify KVM syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
  diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06 12:30:32 -07:00
Namhyung Kim
aef21f6b6a tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
To pick up changes from:

  0f1bb41bf3 drm/i915: Support replaying GPU hangs with captured context image

This should be used to beautify DRM syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Please see tools/include/uapi/README for details (it's in the first patch
of this series).

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06 12:30:31 -07:00
Namhyung Kim
fbc05142cc perf tools: Add tools/include/uapi/README
Write down the reason why we keep a copy of headers to the README file
instead of adding it to every commit messages.

Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Original-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Original-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06 12:30:08 -07:00
Wei Yang
9e3d665384 memblock test: fix implicit declaration of function 'strscpy'
Commit 1e4c64b71c ("mm/memblock: Add "reserve_mem" to reserved named
memory at boot up") introduce the usage of strscpy, which breaks the
memblock test.

Let's define it as strcpy in userspace to fix it.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: https://lore.kernel.org/r/20240806010319.29194-5-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:21:25 +03:00
Wei Yang
a88cde5769 memblock test: fix implicit declaration of function 'memparse'
Commit 1e4c64b71c ("mm/memblock: Add "reserve_mem" to reserved named
memory at boot up") introduce the usage of memparse(), which is not
defined in memblock test.

Add the definition and link it to fix the build.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: https://lore.kernel.org/r/20240806010319.29194-3-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:21:25 +03:00
Wei Yang
9f76c2ade3 memblock test: add the definition of __setup()
Commit 1e4c64b71c ("mm/memblock: Add "reserve_mem" to reserved named
memory at boot up") introduce usage of __setup(), which is not defined
in memblock test.

Define it in init.h to fix the build error.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: https://lore.kernel.org/r/20240806010319.29194-2-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:21:25 +03:00
Wei Yang
39f64e402f memblock test: fix implicit declaration of function 'virt_to_phys'
Commit 94ff46de4a ("memblock: Move late alloc warning down to phys
alloc") introduce the usage of virt_to_phys(), which is not defined in
memblock tests.

Define it in mm.h to fix the build error.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: https://lore.kernel.org/r/20240806010319.29194-1-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:20:03 +03:00
Wei Yang
e2ae9cf39f tools/testing: abstract two init.h into common include directory
Currently we have two test suits define its own init.h. This is a little
redundant.

Let's create a init.h in common include directory and merge these two
into it.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport <rppt@kernel.org>
CC: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20240712035138.24674-3-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:17:50 +03:00
Wei Yang
d68c08173b memblock tests: include export.h in linkage.h as kernel dose
In kernel code, linkage.h includes export.h. Let's sync with kernel.

This is a preparation for move init.h in common include directory.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240712035138.24674-2-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-08-06 08:17:49 +03:00
Athira Rajeev
06dd4c5a56 perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse disassembled line.

Example snippet from objdump:

  objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>

  c0000000010224b4:	lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset.

In powerpc, the approach for data type profiling uses raw instruction
instead of result from objdump to identify the instruction category and
extract the source/target registers.

Example: 38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code/
With the change, function captures:

line -> "38 01 81 e8     ld      r4,312(r1)"
raw instruction "38 01 81 e8"

Raw instruction is used later to extract the reg/offset fields. Macros
are added to extract opcode and register fields. "struct disasm_line"
is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
code (raw).

Function "disasm_line__parse_powerpc fills the raw instruction hex value
and can use macros to get opcode. There is no changes in existing code
paths, which parses the disassembled code.  The size of raw instruction
depends on architecture.

In case of powerpc, the parsing the disasm line needs to handle cases
for reading binary code directly from DSO as well as parsing the objdump
result. Hence adding the logic into separate function instead of
updating "disasm_line__parse".  The architecture using the instruction
name and present approach is not altered. Since this approach targets
powerpc, the macro implementation is added for powerpc as of now.

Since the disasm_line__parse is used in other cases (perf annotate) and
not only data tye profiling, the powerpc callback includes changes to
work with binary code as well as mnemonic representation.

Also in case if the DSO read fails and libcapstone is not supported, the
approach fallback to use objdump as option. Hence as option, patch has
changes to ensure objdump option also works well.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-31 16:12:59 -03:00
Thomas Weißschuh
ae1f550efc tools/nolibc: add stdbool.h header
stdbool.h is very simple.
Provide an implementation for the user convenience.

Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240725-nolibc-stdbool-v1-1-a6ee2c80bcde@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-07-29 07:52:57 +02:00
Thomas Weißschuh
6ea2987c9a tools/nolibc: include arch.h from string.h
string.h tests for the macros NOLIBC_ARCH_HAS_$FUNC to use the
architecture-optimized function variants.
However if string.h is included before arch.h header then that check
does not work, leading to duplicate function definitions.

Fixes: 553845eebd ("tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`")
Fixes: 12108aa8c1 ("tools/nolibc: x86-64: Use `rep stosb` for `memset()`")
Cc: stable@vger.kernel.org
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240725-arch-has-func-v1-1-5521ed354acd@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-07-29 07:52:56 +02:00
Linus Torvalds
51c4767503 bitmap-6.11-rc1
Random fixes for v6.11.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmahKbIACgkQsUSA/Tof
 vsh8zQwAvguyeNubDFqdMe3E/Vp1J3WqXsBFzbE1rGLCyI2S0cgJFL5BlW51zY47
 70wLt9EmroEobwj1qHSQlzejNp31kSBQ1Sqq25oivfJqEF1elDT5PQxYqBbU1C9Y
 kVWnxtb+oKaoFd5jiBK8+iTl8dXjT6H2RoV0zpPab/JPcqsjwFfkUvtENt/Kpo5c
 aRrGTFwshdp5eT4sEZQv57VKroBcwZOvv2//qrklFHrJHl4pjMT8eaX3twcQysoy
 umTVt+TK6NErLnht+VRQJ2/L02FKi7b+bHePVgNzaT+1FSDMT4FltmZd96Xwbzah
 hSkwWtqy0N2gaTcqie9nwdEiCJGjF39M7k2wangUS91CeDsbIUSsJgDCESUCm+zK
 hRqleGOnoeg4+jZBci7M53lKa/pADlmLhnU8iAc3BSKozsaioltkT+hHn8vAkstk
 h/kHlbfkzasufUWAhduBpIn384gWWEY6RACffgCsOuvbT+ZyDKUJpKYaEwVx+Pri
 l72j0hs9
 =RbET
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux

Pull bitmap updates from Yury Norov:
 "Random fixes"

* tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux:
  riscv: Remove unnecessary int cast in variable_fls()
  radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
  bitops: Add a comment explaining the double underscore macros
  lib: bitmap: add missing MODULE_DESCRIPTION() macros
  cpumask: introduce assign_cpu() macro
2024-07-26 09:50:36 -07:00
Linus Torvalds
1722389b0d A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR. Including fixes from bpf
 and netfilter.
 
 Current release - regressions:
 
  - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
 
 Current release - new code bugs:
 
  - l2tp: protect session IDR and tunnel session list with one lock,
    make sure the state is coherent to avoid a warning
 
  - eth: bnxt_en: update xdp_rxq_info in queue restart logic
 
  - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
 
 Previous releases - regressions:
 
  - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
    the field reuses previously un-validated pad
 
 Previous releases - always broken:
 
  - tap/tun: drop short frames to prevent crashes later in the stack
 
  - eth: ice: add a per-VF limit on number of FDIR filters
 
  - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaibxAACgkQMUZtbf5S
 IruuIRAAu96TiN/urPwmKznyb/Sk8x7p8iUzn6OvPS/TUlFUkURQtOh6M9uvbpN4
 x/L//EWkMR0hY4SkBegoiXfb1GS0PjBdWTWUiROm5X9nVHqp5KRZAxWXhjFiS1BO
 BIYOT+JfCl7mQiPs90Mys/cEtYOggMBsCZQVIGw/iYoJLFREqxFSONwa0dG+tGMX
 jn9WNu4yCVDhJ/jtl2MaTsCNtYUaBUgYrKHJBfNGfJ2Lz/7rH9yFui2WSMlmOd/U
 QGeCb1DWURlShlCqY37wNinbFsxWkI5JN00ukTtwFAXLIaqc+zgHcIjrDjTJwK43
 F4tKbJT3+bmehMU/h3Uo3c7DhXl7n9zDGiDtbCxnkykp0sFGJpjhDrWydo51c+YB
 qW5HaNrII2LiDicOVN8L29ylvKp7AEkClxgivEhZVGGk2f/szJRXfp9u3WBn5kAx
 3paH55YN0DEsKbYbb1ZENEI1Vnc/4ff4PxZJCUNKwzcS8wCn1awqwcriK9TjS/cp
 fjilNFT4J3/uFrodHWTkx0jJT6UJFT0aF03qPLUH/J5kG+EVukOf1jBPInNdf1si
 1j47SpblHUe86HiHphFMt32KZ210lJzWxh8uGma57Y2sB9makdLiK4etrFjkiMJJ
 Z8A3kGp3KpFjbuK4tHY25rp+5oxLNNOBNpay29lQrWtCL/NDcaQ=
 =9OsH
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  A lot of networking people were at a conference last week, busy
  catching COVID, so relatively short PR.

  Current release - regressions:

   - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP

  Current release - new code bugs:

   - l2tp: protect session IDR and tunnel session list with one lock,
     make sure the state is coherent to avoid a warning

   - eth: bnxt_en: update xdp_rxq_info in queue restart logic

   - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field

  Previous releases - regressions:

   - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
     the field reuses previously un-validated pad

  Previous releases - always broken:

   - tap/tun: drop short frames to prevent crashes later in the stack

   - eth: ice: add a per-VF limit on number of FDIR filters

   - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"

* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  tun: add missing verification for short frame
  tap: add missing verification for short frame
  mISDN: Fix a use after free in hfcmulti_tx()
  gve: Fix an edge case for TSO skb validity check
  bnxt_en: update xdp_rxq_info in queue restart logic
  tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
  MAINTAINERS: make Breno the netconsole maintainer
  MAINTAINERS: Update bonding entry
  net: nexthop: Initialize all fields in dumped nexthops
  net: stmmac: Correct byte order of perfect_match
  selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
  tipc: Return non-zero value from tipc_udp_addr2str() on error
  netfilter: nft_set_pipapo_avx2: disable softinterrupts
  ice: Fix recipe read procedure
  ice: Add a per-VF limit on number of FDIR filters
  net: bonding: correctly annotate RCU in bond_should_notify_peers()
  ...
2024-07-25 13:32:25 -07:00
Jakub Kicinski
f7578df913 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI
 g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC
 HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE=
 =vovf
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-07-25

We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).

The main changes are:

1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
   BPF sockhash. Also add test coverage for this case, from Michal Luczaj.

2) Fix a segmentation issue when downgrading gso_size in the BPF helper
   bpf_skb_adjust_room(), from Fred Li.

3) Fix a compiler warning in resolve_btfids due to a missing type cast,
   from Liwei Song.

4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
   boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.

5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
   from Stanislav Fomichev.

6) Fix function prototype BTF dumping in libbpf for prototypes that have
   no input arguments, from Andrii Nakryiko.

7) Fix stacktrace symbol resolution in perf script for BPF programs
   containing subprograms, from Hou Tao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
  bpf, events: Use prog to emit ksymbol event for main program
  selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
  selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
  selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
  af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
  bpftool: Fix typo in usage help
  libbpf: Fix no-args func prototype BTF dumping syntax
  MAINTAINERS: Update powerpc BPF JIT maintainers
  MAINTAINERS: Update email address of Naveen
  selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================

Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25 07:40:25 -07:00
Stanislav Fomichev
9b9969c40b selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
This flag is now required to use tx_metadata_len.

Fixes: 40808a237d ("selftests/bpf: Add TX side to xdp_metadata")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
2024-07-25 11:57:33 +02:00
Linus Torvalds
7a3fad30fd Random number generator updates for Linux 6.11-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmaarzgACgkQSfxwEqXe
 A66ZWBAAlhXx8bve0uKlDRK8fffWHgruho/fOY4lZJ137AKwA9JCtmOyqdfL4Dmk
 VxFe7pEQJlQhcA/6kH54uO7SBXwfKlKZJth6SYnaCRMUIbFifHjjIQ0QqldjEKi0
 rP90Hu4FVsbwQC7u9i9lQj9n2P36zb6pn83BzpZQ/2PtoVCSCrdSJUe0Rxa3H3GN
 0+nNkDSXQt5otCByLaeE3x7KJgXLWL9+G2eFSFLTZ8rSVfMx1CdOIAG37WlLGdWm
 BaFYPDKMyBTVvVJBNgAe9YSqtrsZ5nlmLz+Z9wAe/hTL7RlL03kWUu34/Udcpull
 zzMDH0WMntiGK3eFQ2gOYSWqypvAjwHgn3BzqNmjUb69+89mZsdU1slcvnxWsUwU
 D3vphrscaqarF629tfsXti3jc5PoXwUTjROZVcCyeFPBhyAZgzK8xUvPpJO+RT+K
 EuUABob9cpA6FCpW/QeolDmMDhXlNT8QgsZu1juokZac2xP3Ly3REyEvT7HLbU2W
 ZJjbEqm1ppp3RmGELUOJbyhwsLrnbt+OMDO7iEWoG8aSFK4diBK/ZM6WvLMkr8Oi
 7ioXGIsYkCy3c47wpZKTrAapOPJp5keqNAiHSEbXw8mozp6429QAEZxNOcczgHKC
 Ea2JzRkctqutcIT+Slw/uUe//i1iSsIHXbE81fp5udcQTJcUByo=
 =P8aI
 -----END PGP SIGNATURE-----

Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "This adds getrandom() support to the vDSO.

  First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which
  lets the kernel zero out pages anytime under memory pressure, which
  enables allocating memory that never gets swapped to disk but also
  doesn't count as being mlocked.

  Then, the vDSO implementation of getrandom() is introduced in a
  generic manner and hooked into random.c.

  Next, this is implemented on x86. (Also, though it's not ready for
  this pull, somebody has begun an arm64 implementation already)

  Finally, two vDSO selftests are added.

  There are also two housekeeping cleanup commits"

* tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  MAINTAINERS: add random.h headers to RNG subsection
  random: note that RNDGETPOOL was removed in 2.6.9-rc2
  selftests/vDSO: add tests for vgetrandom
  x86: vdso: Wire up getrandom() vDSO implementation
  random: introduce generic vDSO getrandom() implementation
  mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-24 10:29:50 -07:00
Linus Torvalds
fbc90c042c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainablity thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in vis sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors.  In order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicing.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these paes can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
2c9b351240 ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement
 
 * Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogenous hardware
 
 * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol
 
 * FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing
 
 * New command-line parameter to control the WFx trap behavior under KVM
 
 * Introduce kCFI hardening in the EL2 hypervisor
 
 * Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
 * Miscellaneous fixes + documentation updates
 
 LoongArch:
 
 * Add paravirt steal time support.
 
 * Add support for KVM_DIRTY_LOG_INITIALLY_SET.
 
 * Add perf kvm-stat support for loongarch.
 
 RISC-V:
 
 * Redirect AMO load/store access fault traps to guest
 
 * perf kvm stat support
 
 * Use guest files for IMSIC virtualization, when available
 
 ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
 extensions is coming through the RISC-V tree.
 
 s390:
 
 * Assortment of tiny fixes which are not time critical
 
 x86:
 
 * Fixes for Xen emulation.
 
 * Add a global struct to consolidate tracking of host values, e.g. EFER
 
 * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.
 
 * Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
 * Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.
 
 * Drop MTRR virtualization, and instead always honor guest PAT on CPUs
   that support self-snoop.
 
 * Update to the newfangled Intel CPU FMS infrastructure.
 
 * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.
 
 * Misc cleanups
 
 x86 - MMU:
 
 * Small cleanups, renames and refactoring extracted from the upcoming
   Intel TDX support.
 
 * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leafs SPTEs.
 
 * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting, to avoid stalling vCPUs when splitting huge pages.
 
 * Bug the VM instead of simply warning if KVM tries to split a SPTE that is
   non-present or not-huge.  KVM is guaranteed to end up in a broken state
   because the callers fully expect a valid SPTE, it's all but dangerous
   to let more MMU changes happen afterwards.
 
 x86 - AMD:
 
 * Make per-CPU save_area allocations NUMA-aware.
 
 * Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.
 
 * Base support for running SEV-SNP guests.  API-wise, this includes
   a new KVM_X86_SNP_VM type, encrypting/measure the initial image into
   guest memory, and finalizing it before launching it.  Internally,
   there are some gmem/mmu hooks needed to prepare gmem-allocated pages
   before mapping them into guest private memory ranges.
 
   This includes basic support for attestation guest requests, enough to
   say that KVM supports the GHCB 2.0 specification.
 
   There is no support yet for loading into the firmware those signing
   keys to be used for attestation requests, and therefore no need yet
   for the host to provide certificate data for those keys.  To support
   fetching certificate data from userspace, a new KVM exit type will be
   needed to handle fetching the certificate from userspace. An attempt to
   define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
   this was introduced in v1 of this patchset, but is still being discussed
   by community, so for now this patchset only implements a stub version
   of SNP Extended Guest Requests that does not provide certificate data.
 
 x86 - Intel:
 
 * Remove an unnecessary EPT TLB flush when enabling hardware.
 
 * Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake eents for a vCPU executing HLT in L2 (with
   HLT-exiting disable by L1).
 
 * KVM: x86: Suppress MMIO that is triggered during task switch emulation
 
   Explicitly suppress userspace emulated MMIO exits that are triggered when
   emulating a task switch as KVM doesn't support userspace MMIO during
   complex (multi-step) emulation.  Silently ignoring the exit request can
   result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
   userspace for some other reason prior to purging mmio_needed.
 
   See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write exits if
   emulator detects exception") for more details on KVM's limitations with
   respect to emulated MMIO during complex emulator flows.
 
 Generic:
 
 * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
   because the special casing needed by these pages is not due to just
   unmovability (and in fact they are only unmovable because the CPU cannot
   access them).
 
 * New ioctl to populate the KVM page tables in advance, which is useful to
   mitigate KVM page faults during guest boot or after live migration.
   The code will also be used by TDX, but (probably) not through the ioctl.
 
 * Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
 * Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.
 
 * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().
 
 * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
 * Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.
 
 Selftests:
 
 * Remove dead code in the memslot modification stress test.
 
 * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
 
 * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.
 
 * Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
 +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
 KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
 iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
 =zepX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogenous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leafs SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, it's
     all but dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measure the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed to handle fetching the certificate from
     userspace.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake eents for a vCPU executing
     HLT in L2 (with HLT-exiting disable by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
Jason A. Donenfeld
4920a2590e selftests/vDSO: add tests for vgetrandom
This adds two tests for vgetrandom. The first one, vdso_test_chacha,
simply checks that the assembly implementation of chacha20 matches that
of libsodium, a basic sanity check that should catch most errors. The
second, vdso_test_getrandom, is a full "libc-like" implementation of the
userspace side of vgetrandom() support. It's meant to be used also as
example code for libcs that might be integrating this.

Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Jason A. Donenfeld
9651fcedf7 mm: add MAP_DROPPABLE for designating always lazily freeable mappings
The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fallback to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.

Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.

Finally, the provided self test ensures that this is working as desired.

Cc: linux-mm@kvack.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Linus Torvalds
76d9b92e68 slab updates for 6.11
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmaXl0kACgkQu+CwddJF
 iJrOlgf+N/G7BmgoW2CBF7mKsvCYs+pX3xeBuxPtsuq4FD386nsPFMN8gWAYLG3q
 ZU1z1S+0M8LhTg6/G9jMYLHt2Y7WhYbhFTjTHmULJkuhMDTUP9CRYy4XZ+hdPtHF
 30ezSdJQF9x/XxCSaaRVK1s+SMVHFg5xAOHKpfkNSamcMz9g+ZkYyPBr10/VoKd0
 JqwhW7r6hrlvWAiqY3QKCOvohIWglgvBUnNjUGMh1cUkOE2aYLYHklhRwICKgA6z
 p/2BUXiAEWUtgBkUrizwm/pdhJXLs0pOeYarVZP1v83tQMxyrc6XLNnqhvxP3DPW
 31thF5Rf9I8WaWTczXhxsAwFjqO3KQ==
 =4uf9
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:
 "The most prominent change this time is the kmem_buckets based
  hardening of kmalloc() allocations from Kees Cook.

  We have also extended the kmalloc() alignment guarantees for
  non-power-of-two sizes in a way that benefits rust.

  The rest are various cleanups and non-critical fixups.

   - Dedicated bucket allocator (Kees Cook)

     This series [1] enhances the probabilistic defense against heap
     spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.

     kmalloc() users that are known to be useful for exploits can get
     completely separate set of kmalloc caches that can't be shared with
     other users. The first converted users are alloc_msg() and
     memdup_user().

     The hardening is enabled by CONFIG_SLAB_BUCKETS.

   - Extended kmalloc() alignment guarantees (Vlastimil Babka)

     For years now we have guaranteed natural alignment for power-of-two
     allocations, but nothing was defined for other sizes (in practice,
     we have two such buckets, kmalloc-96 and kmalloc-192).

     To avoid unnecessary padding in the rust layer due to its alignment
     rules, extend the guarantee so that the alignment is at least the
     largest power-of-two divisor of the requested size.

     This fits what rust needs, is a superset of the existing
     power-of-two guarantee, and does not in practice change the layout
     (and thus does not add overhead due to padding) of the kmalloc-96
     and kmalloc-192 caches, unless slab debugging is enabled for them.

   - Cleanups and non-critical fixups (Chengming Zhou, Suren
     Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka)

     Various tweaks related to the new alloc profiling code, folio
     conversion, debugging and more leftovers after SLAB"

Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1]

* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/memcg: alignment memcg_data define condition
  mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
  mm, slab: move allocation tagging code in the alloc path into a hook
  mm/util: Use dedicated slab buckets for memdup_user()
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  mm/slab: Introduce kmem_buckets_create() and family
  mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
  mm/slab: Introduce kmem_buckets typedef
  slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
  slab: delete useless RED_INACTIVE and RED_ACTIVE
  slab: don't put freepointer outside of object if only orig_size
  slab: make check_object() more consistent
  mm: Reduce the number of slab->folio casts
  mm, slab: don't wrap internal functions with alloc_hooks()
2024-07-18 15:08:12 -07:00
Linus Torvalds
b2fc97c186 memblock: updates for 6.11-rc1
* reserve_mem command line parameter to allow creation of named memory
   reservation at boot time.
   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.
 * cleaunps and small improvements in memblock and mm_init
 * new tests cases in memblock test suite
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmaXfoIQHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kU5mCAC23vIrB8FRlORczMYj+V3VFss3OjKT92lS
 fHGwq2oxHW+rdDpHXFObHU0D3k8d2l5jyrENRAAyA02qR0L6Pv8Na6pGxQua1eic
 VIdw0PFQMsizD1AIj84Y6skkyyF/tvZHpmX0B12D5+Ur65DC/Z867Cm/lE33/fHv
 /1+QB0JlG7W+FzxVstYyebY5/DVkH+bC7/A57FE2oB4BRXjEd8v9tTHBS4kRSvrE
 zE2KFxeGajN749LHztIpIprPKehu8Gc3oLrYLNJO+uLFVCV8ey3OqVj0RXMG2wLl
 hmVYqhbZM/Uz59D/P8pULD49f1Thjv/5A/MvUZ3SxM6zpWlsincf
 =xrZd
 -----END PGP SIGNATURE-----

Merge tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:

 - 'reserve_mem' command line parameter to allow creation of named
   memory reservation at boot time.

   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.

 - cleanups and small improvements in memblock and mm_init

 - new tests cases in memblock test suite

* tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: fix implicit declaration of function 'numa_valid_node'
  memblock: Move late alloc warning down to phys alloc
  pstore/ramoops: Add ramoops.mem_name= command line option
  mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  mm/mm_init.c: don't initialize page->lru again
  mm/mm_init.c: not always search next deferred_init_pfn from very beginning
  mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition
  mm/mm_init.c: get the highest zone directly
  mm/mm_init.c: move nr_initialised reset down a bit
  mm/memblock: fix a typo in description of for_each_mem_region()
  mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn
  mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
  mm/memblock: return true directly on finding overlap region
  memblock tests: add memblock_overlaps_region_checks
  mm/memblock: fix comment for memblock_isolate_range()
  memblock tests: add memblock_reserve_many_may_conflict_check()
  memblock tests: add memblock_reserve_all_locations_check()
  mm/memblock: remove empty dummy entry
2024-07-18 14:48:11 -07:00
Linus Torvalds
91bd008d4e Probes updates for v6.11:
Uprobes:
 - x86/shstk: Make return uprobe work with shadow stack.
 - Add uretprobe syscall which speeds up the uretprobe 10-30% faster. This
   syscall is automatically used from user-space trampolines which are
   generated by the uretprobe. If this syscall is used by normal
   user program, it will cause SIGILL. Note that this is currently only
   implemented on x86_64.
   (This also has 2 fixes for adjusting the syscall number to avoid conflict
    with new *attrat syscalls.)
 - uprobes/perf: fix user stack traces in the presence of pending uretprobe.
   This corrects the uretprobe's trampoline address in the stacktrace with
   correct return address.
 - selftests/x86: Add a return uprobe with shadow stack test.
 - selftests/bpf: Add uretprobe syscall related tests.
   . test case for register integrity check.
   . test case with register changing case.
   . test case for uretprobe syscall without uprobes (expected to be failed).
   . test case for uretprobe with shadow stack.
 - selftests/bpf: add test validating uprobe/uretprobe stack traces
 - MAINTAINERS: Add uprobes entry. This does not specify the tree but to
   clarify who maintains and reviews the uprobes.
 
 Kprobes:
 - tracing/kprobes: Test case cleanups. Replace redundant WARN_ON_ONCE() +
   pr_warn() with WARN_ONCE() and remove unnecessary code from selftest.
 - tracing/kprobes: Add symbol counting check when module loads. This
   checks the uniqueness of the probed symbol on modules. The same check
   has already done for kernel symbols.
   (This also has a fix for build error with CONFIG_MODULES=n)
 
 Cleanup:
 - Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmaWYxwbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bsUgH/3JcSzDZujQWCZ1f4fJn
 QecvTFSYcCl6ck8+/3wm4EsgeCXIFOyPnoPc7k2Gm+l6Dlk1DKGV6wV4tuKFUq9X
 9mplcwoVA0Ln+EX9zv9v4s99yUGxcU9xjgC9XT7J52SvqYncPIi6dR0Z9wlJBmyd
 Bx3cZk+wSzCYaoqYngI2fKlzsEcYgDIP999fQPRi0HGzNZujc4xeJyjCTC/48yWO
 9kreRQq6wFdgRQTwMcR/fKPDKIGZQCU8jkXv5crVV5K3rNaBcwBmCJJMP8PzPU0V
 UQ0+8RZK+Qk8SBwXcMNVRqm/efTderob4IYxP8OBe5wjAIE7+vu8r6sqwxRIS54M
 Cyg=
 =DRSr
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:
 "Uprobes:

   - x86/shstk: Make return uprobe work with shadow stack

   - Add uretprobe syscall which speeds up the uretprobe 10-30% faster.
     This syscall is automatically used from user-space trampolines
     which are generated by the uretprobe. If this syscall is used by
     normal user program, it will cause SIGILL. Note that this is
     currently only implemented on x86_64.

     (This also has two fixes for adjusting the syscall number to avoid
     conflict with new *attrat syscalls.)

   - uprobes/perf: fix user stack traces in the presence of pending
     uretprobe. This corrects the uretprobe's trampoline address in the
     stacktrace with correct return address

   - selftests/x86: Add a return uprobe with shadow stack test

   - selftests/bpf: Add uretprobe syscall related tests.
      - test case for register integrity check
      - test case with register changing case
      - test case for uretprobe syscall without uprobes (expected to fail)
      - test case for uretprobe with shadow stack

   - selftests/bpf: add test validating uprobe/uretprobe stack traces

   - MAINTAINERS: Add uprobes entry. This does not specify the tree but
     to clarify who maintains and reviews the uprobes

  Kprobes:

   - tracing/kprobes: Test case cleanups.

     Replace redundant WARN_ON_ONCE() + pr_warn() with WARN_ONCE() and
     remove unnecessary code from selftest

   - tracing/kprobes: Add symbol counting check when module loads.

     This checks the uniqueness of the probed symbol on modules. The
     same check has already done for kernel symbols

     (This also has a fix for build error with CONFIG_MODULES=n)

  Cleanup:

   - Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples"

* tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  MAINTAINERS: Add uprobes entry
  selftests/bpf: Change uretprobe syscall number in uprobe_syscall test
  uprobe: Change uretprobe syscall scope and number
  tracing/kprobes: Fix build error when find_module() is not available
  tracing/kprobes: Add symbol counting check when module loads
  selftests/bpf: add test validating uprobe/uretprobe stack traces
  perf,uprobes: fix user stack traces in the presence of pending uretprobes
  tracing/kprobe: Remove cleanup code unrelated to selftest
  tracing/kprobe: Integrate test warnings into WARN_ONCE
  selftests/bpf: Add uretprobe shadow stack test
  selftests/bpf: Add uretprobe syscall call from user space test
  selftests/bpf: Add uretprobe syscall test for regs changes
  selftests/bpf: Add uretprobe syscall test for regs integrity
  selftests/x86: Add return uprobe shadow stack test
  uprobe: Add uretprobe syscall to speed up return probe
  uprobe: Wire up uretprobe system call
  x86/shstk: Make return uprobe work with shadow stack
  samples: kprobes: add missing MODULE_DESCRIPTION() macros
  fprobe: add missing MODULE_DESCRIPTION() macro
2024-07-18 12:19:20 -07:00
Linus Torvalds
51835949dd Networking changes for 6.11. Not much excitement - a handful of large
patchsets (devmem among them) did not make it in time.
 
 Core & protocols
 ----------------
 
  - Use local_lock in addition to local_bh_disable() to protect per-CPU
    resources in networking, a step closer for local_bh_disable() not
    to act as a big lock on PREEMPT_RT.
 
  - Use flex array for netdevice priv area, ensure its cache alignment.
 
  - Add a sysctl knob to allow user to specify a default rto_min at socket
    init time. Bit of a big hammer but multiple companies were
    independently carrying such patch downstream so clearly it's useful.
 
  - Support scheduling transmission of packets based on CLOCK_TAI.
 
  - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off
    using cpusets.
 
  - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address.
 
  - Allow configuration of multipath hash seed, to both allow synchronizing
    hashing of two routers, and preventing partial accidental sync.
 
  - Improve TCP compliance with RFC 9293 for simultaneous connect().
 
  - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace
    IKE daemon had to do this before, but the kernel can better keep
    track of it.
 
  - Support sending supervision HSR frames with MAC addresses stored in
    ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled.
 
  - Introduce IPPROTO_SMC for selecting SMC when socket is created.
 
  - Allow UDP GSO transmit from devices with no checksum offload.
 
  - openvswitch: add packet sampling via psample, separating the sampled
    traffic from "upcall" packets sent to user space for forwarding.
 
  - nf_tables: shrink memory consumption for transaction objects.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Power Sequencing subsystem (used by Qualcomm Bluetooth driver
    for QCA6390).
 
  - Add IRQ information in sysfs for auxiliary bus.
 
  - Introduce guard definition for local_lock.
 
  - Add aligned flavor of __cacheline_group_{begin, end}() markings for
    grouping fields in structures.
 
 BPF
 ---
 
  - Notify user space (via epoll) when a struct_ops object is getting
    detached/unregistered.
 
  - Add new kfuncs for a generic, open-coded bits iterator.
 
  - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
    bpf_list_head.
 
  - Support resilient split BTF which cuts down on duplication and makes
    BTF as compact as possible WRT BTF from modules.
 
  - Add support for dumping kfunc prototypes from BTF which enables both
    detecting as well as dumping compilable prototypes for kfuncs.
 
  - riscv64 BPF JIT improvements in particular to add 12-argument support
    for BPF trampolines and to utilize bpf_prog_pack for the latter.
 
  - Add the capability to offload the netfilter flowtable in XDP layer
    through kfuncs.
 
 Driver API
 ----------
 
  - Allow users to configure IRQ tresholds between which automatic IRQ
    moderation can choose.
 
  - Expand Power Sourcing (PoE) status with power, class and failure
    reason. Support setting power limits.
 
  - Track additional RSS contexts in the core, make sure configuration
    changes don't break them.
 
  - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP
    data paths.
 
  - Support updating firmware on SFP modules.
 
 Tests and tooling
 -----------------
 
  - mptcp: use net/lib.sh to manage netns.
 
  - TCP-AO and TCP-MD5: replace debug prints used by tests with
    tracepoints.
 
  - openvswitch: make test self-contained (don't depend on OvS CLI tools).
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - increase the max total outstanding PTP TX packets to 4
      - add timestamping statistics support
      - implement netdev_queue_mgmt_ops
      - support new RSS context API
    - Intel (100G, ice, idpf):
      - implement FEC statistics and dumping signal quality indicators
      - support E825C products (with 56Gbps PHYs)
    - nVidia/Mellanox:
      - support HW-GRO
      - mlx4/mlx5: support per-queue statistics via netlink
      - obey the max number of EQs setting in sub-functions
    - AMD/Solarflare:
      - support new RSS context API
    - AMD/Pensando:
      - ionic: rework fix for doorbell miss to lower overhead
        and skip it on new HW
    - Wangxun:
      - txgbe: support Flow Director perfect filters
 
  - Ethernet NICs consumer, embedded and virtual:
    - Add driver for Tehuti Networks TN40xx chips
    - Add driver for Meta's internal NIC chips
    - Add driver for Ethernet MAC on Airoha EN7581 SoCs
    - Add driver for Renesas Ethernet-TSN devices
    - Google cloud vNIC:
      - flow steering support
    - Microsoft vNIC:
      - support page sizes other than 4KB on ARM64
    - vmware vNIC:
      - support latency measurement (update to version 9)
    - VirtIO net:
      - support for Byte Queue Limits
      - support configuring thresholds for automatic IRQ moderation
      - support for AF_XDP Rx zero-copy
    - Synopsys (stmmac):
      - support for STM32MP13 SoC
      - let platforms select the right PCS implementation
    - TI:
      - icssg-prueth: add multicast filtering support
      - icssg-prueth: enable PTP timestamping and PPS
    - Renesas:
      - ravb: improve Rx performance 30-400% by using page pool,
        theaded NAPI and timer-based IRQ coalescing
      - ravb: add MII support for R-Car V4M
    - Cadence (macb):
      - macb: add ARP support to Wake-On-LAN
    - Cortina:
      - use phylib for RX and TX pause configuration
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - support configuration of multipath hash seed
      - report more accurate max MTU
      - use page_pool to improve Rx performance
    - MediaTek:
      - mt7530: add support for bridge port isolation
    - Qualcomm:
      - qca8k: add support for bridge port isolation
    - Microchip:
      - lan9371/2: add 100BaseTX PHY support
    - NXP:
      - vsc73xx: implement VLAN operations
 
  - Ethernet PHYs:
    - aquantia: enable support for aqr115c
    - aquantia: add support for PHY LEDs
    - realtek: add support for rtl8224 2.5Gbps PHY
    - xpcs: add memory-mapped device support
    - add BroadR-Reach link mode and support in Broadcom's PHY driver
 
  - CAN:
    - add document for ISO 15765-2 protocol support
    - mcp251xfd: workaround for erratum DS80000789E, use timestamps
      to catch when device returns incorrect FIFO status
 
  - WiFi:
    - mac80211/cfg80211:
      - parse Transmit Power Envelope (TPE) data in mac80211 instead of
        in drivers
      - improvements for 6 GHz regulatory flexibility
      - multi-link improvements
      - support multiple radios per wiphy
      - remove DEAUTH_NEED_MGD_TX_PREP flag
    - Intel (iwlwifi):
      - bump FW API to 91 for BZ/SC devices
      - report 64-bit radiotap timestamp
      - enable P2P low latency by default
      - handle Transmit Power Envelope (TPE) advertised by AP
      - remove support for older FW for new devices
      - fast resume (keeping the device configured)
      - mvm: re-enable Multi-Link Operation (MLO)
      - aggregation (A-MSDU) optimizations
    - MediaTek (mt76):
      - mt7925 Multi-Link Operation (MLO) support
    - Qualcomm (ath10k):
      - LED support for various chipsets
    - Qualcomm (ath12k):
      - remove unsupported Tx monitor handling
      - support channel 2 in 6 GHz band
      - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
      - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
        Advertisements (EMA)
      - support dynamic VLAN
      - add panic handler for resetting the firmware state
      - DebugFS support for datapath statistics
      - WCN7850: support for Wake on WLAN
    - Microchip (wilc1000):
      - read MAC address during probe to make it visible to user space
      - suspend/resume improvements
    - TI (wl18xx):
      - support newer firmware versions
    - RealTek (rtw89):
      - preparation for RTL8852BE-VT support
      - Wake on WLAN support for WiFi 6 chips
      - 36-bit PCI DMA support
    - RealTek (rtlwifi):
      - RTL8192DU support
    - Broadcom (brcmfmac):
      - Management Frame Protection support (to enable WPA3)
 
  - Bluetooth:
    - qualcomm: use the power sequencer for QCA6390
    - btusb: mediatek: add ISO data transmission functions
    - hci_bcm4377: add BCM4388 support
    - btintel: add support for BlazarU core
    - btintel: add support for Whale Peak2
    - btnxpuart: add support for AW693 A1 chipset
    - btnxpuart: add support for IW615 chipset
    - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaWjBwACgkQMUZtbf5S
 IrvuSRAAkJuEzTRqgURBCe4eNEQde6mJJig7l2CKHwCbFiHZpRkFHf8qKbcGWbL6
 uLW33SWnKtJVDhxVKWHLq635XW7BAa80YhqGw21GDi+mIEhWXZglHj3xbXNxsMfE
 4eg/kG4BkfYWFmHaXOwVWV/mr7nXf6j7WmXNeXEi32ufE1j0OL+YlQenKnMj8yP2
 j9JmYa2Chwppng1SblHmcjmGkdNVwFhStKeCG+2K7v06wdDH/QYBlbgUv9gw/cxp
 NlW//wgiaeX40U4O3kDwt9C+LDoh+0VrDDeVdQ+IsScLtY3PhAzEoKolFYTq2HSr
 I1JpoaHNnyNsJq3DZrACQ5WlH4yDn6C2EUB6dxNnFaI9F1ZPsi+7MTl6Sei1AklD
 TuQTj/lxOACBwW2Q77NU72uoxiIUauesGPHcnrAFuoCIEhZF0mso7k59BvrXhsOP
 QwcLbQdc1YHNkqv/Vc7NBY+ruMsYB+5Ubbhhj2p27dp/CWFIwxI29fze4dn2uhO6
 ejHN3mbqwPdSzg12YJtM6Iq61Cnwo2eVSvhTxl+ZVSZtI4nu2arzR+y7QTYmNrXP
 6tkgVN9UsWeLl2xJ8wyyqL5mcvNHP2rPXWZ2X56iTaa26m+UlleeQ7YRaYtQAAr0
 Ec/vlDMX64SwHhd+qwE99DXGQf2g+KklHKSLsnajJUVrWFTlRI0=
 =opz8
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Not much excitement - a handful of large patchsets (devmem among them)
  did not make it in time.

  Core & protocols:

   - Use local_lock in addition to local_bh_disable() to protect per-CPU
     resources in networking, a step closer for local_bh_disable() not
     to act as a big lock on PREEMPT_RT

   - Use flex array for netdevice priv area, ensure its cache alignment

   - Add a sysctl knob to allow user to specify a default rto_min at
     socket init time. Bit of a big hammer but multiple companies were
     independently carrying such patch downstream so clearly it's useful

   - Support scheduling transmission of packets based on CLOCK_TAI

   - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
     off using cpusets

   - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address

   - Allow configuration of multipath hash seed, to both allow
     synchronizing hashing of two routers, and preventing partial
     accidental sync

   - Improve TCP compliance with RFC 9293 for simultaneous connect()

   - Support sending NAT keepalives in IPsec ESP in UDP states.
     Userspace IKE daemon had to do this before, but the kernel can
     better keep track of it

   - Support sending supervision HSR frames with MAC addresses stored in
     ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled

   - Introduce IPPROTO_SMC for selecting SMC when socket is created

   - Allow UDP GSO transmit from devices with no checksum offload

   - openvswitch: add packet sampling via psample, separating the
     sampled traffic from "upcall" packets sent to user space for
     forwarding

   - nf_tables: shrink memory consumption for transaction objects

  Things we sprinkled into general kernel code:

   - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
     QCA6390)           [ Already merged separately - Linus ]

   - Add IRQ information in sysfs for auxiliary bus

   - Introduce guard definition for local_lock

   - Add aligned flavor of __cacheline_group_{begin, end}() markings for
     grouping fields in structures

  BPF:

   - Notify user space (via epoll) when a struct_ops object is getting
     detached/unregistered

   - Add new kfuncs for a generic, open-coded bits iterator

   - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
     bpf_list_head

   - Support resilient split BTF which cuts down on duplication and
     makes BTF as compact as possible WRT BTF from modules

   - Add support for dumping kfunc prototypes from BTF which enables
     both detecting as well as dumping compilable prototypes for kfuncs

   - riscv64 BPF JIT improvements in particular to add 12-argument
     support for BPF trampolines and to utilize bpf_prog_pack for the
     latter

   - Add the capability to offload the netfilter flowtable in XDP layer
     through kfuncs

  Driver API:

   - Allow users to configure IRQ tresholds between which automatic IRQ
     moderation can choose

   - Expand Power Sourcing (PoE) status with power, class and failure
     reason. Support setting power limits

   - Track additional RSS contexts in the core, make sure configuration
     changes don't break them

   - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
     ESP data paths

   - Support updating firmware on SFP modules

  Tests and tooling:

   - mptcp: use net/lib.sh to manage netns

   - TCP-AO and TCP-MD5: replace debug prints used by tests with
     tracepoints

   - openvswitch: make test self-contained (don't depend on OvS CLI
     tools)

  Drivers:

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - increase the max total outstanding PTP TX packets to 4
         - add timestamping statistics support
         - implement netdev_queue_mgmt_ops
         - support new RSS context API
      - Intel (100G, ice, idpf):
         - implement FEC statistics and dumping signal quality indicators
         - support E825C products (with 56Gbps PHYs)
      - nVidia/Mellanox:
         - support HW-GRO
         - mlx4/mlx5: support per-queue statistics via netlink
         - obey the max number of EQs setting in sub-functions
      - AMD/Solarflare:
         - support new RSS context API
      - AMD/Pensando:
         - ionic: rework fix for doorbell miss to lower overhead and
           skip it on new HW
      - Wangxun:
         - txgbe: support Flow Director perfect filters

   - Ethernet NICs consumer, embedded and virtual:
      - Add driver for Tehuti Networks TN40xx chips
      - Add driver for Meta's internal NIC chips
      - Add driver for Ethernet MAC on Airoha EN7581 SoCs
      - Add driver for Renesas Ethernet-TSN devices
      - Google cloud vNIC:
         - flow steering support
      - Microsoft vNIC:
         - support page sizes other than 4KB on ARM64
      - vmware vNIC:
         - support latency measurement (update to version 9)
      - VirtIO net:
         - support for Byte Queue Limits
         - support configuring thresholds for automatic IRQ moderation
         - support for AF_XDP Rx zero-copy
      - Synopsys (stmmac):
         - support for STM32MP13 SoC
         - let platforms select the right PCS implementation
      - TI:
         - icssg-prueth: add multicast filtering support
         - icssg-prueth: enable PTP timestamping and PPS
      - Renesas:
         - ravb: improve Rx performance 30-400% by using page pool,
           theaded NAPI and timer-based IRQ coalescing
         - ravb: add MII support for R-Car V4M
      - Cadence (macb):
         - macb: add ARP support to Wake-On-LAN
      - Cortina:
         - use phylib for RX and TX pause configuration

   - Ethernet switches:
      - nVidia/Mellanox:
         - support configuration of multipath hash seed
         - report more accurate max MTU
         - use page_pool to improve Rx performance
      - MediaTek:
         - mt7530: add support for bridge port isolation
      - Qualcomm:
         - qca8k: add support for bridge port isolation
      - Microchip:
         - lan9371/2: add 100BaseTX PHY support
      - NXP:
         - vsc73xx: implement VLAN operations

   - Ethernet PHYs:
      - aquantia: enable support for aqr115c
      - aquantia: add support for PHY LEDs
      - realtek: add support for rtl8224 2.5Gbps PHY
      - xpcs: add memory-mapped device support
      - add BroadR-Reach link mode and support in Broadcom's PHY driver

   - CAN:
      - add document for ISO 15765-2 protocol support
      - mcp251xfd: workaround for erratum DS80000789E, use timestamps to
        catch when device returns incorrect FIFO status

   - WiFi:
      - mac80211/cfg80211:
         - parse Transmit Power Envelope (TPE) data in mac80211 instead
           of in drivers
         - improvements for 6 GHz regulatory flexibility
         - multi-link improvements
         - support multiple radios per wiphy
         - remove DEAUTH_NEED_MGD_TX_PREP flag
      - Intel (iwlwifi):
         - bump FW API to 91 for BZ/SC devices
         - report 64-bit radiotap timestamp
         - enable P2P low latency by default
         - handle Transmit Power Envelope (TPE) advertised by AP
         - remove support for older FW for new devices
         - fast resume (keeping the device configured)
         - mvm: re-enable Multi-Link Operation (MLO)
         - aggregation (A-MSDU) optimizations
      - MediaTek (mt76):
         - mt7925 Multi-Link Operation (MLO) support
      - Qualcomm (ath10k):
         - LED support for various chipsets
      - Qualcomm (ath12k):
         - remove unsupported Tx monitor handling
         - support channel 2 in 6 GHz band
         - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
         - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
           Advertisements (EMA)
         - support dynamic VLAN
         - add panic handler for resetting the firmware state
         - DebugFS support for datapath statistics
         - WCN7850: support for Wake on WLAN
      - Microchip (wilc1000):
         - read MAC address during probe to make it visible to user space
         - suspend/resume improvements
      - TI (wl18xx):
         - support newer firmware versions
      - RealTek (rtw89):
         - preparation for RTL8852BE-VT support
         - Wake on WLAN support for WiFi 6 chips
         - 36-bit PCI DMA support
      - RealTek (rtlwifi):
         - RTL8192DU support
      - Broadcom (brcmfmac):
         - Management Frame Protection support (to enable WPA3)

   - Bluetooth:
      - qualcomm: use the power sequencer for QCA6390
      - btusb: mediatek: add ISO data transmission functions
      - hci_bcm4377: add BCM4388 support
      - btintel: add support for BlazarU core
      - btintel: add support for Whale Peak2
      - btnxpuart: add support for AW693 A1 chipset
      - btnxpuart: add support for IW615 chipset
      - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"

* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
  eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
  tcp: Replace strncpy() with strscpy()
  wifi: ath12k: fix build vs old compiler
  tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
  eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
  eth: fbnic: Add L2 address programming
  eth: fbnic: Add basic Rx handling
  eth: fbnic: Add basic Tx handling
  eth: fbnic: Add link detection
  eth: fbnic: Add initial messaging to notify FW of our presence
  eth: fbnic: Implement Rx queue alloc/start/stop/free
  eth: fbnic: Implement Tx queue alloc/start/stop/free
  eth: fbnic: Allocate a netdevice and napi vectors with queues
  eth: fbnic: Add FW communication mechanism
  eth: fbnic: Add message parsing for FW messages
  eth: fbnic: Add register init to set PCIe/Ethernet device config
  eth: fbnic: Allocate core device specific structures and devlink interface
  eth: fbnic: Add scaffolding for Meta's NIC driver
  PCI: Add Meta Platforms vendor ID
  net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
  ...
2024-07-16 19:28:34 -07:00
Linus Torvalds
d80f2996b8 asm-generic updates for 6.11
Most of this is part of my ongoing work to clean up the system call
 tables. In this bit, all of the newer architectures are converted to
 use the machine readable syscall.tbl format instead in place of complex
 macros in include/uapi/asm-generic/unistd.h.
 
 This follows an earlier series that fixed various API mismatches
 and in turn is used as the base for planned simplifications.
 
 The other two patches are dead code removal and a warning fix.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmaVB1cACgkQYKtH/8kJ
 UicMqxAAnYKOxfjoMIhYYK6bl126wg/vIcDcjIR9cNWH21Nhn3qxn11ZXau3S7xv
 3l/HreEhyEQr4gC2a70IlXyHUadYOlrk+83OURrunWk1oKPmZlMKcfPVbtp8GL7x
 PUNXQfwM1XZLveKwufY24hoZdwKC+Y/5WLc1t0ReznJuAqgeO2rM9W5dnV5bAfCp
 he3F5hFcr196Dz3/GJjJIWrY+cbwfmZWsNtj1vFTL5/r/LuCu8HTkqhsGj8tE5BJ
 NGVEEXbp5eaVTCIGqJWhnuZcsnKN9kM51M7CtdwWf8OTckUVuJap5OsDVKQkWkGl
 bLPbd2jhDltph0sah51hAIvv4WdkThW76u9FRW7KR3fo7ra67eF7l5j7wc1lE2JB
 GwLJ1X56Bxe1GhvvNTlDmb7DrnlP/DMPuRv3Z6xyH6l8iZ2pMGlnAxuw6Bs1s6Y5
 WSs36ZpnS0ctgjfx37ZITsZSvbKFPpQFJP4siwS8aRNv/NFALNNdFyOCY5lNzspZ
 0dxwjn6/7UpHE4MKh6/hvCg2QwupXXBTRytibw+75/rOsR+EYlmtuONtyq2sLUHe
 ktJ5pg+8XuZm27+wLffuluzmY7sv2F8OU4cTYeM60Ynmc6pRzwUY6/VhG52S1/mU
 Ua4VgYIpzOtlLrYmz5QTWIZpdSFSVbIc/3pLriD6hn4Mvg+BwdA=
 =XOhL
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "Most of this is part of my ongoing work to clean up the system call
  tables. In this bit, all of the newer architectures are converted to
  use the machine readable syscall.tbl format instead in place of
  complex macros in include/uapi/asm-generic/unistd.h.

  This follows an earlier series that fixed various API mismatches and
  in turn is used as the base for planned simplifications.

  The other two patches are dead code removal and a warning fix"

* tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  vmlinux.lds.h: catch .bss..L* sections into BSS")
  fixmap: Remove unused set_fixmap_offset_io()
  riscv: convert to generic syscall table
  openrisc: convert to generic syscall table
  nios2: convert to generic syscall table
  loongarch: convert to generic syscall table
  hexagon: use new system call table
  csky: convert to generic syscall table
  arm64: rework compat syscall macros
  arm64: generate 64-bit syscall.tbl
  arm64: convert unistd_32.h to syscall.tbl format
  arc: convert to generic syscall table
  clone3: drop __ARCH_WANT_SYS_CLONE3 macro
  kbuild: add syscall table generation to scripts/Makefile.asm-headers
  kbuild: verify asm-generic header list
  loongarch: avoid generating extra header files
  um: don't generate asm/bpf_perf_event.h
  csky: drop asm/gpio.h wrapper
  syscalls: add generic scripts/syscall.tbl
2024-07-16 12:09:03 -07:00
Vlastimil Babka
436381eaf2 Merge branch 'slab/for-6.11/buckets' into slab/for-next
Merge all the slab patches previously collected on top of v6.10-rc1,
over cleanups/fixes that had to be based on rc6.
2024-07-15 10:44:16 +02:00
Andrii Nakryiko
77179b6f30 tools: sync uapi/linux/fs.h header into tools subdir
We need this UAPI header in tools/include subdirectory for using it from
BPF selftests.

Link: https://lkml.kernel.org/r/20240627170900.1672542-6-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:12 -07:00
Paolo Bonzini
c8b8b8190a LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support.
 2. Add some VM migration enhancement.
 3. Add perf kvm-stat support for loongarch.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmaOS6UWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImehejD/9pACGe3h3krXLcFVWXOFIu5Hpc
 5kQLP0lSPJ/o5Xs8t/oPLrnDX70z90wXI1LOmltc7h32MSwFa2l8COQh+sN5eJBQ
 PNyt7u7bMipp0yJS4Gl3LQQ5vklcGOSpQc/gbeXnVx8J/tz+Mo9YGGLIXVRXRM6W
 Ri8D2VVFiwzQQYeTpPo1u1Ob8C6mA4KOppwvhscMTM3vj4NMbsinBzRnR0lG0Tdw
 meFhxDPly1Ksxsbnj9UGO6UnEY0A2SLONs6MiO4y4DtoqoDlw/lbqFJuYo4vvbx1
 pxtjyirD/PX/wjslQFWUOuU0hMfAodera+JupZ5BZWfcG8FltA4DQfDsm/U9RjK/
 7gGNnr8Xk2/tp6+4AVV+HU2iTgRvq+mXCL72zSy2Y4r7ElBAANDfk4n+Zn/PWisn
 U9wwV8Ue7tVB15BRpRsg77NzBidiCFEe/6flWYiX2y24ke71gwDJBGUy8hMdKt6t
 4Cq8atsU0MvDAzfYMsK9JjskJp4UFq6wb1tXbbuADM4TDhnzlK6s6h3vM+pFlh/f
 my7fDH8/2qsCWhBDM4pmsJskVp+I1GOk/80RjTQISwx7iHktJWvxNYTaisK2fvD5
 Qs1IUWfNFbDX0Lr0QpN6j6X4rZkghR4R6XoFkd4nkicwi+UHVn3oK9GSqv24QJn9
 7+Ev3dfRTUYLd6mC4Q==
 =DpIK
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.11

1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
2024-07-12 11:24:12 -04:00
Isaku Yamahata
9ff0e37c68 KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
Add a test case to exercise KVM_PRE_FAULT_MEMORY and run the guest to access the
pre-populated area.  It tests KVM_PRE_FAULT_MEMORY ioctl for KVM_X86_DEFAULT_VM
and KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <32427791ef42e5efaafb05d2ac37fa4372715f47.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:18:27 -04:00
Wei Yang
692a68ee9c radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
In tools/ directory, function bitmap_clear() is currently only used in
object file tools/testing/radix-tree/xarray.o.

But instead of keeping a bitmap.c with only bitmap_clear() definition in
radix-tree's own directory, it would be more proper to put it in common
directory lib/.

Sync the kernel definition and link some related libs, no functional
change is expected.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2024-07-10 14:24:27 -07:00
Arnd Bergmann
505d66d1ab clone3: drop __ARCH_WANT_SYS_CLONE3 macro
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.

Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.

Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10 14:23:38 +02:00
Paolo Abeni
7b769adc26 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZoxN0AAKCRDbK58LschI
 g0c5AQDa3ZV9gfbN42y1zSDoM1uOgO60fb+ydxyOYh8l3+OiQQD/fLfpTY3gBFSY
 9yi/pZhw/QdNzQskHNIBrHFGtJbMxgs=
 =p1Zz
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   as well as dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal & Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-09 17:01:46 +02:00
Wei Yang
9364a7e40d memblock tests: fix implicit declaration of function 'numa_valid_node'
commit 8043832e2a ("memblock: use numa_valid_node() helper to check
for invalid node ID") introduce a new helper numa_valid_node(), which is
not defined in memblock tests.

Let's add it in the corresponding header file.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240624015432.31134-1-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport <rppt@kernel.org>
2024-07-05 11:24:47 +03:00
John Hubbard
c142850fbc selftests/mm: kvm, mdwe fixes to avoid requiring "make headers"
On Ubuntu 23.04, the kvm and mdwe selftests/mm build fails due to
missing a few items that are found in prctl.h. Here is an excerpt of the
build failures:

ksm_tests.c:252:13: error: use of undeclared identifier 'PR_SET_MEMORY_MERGE'
...
mdwe_test.c:26:18: error: use of undeclared identifier 'PR_SET_MDWE'
mdwe_test.c:38:18: error: use of undeclared identifier 'PR_GET_MDWE'

Fix these errors by adding a new tools/include/uapi/linux/prctl.h . This
file was created by running "make headers", and then copying a snapshot
over from ./usr/include/linux/prctl.h, as per the approach we settled on
in [1].

[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")

Link: https://lkml.kernel.org/r/20240618022422.804305-6-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:07 -07:00
John Hubbard
e20194725b selftests/mm: fix vm_util.c build failures: add snapshot of fs.h
On Ubuntu 23.04, on a clean git tree, the selftests/mm build fails due 10
or 20 missing items, all of which are found in fs.h, which is created via
"make headers".  However, as per [1], the idea is to stop requiring "make
headers", and instead, take a snapshot of the files and check them in.

Here are a few of the build errors:

vm_util.c:34:21: error: variable has incomplete type 'struct pm_scan_arg'
        struct pm_scan_arg arg;
...
vm_util.c:45:28: error: use of undeclared identifier 'PAGE_IS_WPALLOWED'
...
vm_util.c:55:21: error: variable has incomplete type 'struct page_region'
...
vm_util.c:105:20: error: use of undeclared identifier 'PAGE_IS_SOFT_DIRTY'

To fix this, add fs.h, taken from a snapshot of ./usr/include/linux/fs.h
after running "make headers".

[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")

Link: https://lkml.kernel.org/r/20240618022422.804305-5-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:07 -07:00
Thomas Weißschuh
d20d0b10f8 tools/nolibc: implement strerror()
strerror() is commonly used.
For example in kselftest which currently needs to do an #ifdef NOLIBC to
handle the lack of strerror().

Keep it simple and reuse the output format of perror() for strerror().

Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-06-29 09:44:57 +02:00
Thomas Weißschuh
0cf24d36bb tools/nolibc: implement strtol() and friends
The implementation always works on uintmax_t values.

This is inefficient when only 32bit are needed.
However for all functions this only happens for strtol() on 32bit
platforms.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240425-nolibc-strtol-v1-2-bfeef7846902@weissschuh.net
2024-06-29 09:44:55 +02:00
Thomas Weißschuh
8c3bd8bc40 tools/nolibc: add limits for {u,}intmax_t, ulong and {u,}llong
They are useful for users and necessary for strtol() and friends.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240425-nolibc-strtol-v1-1-bfeef7846902@weissschuh.net
2024-06-29 09:44:54 +02:00
Chengming Zhou
4a24bbabc8 slab: delete useless RED_INACTIVE and RED_ACTIVE
These seem useless since we use the SLUB_RED_INACTIVE and SLUB_RED_ACTIVE,
so just delete them, no functional change.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-06-24 21:41:41 +02:00
Jakub Kicinski
4c7d3d79c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts, no adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 13:13:46 -07:00
Vadim Fedorenko
a3cfe84cca bpf: Add CHECKSUM_COMPLETE to bpf test progs
Add special flag to validate that TC BPF program properly updates
checksum information in skb.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240606145851.229116-1-vadfed@meta.com
2024-06-13 14:29:47 +02:00
Jiri Olsa
3e8e25761a selftests/bpf: Add uretprobe syscall test for regs integrity
Add uretprobe syscall test that compares register values before
and after the uretprobe is hit. It also compares the register
values seen from attached bpf program.

Link: https://lore.kernel.org/all/20240611112158.40795-6-jolsa@kernel.org/

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-06-12 08:44:28 +09:00
Linus Torvalds
b8481381d4 perf tools fixes for v6.10: 2nd batch
- Update copies of kernel headers, which resulted in support for the new
   'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC prctls,
   fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector, 'map_shadow_stack'
   syscall for x86-32.
 
 - Revert perf.data record memory allocation optimization that ended up
   causing a regression, work is being done to re-introduce it in the
   next merge window.
 
 - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when
   interrupting the build.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZmWw9QAKCRCyPKLppCJ+
 J/bcAP4vftSGYZ0BDlfLT7KjEUDdc03nvgd5kHAkYM1OyVNgzAD/XRBe2c1f4pUK
 DWj5rjMAFpMfcGTv4TefQoopMZxYXgs=
 =cFei
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Update copies of kernel headers, which resulted in support for the
   new 'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC
   prctls, fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector,
   'map_shadow_stack' syscall for x86-32.

 - Revert perf.data record memory allocation optimization that ended up
   causing a regression, work is being done to re-introduce it in the
   next merge window.

 - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when
   interrupting the build.

* tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build
  Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
  tools headers UAPI: Update i915_drm.h with the kernel sources
  tools headers UAPI: Sync kvm headers with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall
  perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION
  perf beauty: Update copy of linux/socket.h with the kernel sources
  tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools include UAPI: Sync linux/stat.h with the kernel sources
2024-06-09 09:04:51 -07:00
Wei Yang
b73f6b98bb mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
Leverage the macro PAGE_ALIGN_DOWN to get pgend.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: https://lore.kernel.org/r/20240507075833.6346-7-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-05 10:22:22 +03:00
Arnaldo Carvalho de Melo
d6283b160a tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
To pick the changes from:

  2a82bb0294 ("statx: stx_subvol")

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlnK2Fmx_gahzwZI@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-03 14:44:28 -03:00
Jakub Kicinski
e19de2064f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/ti/icssg/icssg_classifier.c
  abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
  56a5cf538c ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/

No other adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-31 14:10:28 -07:00
Jakub Kicinski
13c7c941e7 netdev: add qstat for csum complete
Recent commit 0cfe71f45f ("netdev: add queue stats") added
a lot of useful stats, but only those immediately needed by virtio.
Presumably virtio does not support CHECKSUM_COMPLETE,
so statistic for that form of checksumming wasn't included.
Other drivers will definitely need it, in fact we expect it
to be needed in net-next soon (mlx5). So let's add the definition
of the counter for CHECKSUM_COMPLETE to uAPI in net already,
so that the counters are in a more natural order (all subsequent
counters have not been present in any released kernel, yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Fixes: 0cfe71f45f ("netdev: add queue stats")
Link: https://lore.kernel.org/r/20240529163547.3693194-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-30 12:15:56 +02:00
Arnaldo Carvalho de Melo
2f523f29d3 tools headers UAPI: Update i915_drm.h with the kernel sources
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 16:54:31 -03:00
Arnaldo Carvalho de Melo
88e520512a tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  4af663c2f6 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
  4f5defae70 ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
  26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
  ac5c48027b ("KVM: SEV: publish supported VMSA features")
  651d61bc8b ("KVM: PPC: Fix documentation for ppc mmu caps")

That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 16:49:36 -03:00
Jakub Kicinski
4b3529edbb bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZlWtmQAKCRDbK58LschI
 g0TUAQDT76jx7Rq1DShCtZ3eqiBMNkYczK8b+GqNsSG8YGduaAEA1jn/GN+H65Rh
 atQZ/pYAfLZflMV04+XE0GyBr5q1uQg=
 =NczG
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-28

We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).

The main changes are:

1) Rename skb's mono_delivery_time to tstamp_type for extensibility
   and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
   from Abhishek Chauhan.

2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
   CT zones can be used from XDP/tc BPF netfilter CT helper functions,
   from Brad Cowie.

3) Several tweaks to the instruction-set.rst IETF doc to address
   the Last Call review comments, from Dave Thaler.

4) Small batch of riscv64 BPF JIT optimizations in order to emit more
   compressed instructions to the JITed image for better icache efficiency,
   from Xiao Wang.

5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
   diffing and forcing more natural type definitions ordering,
   from Mykyta Yatsenko.

6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
   a syzbot/KCSAN race report for the tx_errors counter,
   from Jiang Yunshui.

7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
   which started treating const structs as constants and thus breaking
   full BTF program name resolution, from Ivan Babrou.

8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
   some of the test-internal array sizes, from Geliang Tang.

9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
   pahole, from Alan Maguire.

10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
    rebuilds in some corner cases, from Artem Savkov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
  bpf, net: Use DEV_STAT_INC()
  bpf, docs: Fix instruction.rst indentation
  bpf, docs: Clarify call local offset
  bpf, docs: Add table captions
  bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
  bpf, docs: Use RFC 2119 language for ISA requirements
  bpf, docs: Move sentence about returning R0 to abi.rst
  bpf: constify member bpf_sysctl_kern:: Table
  riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
  riscv, bpf: Use STACK_ALIGN macro for size rounding up
  riscv, bpf: Optimize zextw insn with Zba extension
  selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
  net: Add additional bit to support clockid_t timestamp type
  net: Rename mono_delivery_time to tstamp_type for scalabilty
  selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
  net: netfilter: Make ct zone opts configurable for bpf ct helpers
  selftests/bpf: Fix prog numbers in test_sockmap
  bpf: Remove unused variable "prev_state"
  bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
  bpf: Fix order of args in call to bpf_map_kvcalloc
  ...
====================

Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-28 07:27:29 -07:00
Arnaldo Carvalho de Melo
da42b5229b tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall
But also to wire up shadow stacks on 32-bit x86, picking up those
changes from these csets:

  ff388fe5c4 ("mseal: wire up mseal syscall")
  2883f01ec3 ("x86/shstk: Enable shadow stacks for x32")

This makes 'perf trace' support it, now its possible, for instance to
do:

  # perf trace -e mseal --max-stack=16

Here is an example with the 'sendmmsg' syscall:

  root@x1:~# perf trace -e sendmmsg --max-stack 16 --max-events=1
       0.000 ( 0.062 ms): dbus-broker/1012 sendmmsg(fd: 150, mmsg: 0x7ffef57cca50, vlen: 1, flags: DONTWAIT|NOSIGNAL) = 1
                                         syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
                                         syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
                                         syscall_exit_to_user_mode ([kernel.kallsyms])
                                         do_syscall_64 ([kernel.kallsyms])
                                         entry_SYSCALL_64 ([kernel.kallsyms])
                                         [0x117ce7] (/usr/lib64/libc.so.6 (deleted))
  root@x1:~#

To do a system wide tracing of the new 'mseal' syscall with a backtrace
of at most 16 entries.

This addresses these perf tools build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
    diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
    diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H J Lu <hjl.tools@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlXlo4TNcba4wnVZ@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 11:10:00 -03:00
Abhishek Chauhan
1693c5db6a net: Add additional bit to support clockid_t timestamp type
tstamp_type is now set based on actual clockid_t compressed
into 2 bits.

To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.

We now support CLOCK_TAI as part of tstamp_type as part of this
commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME.

Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-3-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-23 14:14:36 -07:00
Linus Torvalds
29c73fc794 perf tools fixes and improvements for v6.10:
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
 
 - Add support for using the 'capstone' disassembler library in various tools,
   such as 'perf script' and 'perf annotate'. This is an alternative for the
   use of the 'xed' and 'objdump' disassemblers.
 
 - Data-type profiling improvements:
 
   Resolve types for a->b->c by backtracking the assignments until it finds
   DWARF info for one of those members
 
   Support for global variables, keeping a cache to speed up lookups.
 
   Handle the 'call' instruction, dealing with effects on registers and handling
   its return when tracking register data types.
 
   Handle x86's segment based addressing like %gs:0x28, to support things like
   per CPU variables, the stack canary, etc.
 
   Data-type profiling got big speedups when using capstone for disassembling.
   The objdump outoput parsing method is left as a fallback when capstone fails or
   isn't available. There are patches posted for 6.11 that to use a LLVM
   disassembler.
 
   Support event group display in the TUI when annotating types with --data-type,
   for instance to show memory load and store events for the data type fields.
 
   Optimize the 'perf annotate' data structures, reducing memory usage.
 
   Add a initial 'perf test' for 'perf annotate', checking that a target symbol
   appears on the output, specifying objdump via the command line, etc.
 
 - Integrate the shellcheck utility with the build of perf to allow catching
   shell problems early in areas such as 'perf test', 'perf trace' scrape
   scripts, etc.
 
 - Add 'uretprobe' variant in the 'perf bench uprobe' tool.
 
 - Add script to run instances of 'perf script' in parallel.
 
 - Allow parsing tracepoint names that start with digits, such as
   9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
   where those tracepoints aren't available.
 
 Vendor Events:
 
 - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
   Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
   Sky Lake and Snow Ridge X.  Remove info metrics erroneously in TopdownL1.
 
 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the
   "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
   document, with events that capture information on op dispatch, execution and
   retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
 
 - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
 
 Miscellaneous:
 
 - Sync header copies with the kernel sources.
 
 - Move some header copies used only for generating translation string tables
   for ioctl cmds and other syscall integer arguments to a new directory under
   tools/perf/beauty/, to separate from copies in tools/include/ that are used
   to build the tools.
 
 - Introduce scrape script for several syscall 'flags'/'mask' arguments.
 
 - Improve cpumap utilization, fixing up pairing of refcounts, using the right
   iterators (perf_cpu_map__for_each_cpu), etc.
 
 - Give more details about raw event encodings in 'perf list', show tracepoint
   encoding in the detailed output.
 
 - Refactor the DSOs handling code, reducing memory usage.
 
 - Document the BPF event modifier and add a 'perf test' for it.
 
 - Improve the event parser, better error messages and add further 'perf test's
   for it.
 
 - Add reference count checking to 'struct comm_str' and 'struct mem_info'.
 
 - Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
 
 - Tweak the ARM64's Coresight 'perf test's.
 
 - Improve ARM64's CoreSight ETM version detection and error reporting.
 
 - Fix handling of symbols when using kcore.
 
 - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
   machines in 'perf report'.
 
 - Fix -g/--call-graph option failure in 'perf sched timehist'.
 
 - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
   installed in non-standard directories, such as when doing cross builds.
 
 - Various 'perf test' and 'perf bench' fixes.
 
 - Improve 'perf probe' error message for long C++ probe names.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+
 J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p
 HQxKXehJRh90I90yn954mK2NuN+58Q0=
 =sie+
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "General:

   - Integrate the shellcheck utility with the build of perf to allow
     catching shell problems early in areas such as 'perf test', 'perf
     trace' scrape scripts, etc

   - Add 'uretprobe' variant in the 'perf bench uprobe' tool

   - Add script to run instances of 'perf script' in parallel

   - Allow parsing tracepoint names that start with digits, such as
     9p/9p_client_req, etc. Make sure 'perf test' tests it even on
     systems where those tracepoints aren't available

   - Add Kan Liang to MAINTAINERS as a perf tools reviewer

   - Add support for using the 'capstone' disassembler library in
     various tools, such as 'perf script' and 'perf annotate'. This is
     an alternative for the use of the 'xed' and 'objdump' disassemblers

  Data-type profiling improvements:

   - Resolve types for a->b->c by backtracking the assignments until it
     finds DWARF info for one of those members

   - Support for global variables, keeping a cache to speed up lookups

   - Handle the 'call' instruction, dealing with effects on registers
     and handling its return when tracking register data types

   - Handle x86's segment based addressing like %gs:0x28, to support
     things like per CPU variables, the stack canary, etc

   - Data-type profiling got big speedups when using capstone for
     disassembling. The objdump outoput parsing method is left as a
     fallback when capstone fails or isn't available. There are patches
     posted for 6.11 that to use a LLVM disassembler

   - Support event group display in the TUI when annotating types with
     --data-type, for instance to show memory load and store events for
     the data type fields

   - Optimize the 'perf annotate' data structures, reducing memory usage

   - Add a initial 'perf test' for 'perf annotate', checking that a
     target symbol appears on the output, specifying objdump via the
     command line, etc

  Vendor Events:

   - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
     Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
     Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
     erroneously in TopdownL1

   - Add AMD's Zen 5 core and uncore events and metrics. Those come from
     the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
     Processors" document, with events that capture information on op
     dispatch, execution and retirement, branch prediction, L1 and L2
     cache activity, TLB activity, etc

   - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
     AmpereOneX

  Miscellaneous:

   - Sync header copies with the kernel sources

   - Move some header copies used only for generating translation string
     tables for ioctl cmds and other syscall integer arguments to a new
     directory under tools/perf/beauty/, to separate from copies in
     tools/include/ that are used to build the tools

   - Introduce scrape script for several syscall 'flags'/'mask'
     arguments

   - Improve cpumap utilization, fixing up pairing of refcounts, using
     the right iterators (perf_cpu_map__for_each_cpu), etc

   - Give more details about raw event encodings in 'perf list', show
     tracepoint encoding in the detailed output

   - Refactor the DSOs handling code, reducing memory usage

   - Document the BPF event modifier and add a 'perf test' for it

   - Improve the event parser, better error messages and add further
     'perf test's for it

   - Add reference count checking to 'struct comm_str' and 'struct
     mem_info'

   - Make ARM64's 'perf test' entries for the Neoverse N1 more robust

   - Tweak the ARM64's Coresight 'perf test's

   - Improve ARM64's CoreSight ETM version detection and error reporting

   - Fix handling of symbols when using kcore

   - Fix PAI (Processor Activity Instrumentation) counter names for s390
     virtual machines in 'perf report'

   - Fix -g/--call-graph option failure in 'perf sched timehist'

   - Add LIBTRACEEVENT_DIR build option to allow building with
     libtraceevent installed in non-standard directories, such as when
     doing cross builds

   - Various 'perf test' and 'perf bench' fixes

   - Improve 'perf probe' error message for long C++ probe names"

* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
  tools lib subcmd: Show parent options in help
  perf pmu: Count sys and cpuid JSON events separately
  perf stat: Don't display metric header for non-leader uncore events
  perf annotate-data: Ensure the number of type histograms
  perf annotate: Fix segfault on sample histogram
  perf daemon: Fix file leak in daemon_session__control
  libsubcmd: Fix parse-options memory leak
  perf lock: Avoid memory leaks from strdup()
  perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
  perf tools: Ignore deleted cgroups
  perf parse: Allow tracepoint names to start with digits
  perf parse-events: Add new 'fake_tp' parameter for tests
  perf parse-events: pass parse_state to add_tracepoint
  perf symbols: Fix ownership of string in dso__load_vmlinux()
  perf symbols: Update kcore map before merging in remaining symbols
  perf maps: Re-use __maps__free_maps_by_name()
  perf symbols: Remove map from list before updating addresses
  perf tracepoint: Don't scan all tracepoints to test if one exists
  perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
  perf thread: Fixes to thread__new() related to initializing comm
  ...
2024-05-21 15:45:14 -07:00
Linus Torvalds
3eb3c33c1d asm-generic cleanups for 6.10
These are a few cross-architecture cleanup patches:
 
  - Thomas Zimmermann works on separating fbdev support from the asm/video.h
    contents that may be used by either the old fbdev drivers or the
    newer drm display code.
 
  - Thorsten Blum contributes cleanups for the generic bitops code
    and asm-generic/bug.h
 
  - I remove the orphaned include/asm-generic/page.h header that used to
    included by long-removed mmu-less architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZLvewACgkQYKtH/8kJ
 UicUEQ//b5WVLOVXkFGlQvAaZkagOLEF8xSTnchA7aKrWQ/C6hSwLN6CQU6MAY7j
 Fe54jYQtjwBwpVIj3jn20xiXP/pZbQp9aldkOx4v8YoGnjNF5UWLHm5510DV1ecE
 0LF/2YIH25vIXGY6MVm6sFq+nkDgWZee6fBFNc3GsCu2y0biD1Gob9xH/ngCHjIj
 tw9KS/j6MivPy/9vJ/Ml2YeutV6+pUA9hNmSrbSVlXSWFh3Wq6IZ+j6bNEftqtZY
 xdnYwdVfReOCIayq6hSHhAgIp/uw8JOqLuE2JNwG/9sSF4zp4ZHLvTaMhqEoCpyB
 3kZYd1qQTwV3eL5PyYtRcW03KvbhfZpMPzZT+wbl9SNPUljC2MSVeSFF30Uqatgb
 yUJ9d/vlb1ynu1yQrFfTZ/kK+U0pPByydwLybcMtEIZ6Hrb1h/eRicvHhUx7bKUB
 H9z/FN/TxGY+tPradx2lqm3J1wNu0ox8DUreXjtlJijKIUZQeAkJrGJgr6i6XLBz
 crwgKzuQUClzEjBcoWzuTVUB7v19jaDuHMsaBBu8O9f1g5FnEIJlItqnXf1J0Dno
 rJy68Mxsg4Dzt4YI3lpOJGDDDPhpOTBXfgsjkuru2MrdFMgZQh+DYLl3qOkJ4DJe
 rdiEJb9PygBaGGQnoXO71oOLf5yQuenj+Fg5GIe9AQrci5fXwRQ=
 =riCs
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "These are a few cross-architecture cleanup patches:

   - separate out fbdev support from the asm/video.h contents that may
     be used by either the old fbdev drivers or the newer drm display
     code (Thomas Zimmermann)

   - cleanups for the generic bitops code and asm-generic/bug.h
     (Thorsten Blum)

   - remove the orphaned include/asm-generic/page.h header that used to
     be included by long-removed mmu-less architectures (me)"

* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: Fix name collision with ACPI's video.o
  bug: Improve comment
  asm-generic: remove unused asm-generic/page.h
  arch: Rename fbdev header and source files
  arch: Remove struct fb_info from video helpers
  arch: Select fbdev helpers with CONFIG_VIDEO
  bitops: Change function return types from long to int
2024-05-20 15:18:34 -07:00
Linus Torvalds
eb6a9339ef Mainly singleton patches, documented in their respective changelogs.
Notable series include:
 
 - Some maintenance and performance work for ocfs2 in Heming Zhao's
   series "improve write IO performance when fragmentation is high".
 
 - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
   exposed by fstests".
 
 - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean
   up kfifo.h".
 
 - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes
   for $lx_current and $lx_per_cpu".
 
 - After much discussion, a coding-style update from Barry Song
   explaining one reason why inline functions are preferred over macros.
   The series is "codingstyle: avoid unused parameters for a function-like
   macro".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkpLYQAKCRDdBJ7gKXxA
 jo9NAQDctSD3TMXqxqCHLaEpCaYTYzi6TGAVHjgkqGzOt7tYjAD/ZIzgcmRwthjP
 R7SSiSgZ7UnP9JRn16DQILmFeaoG1gs=
 =lYhr
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-mm updates from Andrew Morton:
 "Mainly singleton patches, documented in their respective changelogs.
  Notable series include:

   - Some maintenance and performance work for ocfs2 in Heming Zhao's
     series "improve write IO performance when fragmentation is high".

   - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
     exposed by fstests".

   - kfifo header rework from Andy Shevchenko in the series "kfifo:
     Clean up kfifo.h".

   - GDB script fixes from Florian Rommel in the series "scripts/gdb:
     Fixes for $lx_current and $lx_per_cpu".

   - After much discussion, a coding-style update from Barry Song
     explaining one reason why inline functions are preferred over
     macros. The series is "codingstyle: avoid unused parameters for a
     function-like macro""

* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
  fs/proc: fix softlockup in __read_vmcore
  nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
  scripts: checkpatch: check unused parameters for function-like macro
  Documentation: coding-style: ask function-like macros to evaluate parameters
  nilfs2: use __field_struct() for a bitwise field
  selftests/kcmp: remove unused open mode
  nilfs2: remove calls to folio_set_error() and folio_clear_error()
  kernel/watchdog_perf.c: tidy up kerneldoc
  watchdog: allow nmi watchdog to use raw perf event
  watchdog: handle comma separated nmi_watchdog command line
  nilfs2: make superblock data array index computation sparse friendly
  squashfs: remove calls to set the folio error flag
  squashfs: convert squashfs_symlink_read_folio to use folio APIs
  scripts/gdb: fix detection of current CPU in KGDB
  scripts/gdb: make get_thread_info accept pointers
  scripts/gdb: fix parameter handling in $lx_per_cpu
  scripts/gdb: fix failing KGDB detection during probe
  kfifo: don't use "proxy" headers
  media: stih-cec: add missing io.h
  media: rc: add missing io.h
  ...
2024-05-19 14:02:03 -07:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initializatio code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folis".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests ni the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Linus Torvalds
ff2632d7d0 powerpc updates for 6.10
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
 
  - Allow per-process DEXCR (Dynamic Execution Control Register) settings via
    prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
 
  - Install powerpc selftests in sub-directories. Note this changes the way
    run_kselftest.sh needs to be invoked for powerpc selftests.
 
  - Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
 
  - Add support for passing additional parameters to the fadump kernel.
 
  - Add support for updating the kdump image on CPU/memory add/remove events.
 
  - Other small features, cleanups and fixes.
 
 Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
 Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
 Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
 Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
 Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
 Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
 Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
 Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
 Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
 Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
 /4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
 aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
 XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
 EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
 Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
 CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
 N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
 Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
 0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
 zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
 kAflTXzsIXvxBA==
 =GoCV
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.

 - Allow per-process DEXCR (Dynamic Execution Control Register) settings
   via prctl, notably NPHIE which controls hashst/hashchk for ROP
   protection.

 - Install powerpc selftests in sub-directories. Note this changes the
   way run_kselftest.sh needs to be invoked for powerpc selftests.

 - Change fadump (Firmware Assisted Dump) to better handle memory
   add/remove.

 - Add support for passing additional parameters to the fadump kernel.

 - Add support for updating the kdump image on CPU/memory add/remove
   events.

 - Other small features, cleanups and fixes.

Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.

* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
  powerpc/fadump: Fix section mismatch warning
  powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
  powerpc/fadump: update documentation about bootargs_append
  powerpc/fadump: pass additional parameters when fadump is active
  powerpc/fadump: setup additional parameters for dump capture kernel
  powerpc/pseries/fadump: add support for multiple boot memory regions
  selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
  KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
  KVM: PPC: Fix documentation for ppc mmu caps
  KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
  KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
  powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
  powerpc/code-patching: Use dedicated memory routines for patching
  powerpc/code-patching: Test patch_instructions() during boot
  powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
  powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
  powerpc: Fix typos
  powerpc/eeh: Fix spelling of the word "auxillary" and update comment
  macintosh/ams: Fix unused variable warning
  powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
  ...
2024-05-17 09:05:46 -07:00
Linus Torvalds
1b294a1f35 Networking changes for 6.10.
Core & protocols
 ----------------
 
  - Complete rework of garbage collection of AF_UNIX sockets.
    AF_UNIX is prone to forming reference count cycles due to fd passing
    functionality. New method based on Tarjan's Strongly Connected Components
    algorithm should be both faster and remove a lot of workarounds
    we accumulated over the years.
 
  - Add TCP fraglist GRO support, allowing chaining multiple TCP packets
    and forwarding them together. Useful for small switches / routers which
    lack basic checksum offload in some scenarios (e.g. PPPoE).
 
  - Support using SMP threads for handling packet backlog i.e. packet
    processing from software interfaces and old drivers which don't
    use NAPI. This helps move the processing out of the softirq jumble.
 
  - Continue work of converting from rtnl lock to RCU protection.
    Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
    labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
    MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
    neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
    information available via rtnetlink.
 
  - Small optimizations from Eric to UDP wake up handling, memory accounting,
    RPS/RFS implementation, TCP packet sizing etc.
 
  - Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
 
  - Support peek with an offset on TCP sockets.
 
  - Add MPTCP APIs for querying last time packets were received/sent/acked,
    and whether MPTCP "upgrade" succeeded on a TCP socket.
 
  - Add intra-node communication shortcut to improve SMC performance.
 
  - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
 
  - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
 
  - Add reset reasons for tracing what caused a TCP reset to be sent.
 
  - Introduce direction attribute for xfrm (IPSec) states.
    State can be used either for input or output packet processing.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
    This required touch-ups and renaming of a few existing users.
 
  - Add Endian-dependent __counted_by_{le,be} annotations.
 
  - Make building selftests "quieter" by printing summaries like
    "CC object.o" rather than full commands with all the arguments.
 
 Netfilter
 ---------
 
  - Use GFP_KERNEL to clone elements, to deal better with OOM situations
    and avoid failures in the .commit step.
 
 BPF
 ---
 
  - Add eBPF JIT for ARCv2 CPUs.
 
  - Support attaching kprobe BPF programs through kprobe_multi link in
    a session mode, meaning, a BPF program is attached to both function entry
    and return, the entry program can decide if the return program gets
    executed and the entry program can share u64 cookie value with return
    program. "Session mode" is a common use-case for tetragon and bpftrace.
 
  - Add the ability to specify and retrieve BPF cookie for raw tracepoint
    programs in order to ease migration from classic to raw tracepoints.
 
  - Add an internal-only BPF per-CPU instruction for resolving per-CPU
    memory addresses and implement support in x86, ARM64 and RISC-V JITs.
    This allows inlining functions which need to access per-CPU state.
 
  - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
    atomics in bpf_arena which can be JITed as a single x86 instruction.
    Support BPF arena on ARM64.
 
  - Add a new bpf_wq API for deferring events and refactor process-context
    bpf_timer code to keep common code where possible.
 
  - Harden the BPF verifier's and/or/xor value tracking.
 
  - Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
 
  - Support bpf_tail_call_static() helper for BPF programs with GCC 13.
 
  - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled.
 
 Driver API
 ----------
 
  - Skip software TC processing completely if all installed rules are
    marked as HW-only, instead of checking the HW-only flag rule by rule.
 
  - Add support for configuring PoE (Power over Ethernet), similar to
    the already existing support for PoDL (Power over Data Line) config.
 
  - Initial bits of a queue control API, for now allowing a single queue
    to be reset without disturbing packet flow to other queues.
 
  - Common (ethtool) statistics for hardware timestamping.
 
 Tests and tooling
 -----------------
 
  - Remove the need to create a config file to run the net forwarding tests
    so that a naive "make run_tests" can exercise them.
 
  - Define a method of writing tests which require an external endpoint
    to communicate with (to send/receive data towards the test machine).
    Add a few such tests.
 
  - Create a shared code library for writing Python tests. Expose the YAML
    Netlink library from tools/ to the tests for easy Netlink access.
 
  - Move netfilter tests under net/, extend them, separate performance tests
    from correctness tests, and iron out issues found by running them
    "on every commit".
 
  - Refactor BPF selftests to use common network helpers.
 
  - Further work filling in YAML definitions of Netlink messages for:
    nftables, team driver, bonding interfaces, vlan interfaces, VF info,
    TC u32 mark, TC police action.
 
  - Teach Python YAML Netlink to decode attribute policies.
 
  - Extend the definition of the "indexed array" construct in the specs
    to cover arrays of scalars rather than just nests.
 
  - Add hyperlinks between definitions in generated Netlink docs.
 
 Drivers
 -------
 
  - Make sure unsupported flower control flags are rejected by drivers,
    and make more drivers report errors directly to the application rather
    than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support multiple RSS contexts and steering traffic to them
      - support XDP metadata
      - make page pool allocations more NUMA aware
    - Intel (100G, ice, idpf):
      - extract datapath code common among Intel drivers into a library
      - use fewer resources in switchdev by sharing queues with the PF
      - add PFCP filter support
      - add Ethernet filter support
      - use a spinlock instead of HW lock in PTP clock ops
      - support 5 layer Tx scheduler topology
    - nVidia/Mellanox:
      - 800G link modes and 100G SerDes speeds
      - per-queue IRQ coalescing configuration
    - Marvell Octeon:
      - support offloading TC packet mark action
 
  - Ethernet NICs consumer, embedded and virtual:
    - stop lying about skb->truesize in USB Ethernet drivers, it messes up
      TCP memory calculations
    - Google cloud vNIC:
      - support changing ring size via ethtool
      - support ring reset using the queue control API
    - VirtIO net:
      - expose flow hash from RSS to XDP
      - per-queue statistics
      - add selftests
    - Synopsys (stmmac):
      - support controllers which require an RX clock signal from the MII
        bus to perform their hardware initialization
    - TI:
      - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
      - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
      - cpsw: minimal XDP support
    - Renesas (ravb):
      - support describing the MDIO bus
    - Realtek (r8169):
      - add support for RTL8168M
    - Microchip Sparx5:
      - matchall and flower actions mirred and redirect
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - improve events processing performance
    - Marvell:
      - add support for MV88E6250 family internal PHYs
    - Microchip:
      - add DCB and DSCP mapping support for KSZ switches
      - vsc73xx: convert to PHYLINK
    - Realtek:
      - rtl8226b/rtl8221b: add C45 instances and SerDes switching
 
  - Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
 
  - Ethernet PHYs:
    - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
    - micrel: lan8814: add support for PPS out and external timestamp trigger
 
  - WiFi:
    - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
      Modern devices can only be configured using nl80211.
    - mac80211/cfg80211
      - handle color change per link for WiFi 7 Multi-Link Operation
    - Intel (iwlwifi):
      - don't support puncturing in 5 GHz
      - support monitor mode on passive channels
      - BZ-W device support
      - P2P with HE/EHT support
      - re-add support for firmware API 90
      - provide channel survey information for Automatic Channel Selection
    - MediaTek (mt76):
      - mt7921 LED control
      - mt7925 EHT radiotap support
      - mt7920e PCI support
    - Qualcomm (ath11k):
      - P2P support for QCA6390, WCN6855 and QCA2066
      - support hibernation
      - ieee80211-freq-limit Device Tree property support
    - Qualcomm (ath12k):
      - refactoring in preparation of multi-link support
      - suspend and hibernation support
      - ACPI support
      - debugfs support, including dfs_simulate_radar support
    - RealTek:
      - rtw88: RTL8723CS SDIO device support
      - rtw89: RTL8922AE Wi-Fi 7 PCI device support
      - rtw89: complete features of new WiFi 7 chip 8922AE including
        BT-coexistence and Wake-on-WLAN
      - rtw89: use BIOS ACPI settings to set TX power and channels
      - rtl8xxxu: enable Management Frame Protection (MFP) support
 
  - Bluetooth:
    - support for Intel BlazarI and Filmore Peak2 (BE201)
    - support for MediaTek MT7921S SDIO
    - initial support for Intel PCIe BT driver
    - remove HCI_AMP support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
 IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
 UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
 KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
 gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
 99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
 UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
 Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
 z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
 E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
 FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
 fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
 =EsC2
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Complete rework of garbage collection of AF_UNIX sockets.

     AF_UNIX is prone to forming reference count cycles due to fd
     passing functionality. New method based on Tarjan's Strongly
     Connected Components algorithm should be both faster and remove a
     lot of workarounds we accumulated over the years.

   - Add TCP fraglist GRO support, allowing chaining multiple TCP
     packets and forwarding them together. Useful for small switches /
     routers which lack basic checksum offload in some scenarios (e.g.
     PPPoE).

   - Support using SMP threads for handling packet backlog i.e. packet
     processing from software interfaces and old drivers which don't use
     NAPI. This helps move the processing out of the softirq jumble.

   - Continue work of converting from rtnl lock to RCU protection.

     Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
     address labels, netdev threaded NAPI sysfs files, bonding driver's
     sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
     TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
     of the link information available via rtnetlink.

   - Small optimizations from Eric to UDP wake up handling, memory
     accounting, RPS/RFS implementation, TCP packet sizing etc.

   - Allow direct page recycling in the bulk API used by XDP, for +2%
     PPS.

   - Support peek with an offset on TCP sockets.

   - Add MPTCP APIs for querying last time packets were received/sent/acked
     and whether MPTCP "upgrade" succeeded on a TCP socket.

   - Add intra-node communication shortcut to improve SMC performance.

   - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
     driver.

   - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.

   - Add reset reasons for tracing what caused a TCP reset to be sent.

   - Introduce direction attribute for xfrm (IPSec) states. State can be
     used either for input or output packet processing.

  Things we sprinkled into general kernel code:

   - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().

     This required touch-ups and renaming of a few existing users.

   - Add Endian-dependent __counted_by_{le,be} annotations.

   - Make building selftests "quieter" by printing summaries like
     "CC object.o" rather than full commands with all the arguments.

  Netfilter:

   - Use GFP_KERNEL to clone elements, to deal better with OOM
     situations and avoid failures in the .commit step.

  BPF:

   - Add eBPF JIT for ARCv2 CPUs.

   - Support attaching kprobe BPF programs through kprobe_multi link in
     a session mode, meaning, a BPF program is attached to both function
     entry and return, the entry program can decide if the return
     program gets executed and the entry program can share u64 cookie
     value with return program. "Session mode" is a common use-case for
     tetragon and bpftrace.

   - Add the ability to specify and retrieve BPF cookie for raw
     tracepoint programs in order to ease migration from classic to raw
     tracepoints.

   - Add an internal-only BPF per-CPU instruction for resolving per-CPU
     memory addresses and implement support in x86, ARM64 and RISC-V
     JITs. This allows inlining functions which need to access per-CPU
     state.

   - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
     atomics in bpf_arena which can be JITed as a single x86
     instruction. Support BPF arena on ARM64.

   - Add a new bpf_wq API for deferring events and refactor
     process-context bpf_timer code to keep common code where possible.

   - Harden the BPF verifier's and/or/xor value tracking.

   - Introduce crypto kfuncs to let BPF programs call kernel crypto
     APIs.

   - Support bpf_tail_call_static() helper for BPF programs with GCC 13.

   - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
     program to have code sections where preemption is disabled.

  Driver API:

   - Skip software TC processing completely if all installed rules are
     marked as HW-only, instead of checking the HW-only flag rule by
     rule.

   - Add support for configuring PoE (Power over Ethernet), similar to
     the already existing support for PoDL (Power over Data Line)
     config.

   - Initial bits of a queue control API, for now allowing a single
     queue to be reset without disturbing packet flow to other queues.

   - Common (ethtool) statistics for hardware timestamping.

  Tests and tooling:

   - Remove the need to create a config file to run the net forwarding
     tests so that a naive "make run_tests" can exercise them.

   - Define a method of writing tests which require an external endpoint
     to communicate with (to send/receive data towards the test
     machine). Add a few such tests.

   - Create a shared code library for writing Python tests. Expose the
     YAML Netlink library from tools/ to the tests for easy Netlink
     access.

   - Move netfilter tests under net/, extend them, separate performance
     tests from correctness tests, and iron out issues found by running
     them "on every commit".

   - Refactor BPF selftests to use common network helpers.

   - Further work filling in YAML definitions of Netlink messages for:
     nftables, team driver, bonding interfaces, vlan interfaces, VF
     info, TC u32 mark, TC police action.

   - Teach Python YAML Netlink to decode attribute policies.

   - Extend the definition of the "indexed array" construct in the specs
     to cover arrays of scalars rather than just nests.

   - Add hyperlinks between definitions in generated Netlink docs.

  Drivers:

   - Make sure unsupported flower control flags are rejected by drivers,
     and make more drivers report errors directly to the application
     rather than dmesg (large number of driver changes from Asbjørn
     Sloth Tønnesen).

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support multiple RSS contexts and steering traffic to them
         - support XDP metadata
         - make page pool allocations more NUMA aware
      - Intel (100G, ice, idpf):
         - extract datapath code common among Intel drivers into a library
         - use fewer resources in switchdev by sharing queues with the PF
         - add PFCP filter support
         - add Ethernet filter support
         - use a spinlock instead of HW lock in PTP clock ops
         - support 5 layer Tx scheduler topology
      - nVidia/Mellanox:
         - 800G link modes and 100G SerDes speeds
         - per-queue IRQ coalescing configuration
      - Marvell Octeon:
         - support offloading TC packet mark action

   - Ethernet NICs consumer, embedded and virtual:
      - stop lying about skb->truesize in USB Ethernet drivers, it
        messes up TCP memory calculations
      - Google cloud vNIC:
         - support changing ring size via ethtool
         - support ring reset using the queue control API
      - VirtIO net:
         - expose flow hash from RSS to XDP
         - per-queue statistics
         - add selftests
      - Synopsys (stmmac):
         - support controllers which require an RX clock signal from the
           MII bus to perform their hardware initialization
      - TI:
         - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
         - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
         - cpsw: minimal XDP support
      - Renesas (ravb):
         - support describing the MDIO bus
      - Realtek (r8169):
         - add support for RTL8168M
      - Microchip Sparx5:
         - matchall and flower actions mirred and redirect

   - Ethernet switches:
      - nVidia/Mellanox:
         - improve events processing performance
      - Marvell:
         - add support for MV88E6250 family internal PHYs
      - Microchip:
         - add DCB and DSCP mapping support for KSZ switches
         - vsc73xx: convert to PHYLINK
      - Realtek:
         - rtl8226b/rtl8221b: add C45 instances and SerDes switching

   - Many driver changes related to PHYLIB and PHYLINK deprecated API
     cleanup

   - Ethernet PHYs:
      - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
      - micrel: lan8814: add support for PPS out and external timestamp trigger

   - WiFi:
      - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
        drivers. Modern devices can only be configured using nl80211.
      - mac80211/cfg80211
         - handle color change per link for WiFi 7 Multi-Link Operation
      - Intel (iwlwifi):
         - don't support puncturing in 5 GHz
         - support monitor mode on passive channels
         - BZ-W device support
         - P2P with HE/EHT support
         - re-add support for firmware API 90
         - provide channel survey information for Automatic Channel Selection
      - MediaTek (mt76):
         - mt7921 LED control
         - mt7925 EHT radiotap support
         - mt7920e PCI support
      - Qualcomm (ath11k):
         - P2P support for QCA6390, WCN6855 and QCA2066
         - support hibernation
         - ieee80211-freq-limit Device Tree property support
      - Qualcomm (ath12k):
         - refactoring in preparation of multi-link support
         - suspend and hibernation support
         - ACPI support
         - debugfs support, including dfs_simulate_radar support
      - RealTek:
         - rtw88: RTL8723CS SDIO device support
         - rtw89: RTL8922AE Wi-Fi 7 PCI device support
         - rtw89: complete features of new WiFi 7 chip 8922AE including
           BT-coexistence and Wake-on-WLAN
         - rtw89: use BIOS ACPI settings to set TX power and channels
         - rtl8xxxu: enable Management Frame Protection (MFP) support

   - Bluetooth:
      - support for Intel BlazarI and Filmore Peak2 (BE201)
      - support for MediaTek MT7921S SDIO
      - initial support for Intel PCIe BT driver
      - remove HCI_AMP support"

* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
  selftests: netfilter: fix packetdrill conntrack testcase
  net: gro: fix napi_gro_cb zeroed alignment
  Bluetooth: btintel_pcie: Refactor and code cleanup
  Bluetooth: btintel_pcie: Fix warning reported by sparse
  Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
  Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
  Bluetooth: btintel_pcie: Fix compiler warnings
  Bluetooth: btintel_pcie: Add *setup* function to download firmware
  Bluetooth: btintel_pcie: Add support for PCIe transport
  Bluetooth: btintel: Export few static functions
  Bluetooth: HCI: Remove HCI_AMP support
  Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
  Bluetooth: qca: Fix error code in qca_read_fw_build_info()
  Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
  Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
  Bluetooth: btintel: Add support for BlazarI
  LE Create Connection command timeout increased to 20 secs
  dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
  Bluetooth: compute LE flow credits based on recvbuf space
  Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
  ...
2024-05-14 19:42:24 -07:00
Linus Torvalds
4b768bf062 linux_kselftest-nolibc-6.10-rc1
This nolibc update for Linux 6.10-rc1
 
 - adds support for uname(2)
 - removes open-coded strnlen()
 - exports strlen()
 - adds tests for strlcat() and strlcpy()
 - fixes memory error in realloc()
 - fixes strlcat() return code and size usage
 - fixes strlcpy() return code and size usage
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmZCKZ8ACgkQCwJExA0N
 QxznfRAA17AXswINyhQEu/2lJIFFpQazZqMyvZ4mv19FPI8P3zhwPN24XsmLNqdo
 py6dUIbondrs7UgOOG0R0E8N89rxDctok6F8OEScsnfvd+KKEg3+YJscJj2X1vaK
 H3qU+i/K3oGFHJo2dsaR1iYl+lzo9dyzvicOe5bFzFkVQGI0a3p88CQrhoF3HkNf
 jc9oqnxIGJn3v3UX6jxFWFiuF17CopNv4/5DZG2QIU5t0JGiqEYxl5ayDwqQsgGB
 E88eiutsuhkVTqKI/9VXs+YIE0B0xbEQfGGSQ1zzGZKfy7MBSk88PzLXa591DFZH
 SXETzfxqbI4ACX7L36mUX/JUw/sC5ch67UABZqwrj8oxXb9Ssgxfzmt7rCwBlp2i
 GdYHuoSDY5TJnskceL5a94BLuvhI9EnYsMHKA2aHM95eBBkPZo7jGqqOU8hJbL4t
 MbIu0z3iCNUBFcNSBXdfv81Dc4CJO4Orj1jnMqe0U6dSasT+hS2ulZPu6EjKIfMY
 Wqgfs89pFQ4OH3e3Crx2ngz57iL2fDYZfccfjOJQtYfxTOmXRKHjcDXcZdydsxNF
 6dXNDhtuyftvNf3h4o7B73Rv3xNbG1ObZmU2i4wxmY2fsYEsRhqR/UKhtqbTPsxF
 C/zL8KSraGzrgYuyOzQMDWAH/ybhZytn91kEd1QBWMO79TCC2JI=
 =2QlZ
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-nolibc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull nolibc updates from Shuah Khan:

 - add support for uname(2)

 - remove open-coded strnlen()

 - export strlen()

 - add tests for strlcat() and strlcpy()

 - fix memory error in realloc()

 - fix strlcat() and strlcpy() return code and size usage

* tag 'linux_kselftest-nolibc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  tools/nolibc: add support for uname(2)
  tools/nolibc/string: remove open-coded strnlen()
  selftests/nolibc: Add tests for strlcat() and strlcpy()
  tools/nolibc: Fix strlcpy() return code and size usage
  tools/nolibc: Fix strlcat() return code and size usage
  tools/nolibc/string: export strlen()
  tools/nolibc/stdlib: fix memory error in realloc()
2024-05-14 11:31:09 -07:00
Jakub Kicinski
6e62702feb bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZkGcZAAKCRDbK58LschI
 g6o6APwLsqhrM2w71VUN5ciCxu4H5VDtZp6wkdqtVbxxU4qNxQEApKgYgKt8ZLF3
 Kily5c7m+S4ZXhMX21rb8JhSAz0dfQk=
 =5Dk7
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-13

We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).

The main changes are:

1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.

2) Add BPF range computation improvements to the verifier in particular
   around XOR and OR operators, refactoring of checks for range computation
   and relaxing MUL range computation so that src_reg can also be an unknown
   scalar, from Cupertino Miranda.

3) Add support to attach kprobe BPF programs through kprobe_multi link in
   a session mode, meaning, a BPF program is attached to both function entry
   and return, the entry program can decide if the return program gets
   executed and the entry program can share u64 cookie value with return
   program. Session mode is a common use-case for tetragon and bpftrace,
   from Jiri Olsa.

4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
   as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.

5) Improvements to BPF selftests in context of BPF gcc backend,
   from Jose E. Marchesi & David Faust.

6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
   -style in order to retire the old test, run it in BPF CI and additionally
   expand test coverage, from Jordan Rife.

7) Big batch for BPF selftest refactoring in order to remove duplicate code
   around common network helpers, from Geliang Tang.

8) Another batch of improvements to BPF selftests to retire obsolete
   bpf_tcp_helpers.h as everything is available vmlinux.h,
   from Martin KaFai Lau.

9) Fix BPF map tear-down to not walk the map twice on free when both timer
   and wq is used, from Benjamin Tissoires.

10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
    from Alexei Starovoitov.

11) Change BTF build scripts to using --btf_features for pahole v1.26+,
    from Alan Maguire.

12) Small improvements to BPF reusing struct_size() and krealloc_array(),
    from Andy Shevchenko.

13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
    from Ilya Leoshkevich.

14) Extend TCP ->cong_control() callback in order to feed in ack and
    flag parameters and allow write-access to tp->snd_cwnd_stamp
    from BPF program, from Miao Xu.

15) Add support for internal-only per-CPU instructions to inline
    bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
    from Puranjay Mohan.

16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
    from Tushar Vyavahare.

17) Extend libbpf to support "module:<function>" syntax for tracing
    programs, from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  bpf: make list_for_each_entry portable
  bpf: ignore expected GCC warning in test_global_func10.c
  bpf: disable strict aliasing in test_global_func9.c
  selftests/bpf: Free strdup memory in xdp_hw_metadata
  selftests/bpf: Fix a few tests for GCC related warnings.
  bpf: avoid gcc overflow warning in test_xdp_vlan.c
  tools: remove redundant ethtool.h from tooling infra
  selftests/bpf: Expand ATTACH_REJECT tests
  selftests/bpf: Expand getsockname and getpeername tests
  sefltests/bpf: Expand sockaddr hook deny tests
  selftests/bpf: Expand sockaddr program return value tests
  selftests/bpf: Retire test_sock_addr.(c|sh)
  selftests/bpf: Remove redundant sendmsg test cases
  selftests/bpf: Migrate ATTACH_REJECT test cases
  selftests/bpf: Migrate expected_attach_type tests
  selftests/bpf: Migrate wildcard destination rewrite test
  selftests/bpf: Migrate sendmsg6 v4 mapped address tests
  selftests/bpf: Migrate sendmsg deny test cases
  selftests/bpf: Migrate WILDCARD_IP test
  selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
  ...
====================

Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 16:41:10 -07:00
Daniel Jurgens
b56035101e netdev: Add queue stats for TX stop and wake
TX queue stop and wake are counted by some drivers.
Support reporting these via netdev-genl queue stats.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:58:36 -07:00
Tushar Vyavahare
bbe91a9f68 tools: remove redundant ethtool.h from tooling infra
Remove the redundant ethtool.h header file from tools/include/uapi/linux.
The file is unnecessary as the system uses the kernel's
include/uapi/linux/ethtool.h directly.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240508104123.434769-1-tushar.vyavahare@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 17:17:55 -07:00
Arnaldo Carvalho de Melo
bbed8b9ffe tools lib rbtree: pick some improvements from the kernel rbtree code
The tools/lib/rbtree.c code came from the kernel.  Remove the
EXPORT_SYMBOL() that make sense only there.  Unfortunately it is not being
checked with tools/perf/check_headers.sh.  Will try to remedy this.  Until
then pick the improvements from:

  b0687c1119 ("lib/rbtree: use '+' instead of '|' for setting color.")

That I noticed by doing:

  diff -u tools/lib/rbtree.c lib/rbtree.c
  diff -u tools/include/linux/rbtree_augmented.h include/linux/rbtree_augmented.h

There is one other cases, but lets pick it in separate patches.

Link: https://lkml.kernel.org/r/ZigZzeFoukzRKG1Q@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-08 08:41:27 -07:00
Joel Stanley
651d61bc8b KVM: PPC: Fix documentation for ppc mmu caps
The documentation mentions KVM_CAP_PPC_RADIX_MMU, but the defines in the
kvm headers spell it KVM_CAP_PPC_MMU_RADIX. Similarly with
KVM_CAP_PPC_MMU_HASH_V3.

Fixes: c927013227 ("KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230411061446.26324-1-joel@jms.id.au
2024-05-08 01:28:00 +10:00
Thorsten Blum
9c313ccdfc
bitops: Change function return types from long to int
Change the return types of bitops functions (ffs, fls, and fns) from
long to int. The expected return values are in the range [0, 64], for
which int is sufficient.

Additionally, int aligns well with the return types of the corresponding
__builtin_* functions, potentially reducing overall type conversions.

Many of the existing bitops functions already return an int and don't
need to be changed. The bitops functions in arch/ should be considered
separately.

Adjust some return variables to match the function return types.

With GCC 13 and defconfig, these changes reduced the size of a test
kernel image by 5,432 bytes on arm64 and by 248 bytes on riscv; there
were no changes in size on x86_64, powerpc, or m68k.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-03 17:04:50 +02:00
Jiri Olsa
535a3692ba bpf: Add support for kprobe session attach
Adding support to attach bpf program for entry and return probe
of the same function. This is common use case which at the moment
requires to create two kprobe multi links.

Adding new BPF_TRACE_KPROBE_SESSION attach type that instructs
kernel to attach single link program to both entry and exit probe.

It's possible to control execution of the bpf program on return
probe simply by returning zero or non zero from the entry bpf
program execution to execute or not the bpf program on return
probe respectively.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240430112830.1184228-2-jolsa@kernel.org
2024-04-30 09:45:53 -07:00
Xuan Zhuo
0cfe71f45f netdev: add queue stats
These stats are commonly. Support reporting those via netdev-genl queue
stats.

name: rx-hw-drops
name: rx-hw-drop-overruns
name: rx-csum-unnecessary
name: rx-csum-none
name: rx-csum-bad
name: rx-hw-gro-packets
name: rx-hw-gro-bytes
name: rx-hw-gro-wire-packets
name: rx-hw-gro-wire-bytes
name: rx-hw-drop-ratelimits
name: tx-hw-drops
name: tx-hw-drop-errors
name: tx-csum-none
name: tx-needs-csum
name: tx-hw-gso-packets
name: tx-hw-gso-bytes
name: tx-hw-gso-wire-packets
name: tx-hw-gso-wire-bytes
name: tx-hw-drop-ratelimits

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30 10:51:33 +02:00
Jakub Kicinski
89de2db193 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZi9+AAAKCRDbK58LschI
 g0nEAP487m7L0nLVriC2oIOWsi29tklW3etm6DO7gmGRGIHgrgEAnMyV1xBj3bGj
 v6jJwDcybCym1hLx+1x1JCZ4eoAFswE=
 =xbna
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support in x86 BPF JIT. This allows
   inlining per-CPU array and hashmap lookups
   and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.

2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing mark with bpf_fib_lookup helper,
   from Anton Protopopov.

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible,
   from Benjamin Tissoires.

6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
   to check when NULL is passed for non-NULLable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
   crypto subsystem, from Vadim Fedorenko.

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
    from Andrea Righi.

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
    from Jose E. Marchesi.

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled,
    from Kumar Kartikeya Dwivedi.

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
    from David Vernet.

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man pages fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regards to dumping subsequent char arrays,
    from Quentin Deslandes.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29 13:12:19 -07:00
Arnaldo Carvalho de Melo
450f941ea9 tools headers: Synchronize linux/bits.h with the kernel sources
To pick up the changes in this cset:

   3c7a8e190b ("uapi: introduce uapi-friendly macros for GENMASK")

That just causes perf to rebuild. Its just some macros going to an uapi
header that we now have to grab a copy into tools/ as well.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/linux/bits.h include/linux/bits.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZiwJsFOBez0MS4r9@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:13:10 -03:00
Arnaldo Carvalho de Melo
cd88c11c6d tools lib rbtree: Pick some improvements from the kernel rbtree code
The tools/lib/rbtree.c code came from the kernel, removing the
EXPORT_SYMBOL() that make sense only there, unfortunately it is not
being checked with tools/perf/check_headers.sh, will try to remedy this,
till then pick the improvements from:

  b0687c1119 ("lib/rbtree: use '+' instead of '|' for setting color.")

That I noticed by doing:

  diff -u tools/lib/rbtree.c lib/rbtree.c
  diff -u tools/include/linux/rbtree_augmented.h include/linux/rbtree_augmented.h

There is one other cases, but lets pick it in separate patches.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Noah Goldstein <goldstein.w.n@gmail.com>
Link: https://lore.kernel.org/lkml/ZigZzeFoukzRKG1Q@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:19 -03:00
John Hubbard
580ea358af selftests/mm: fix additional build errors for selftests
These build errors only occur if one fails to first run "make headers". 
However, that is a non-obvious and instrusive requirement, and so there
was a discussion on how to get rid of it [1].  This uses that solution.

These two files were created by taking a snapshot of the generated header
files that are created via "make headers".  These two files were copied
from ./usr/include/linux/ to ./tools/include/uapi/linux/ .

That fixes the selftests/mm build on today's Arch Linux (which required
the userfaultfd.h) and Ubuntu 23.04 (which additionally required memfd.h).

[1] https://lore.kernel.org/all/783a4178-1dec-4e30-989a-5174b8176b09@redhat.com/

Link: https://lkml.kernel.org/r/20240328033418.203790-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:42 -07:00
Philo Lu
48e2cd3e3d bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
Two important arguments in RTT estimation, mrtt and srtt, are passed to
tcp_bpf_rtt(), so that bpf programs get more information about RTT
computation in BPF_SOCK_OPS_RTT_CB.

The difference between bpf_sock_ops->srtt_us and the srtt here is: the
former is an old rtt before update, while srtt passed by tcp_bpf_rtt()
is that after update.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240425161724.73707-2-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-25 14:09:05 -07:00
Jakub Kicinski
2bd87951de Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/ti/icssg/icssg_prueth.c

net/mac80211/chan.c
  89884459a0 ("wifi: mac80211: fix idle calculation with multi-link")
  87f5500285 ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/

net/unix/garbage.c
  1971d13ffa ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
  4090fa373f ("af_unix: Replace garbage collection algorithm.")

drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
  4dcd0e83ea ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
  e2dc7bfd67 ("net: ti: icssg-prueth: Move common functions into a separate file")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25 12:41:37 -07:00
Benjamin Tissoires
f1d0a2fbb0 tools: sync include/uapi/linux/bpf.h
cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-6-6c986a5a741f@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23 18:31:24 -07:00
Arnaldo Carvalho de Melo
173b0b5b0e Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes sent via perf-tools, by Namhyung Kim.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-22 13:35:18 -03:00
Linus Torvalds
13a2e429f6 perf tools fixes for v6.9
A random set of small bug fixes:
 
  * Fix perf annotate TUI when used with data type profiling
  * Work around BPF verifier about sighand lock checking
 
 And a set of kernel header synchronization.
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZiLYdwAKCRCMstVUGiXM
 g6kDAP9GYdWqmbYZhKgQQQwAVkgkB0j19vLu+6xphi6P3ul7pwD+Nr5mHEcqytU/
 I1r0lrV/ybNmpUK3wFyRsyvRmnxe/Qw=
 =/181
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.9-2024-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A random set of small bug fixes:

   - Fix perf annotate TUI when used with data type profiling

   - Work around BPF verifier about sighand lock checking

  And a set of kernel header synchronization"

* tag 'perf-tools-fixes-for-v6.9-2024-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools/include: Sync arm64 asm/cputype.h with the kernel sources
  tools/include: Sync asm-generic/bitops/fls.h with the kernel sources
  tools/include: Sync x86 asm/msr-index.h with the kernel sources
  tools/include: Sync x86 asm/irq_vectors.h with the kernel sources
  tools/include: Sync x86 CPU feature headers with the kernel sources
  tools/include: Sync uapi/sound/asound.h with the kernel sources
  tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
  tools/include: Sync uapi/linux/fs.h with the kernel sources
  tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
  perf lock contention: Add a missing NULL check
  perf annotate: Make sure to call symbol__annotate2() in TUI
2024-04-19 16:34:10 -07:00
Arnaldo Carvalho de Melo
61ba075d99 Revert "tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines"
This reverts commit a672af9139.

By now it is not used for building tools/perf, but Stephen Rothwell
reported that when building on a O= directory that had been built with
torvalds/master and this perf build command line:

  $ make -C tools/perf -f Makefile.perf -s -O -j60 O=/home/sfr/next/perf NO_BPF_SKEL=1

If we then merge perf-tools-next, as he did for linux-next, then we end
up with a build failure for libbpf:

    PERF_VERSION = 6.9.rc3.g42c4635c8dee
  make[3]: *** No rule to make target '/home/sfr/next/next/tools/include/uapi/linux/stat.h', needed by '/home/sfr/next/perf/libbpf/staticobjs/libbpf.o'.  Stop.
  make[2]: *** [Makefile:157: /home/sfr/next/perf/libbpf/staticobjs/libbpf-in.o] Error 2
  make[1]: *** [Makefile.perf:892: /home/sfr/next/perf/libbpf/libbpf.a] Error 2
  make[1]: *** Waiting for unfinished jobs....
  make: *** [Makefile.perf:264: sub-make] Error 2

This needs to be further investigated to figure out how to check if
libbpf really needs something that is in that
tools/include/uapi/linux/stat.h file and if not to remove that file in a
way that we don't break the build in any situation, to avoid requiring
doing a 'make clean'.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # PowerPC le incermental build
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240413124340.4d48c6d8@canb.auug.org.au
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18 22:22:52 -03:00
Thomas Weißschuh
0adab2b6b7 tools/nolibc: add support for uname(2)
All supported kernels are assumed to use struct new_utsname.
This is validated in test_uname().

uname(2) can for example be used in ksft_min_kernel_version() from the
kernels selftest framework.

Link: https://lore.kernel.org/lkml/20240412123536.GA32444@redhat.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2024-04-14 20:28:54 +02:00
Jakub Kicinski
94426ed213 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/unix/garbage.c
  47d8ac011f ("af_unix: Fix garbage collector racing against connect()")
  4090fa373f ("af_unix: Replace garbage collection algorithm.")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  faa12ca245 ("bnxt_en: Reset PTP tx_avail after possible firmware reset")
  b3d0083caf ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")

drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
  7ac10c7d72 ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()")
  194fad5b27 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions")

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  958f56e483 ("net/mlx5e: Un-expose functions in en.h")
  49e6c93870 ("net/mlx5e: RSS, Block XOR hash with over 128 channels")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11 14:23:47 -07:00
Namhyung Kim
99e4e1174a tools/include: Sync asm-generic/bitops/fls.h with the kernel sources
To pick up the changes from:

  cb4ede9261 ("riscv: Avoid code duplication with generic bitops implementation")

This should address these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/asm-generic/bitops/__fls.h include/asm-generic/bitops/__fls.h
    diff -u tools/include/asm-generic/bitops/fls.h include/asm-generic/bitops/fls.h

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-9-namhyung@kernel.org
2024-04-11 10:38:29 -07:00
Namhyung Kim
b7ce17f257 tools/include: Sync uapi/sound/asound.h with the kernel sources
To pick up the changes from:

  85df6b5a66 ("ALSA: pcm: clarify and fix default msbits value for all formats")

This should be used to beautify sound syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: linux-sound@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-5-namhyung@kernel.org
2024-04-11 10:38:28 -07:00
Namhyung Kim
bee3b820c6 tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
To pick up the changes from:

  6bda055d62 ("KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally")
  5d9cb71642 ("KVM: arm64: move ARM-specific defines to uapi/asm/kvm.h")
  71cd774ad2 ("KVM: s390: move s390-specific structs to uapi/asm/kvm.h")
  d750951c9e ("KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h")
  bcac047727 ("KVM: x86: move x86-specific structs to uapi/asm/kvm.h")
  c0a411904e ("KVM: remove more traces of device assignment UAPI")
  f3c80061c0 ("KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP")

That should be used to beautify the KVM arguments and it addresses these
tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
    diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
    diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
    diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-4-namhyung@kernel.org
2024-04-11 10:38:28 -07:00
Namhyung Kim
4cfa8a873d tools/include: Sync uapi/linux/fs.h with the kernel sources
To pick up the changes from:

  41bcbe59c3 ("fs: FS_IOC_GETUUID")
  ae8c511757 ("fs: add FS_IOC_GETFSSYSFSPATH")
  73fa7547c7 ("vfs: add RWF_NOAPPEND flag for pwritev2")

This should be used to beautify fs syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-3-namhyung@kernel.org
2024-04-11 10:37:49 -07:00
Namhyung Kim
3ef842a77e tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
To pick up changes from:

   b112364867 ("drm/i915: Add GuC submission interface version query")
   5cf0fbf763 ("drm/i915: Add some boring kerneldoc")

This should be used to beautify DRM syscall arguments and it addresses
these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-2-namhyung@kernel.org
2024-04-11 10:35:32 -07:00
Yonghong Song
699c23f02c bpf: Add bpf_link support for sk_msg and sk_skb progs
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have a uniform handling with bpf_link based libbpf
APIs. Using bpf_link based libbpf API also has a benefit which
makes system robust by decoupling prog life cycle and
attachment life cycle.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043527.3737160-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-10 19:52:25 -07:00
Thomas Weißschuh
e93b912ecf tools/nolibc/string: remove open-coded strnlen()
The same header already defines an implementation of strnlen(),
so use it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-04-10 23:27:06 +02:00
Rodrigo Campos
fbffce819e tools/nolibc: Fix strlcpy() return code and size usage
The return code should always be strlen(src), and we should copy at most
size-1 bytes.

While we are there, make sure to null-terminate the dst buffer if we
copied something.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-04-10 23:19:02 +02:00
Rodrigo Campos
34d232c39a tools/nolibc: Fix strlcat() return code and size usage
The return code should always be strlen(src) + strnlen(dst, size).

Let's make sure to copy at most size-1 bytes from src and null-terminate
the dst buffer if we did copied something.

While we can use strnlen() and strncpy() to implement strlcat(), this is
simple enough and results in shorter code when compiled.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-04-10 23:19:01 +02:00
Rodrigo Campos
689230b674 tools/nolibc/string: export strlen()
As with commit 8d304a3740, "tools/nolibc/string: export memset() and
memmove()", gcc -Os without -ffreestanding may fail to compile with:

	cc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib -lgcc  -static -o test test.c
	/usr/bin/ld: /tmp/cccIasKL.o: in function `main':
	test.c:(.text.startup+0x1e): undefined reference to `strlen'
	collect2: error: ld returned 1 exit status

As on the aforementioned commit, this patch adds a section to export
this function so compilation works on those cases too.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-04-10 23:19:01 +02:00
Brennan Xavier McManus
791f464114 tools/nolibc/stdlib: fix memory error in realloc()
Pass user_p_len to memcpy() instead of heap->len to prevent realloc()
from copying an extra sizeof(heap) bytes from beyond the allocated
region.

Signed-off-by: Brennan Xavier McManus <bxmcmanus@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes: 0e0ff63840 ("tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`")
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2024-04-10 23:19:00 +02:00
Linus Torvalds
4f0a8fe321 memblock tests: fix build errors
* add stubs to functions that calls to them were recently added to memblock
 but they were missing in tests
 * update gfp_types.h to include bits.h so that BIT() definitions won't
 depend on other includes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmYTgQ4QHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kas7CACU1P6gzbtsz9w1YcJtElowyvQ22dhVO918
 Iu9Y95QkFrJHnKVjrGiIlZVi3Mt0c3xvYm8D16ik3CRvBbLkvGdPGeo7kKeZrF8j
 RvicmSjJduEsiPceCy4+bcFjQx0P3JQ6jKtYwJj21kJrV2j1T3Qob1KVjEmJt+h0
 xcphdyXW1bjU+EgZyEka5ct0nYUDP+a9DwUETFxG+s/Cnp5SMprZHjsdXormHx4j
 8QyU+P916wWmzEtWlLY4yLhe6849Rx9Cse8E8ViWqwxfIZELJ97A5qvHwD9dHCOX
 8oZHzRgUslKoSJWutkjYHeOtB7kDhv1l+KZYOYbC4XE7WUooQs/O
 =CVgG
 -----END PGP SIGNATURE-----

Merge tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:
 "Fix build errors in memblock tests:

   - add stubs to functions that calls to them were recently added to
     memblock but they were missing in tests

   - update gfp_types.h to include bits.h so that BIT() definitions
     won't depend on other includes"

* tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: fix undefined reference to `BIT'
  memblock tests: fix undefined reference to `panic'
  memblock tests: fix undefined reference to `early_pfn_to_nid'
2024-04-08 10:11:37 -07:00
Andrii Nakryiko
343ca8131c selftests/bpf: add fp-leaking precise subprog result tests
Add selftests validating that BPF verifier handles precision marking
for SCALAR registers derived from r10 (fp) register correctly.

Given `r0 = (s8)r10;` syntax is not supported by older Clang compilers,
use the raw BPF instruction syntax to maximize compatibility.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240404214536.3551295-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-04 18:31:08 -07:00
Jakub Kicinski
cf1ca1f66d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/ip_gre.c
  17af420545 ("erspan: make sure erspan_base_hdr is present in skb->head")
  5832c4a77d ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/

Adjacent changes:

net/ipv6/ip6_fib.c
  d21d40605b ("ipv6: Fix infinite recursion in fib6_dump_done().")
  5fc68320c1 ("ipv6: remove RTNL protection from inet6_dump_fib()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04 18:01:07 -07:00
Anton Protopopov
f917170072 bpf: Pack struct bpf_fib_lookup
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit
59b418c706 ("bpf: Add a check for struct bpf_fib_lookup size") added
a static assertion to check this property so that future changes to the
structure will not accidentally break this assumption.

As it immediately turned out, on some 32-bit arm systems, when AEABI=n,
the total size of the structure was equal to 68, see [1]. This happened
because the bpf_fib_lookup structure contains a union of two 16-bit
fields:

    union {
            __u16 tot_len;
            __u16 mtu_result;
    };

which was supposed to compile to a 16-bit-aligned 16-bit field. On the
aforementioned setups it was instead both aligned and padded to 32-bits.

Declare this inner union as __attribute__((packed, aligned(2))) such
that it always is of size 2 and is aligned to 16 bits.

  [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: e1850ea9bd ("bpf: bpf_fib_lookup return MTU value as output when looked up")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
2024-04-04 15:52:32 -07:00
Wei Yang
e0f5a8e74b memblock tests: fix undefined reference to `panic'
commit e96c6b8f21 ("memblock: report failures when memblock_can_resize
is not set") introduced the usage of panic, which is not defined in
memblock test.

Let's define it directly in panic.h to fix it.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Song Shuai <songshuaishuai@tinylab.org>
CC: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240402132701.29744-3-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-04-04 11:08:33 +03:00
Wei Yang
7d8ed162e6 memblock tests: fix undefined reference to `early_pfn_to_nid'
commit 6a9531c3a8 ("memblock: fix crash when reserved memory is not
added to memory") introduce the usage of early_pfn_to_nid, which is not
defined in memblock tests.

The original definition of early_pfn_to_nid is defined in mm.h, so let
add this in the corresponding mm.h.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Yajun Deng <yajun.deng@linux.dev>
CC: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240402132701.29744-2-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-04-04 11:08:32 +03:00
Tushar Vyavahare
7effe3fdc0 tools: Add ethtool.h header to tooling infra
This commit duplicates the ethtool.h file from the include/uapi/linux
directory in the kernel source to the tools/include/uapi/linux directory.

This action ensures that the ethtool.h file used in the tools directory
is in sync with the kernel's version, maintaining consistency across the
codebase.

There are some checkpatch warnings in this file that could be cleaned up,
but I preferred to move it over as-is for now to avoid disrupting the code.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20240402114529.545475-2-tushar.vyavahare@intel.com
2024-04-03 16:00:14 +02:00
David Lechner
ca4ddc26f8 bpf: Fix typo in uapi doc comments
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in
the doc comments. This adds the missing "P".

Signed-off-by: David Lechner <dlechner@baylibre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
2024-04-02 16:04:34 +02:00
Alexander Lobakin
a37fbe666c bitmap: introduce generic optimized bitmap_size()
The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.

Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
divident and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is however, the compilers sometimes tend
to generate suboptimal code (GCC 13):

48 83 c0 3f          	add    $0x3f,%rax
48 c1 e8 06          	shr    $0x6,%rax
48 8d 14 c5 00 00 00 00	lea    0x0(,%rax,8),%rdx

%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:

8d 50 3f             	lea    0x3f(%rax),%edx
c1 ea 03             	shr    $0x3,%edx
81 e2 f8 ff ff 1f    	and    $0x1ffffff8,%edx

Now it shifts `nbits + 63` by 3 positions (IOW performs fast division
by 8) and then masks bits[2:0]. bloat-o-meter:

add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)

Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:

add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)

Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01 10:49:28 +01:00
Alexander Lobakin
10a04ff09b tools: move alignment-related macros to new <linux/align.h>
Currently, tools have *ALIGN*() macros scattered across the unrelated
headers, as there are only 3 of them and they were added separately
each time on an as-needed basis.
Anyway, let's make it more consistent with the kernel headers and allow
using those macros outside of the mentioned headers. Create
<linux/align.h> inside the tools/ folder and include it where needed.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01 10:49:28 +01:00
Alexander Lobakin
7d8296b250 bitops: make BYTES_TO_BITS() treewide-available
Avoid open-coding that simple expression each time by moving
BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export
it to the rest of the kernel.
Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always
equals to %BITS_PER_BYTE, regardless of the target architecture.
Do the same for the tools ecosystem as well (incl. its version of
bitops.h). The previous implementation had its implicit type of long,
while the new one is int, so adjust the format literal accordingly in
the perf code.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01 10:49:27 +01:00
Natanael Copa
62248b22d0 tools/resolve_btfids: fix build with musl libc
Include the header that defines u32.
This fixes build of 6.6.23 and 6.1.83 kernels for Alpine Linux, which
uses musl libc. I assume that GNU libc indirecly pulls in linux/types.h.

Fixes: 9707ac4fe2 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218647
Cc: stable@vger.kernel.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Tested-by: Greg Thelen <gthelen@google.com>
Link: https://lore.kernel.org/r/20240328110103.28734-1-ncopa@alpinelinux.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:58:29 -07:00
Anton Protopopov
5311591fbb bpf: Add support for passing mark with bpf_fib_lookup
Extend the bpf_fib_lookup() helper by making it to utilize mark if
the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the
four bytes of struct bpf_fib_lookup are used, shared with the
output-only smac/dmac fields.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:30:52 -07:00
Jakub Kicinski
5e47fbe5ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 17:25:57 -07:00
Jakub Kicinski
2a702c2e57 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZgHylwAKCRDbK58LschI
 gzmaAPwKhDFFSU/DU08k22muJxLIXVR7Xx04baJ9mPiFrqZyyAEA8RFNamC7wZIB
 AnfwwoDjfDTP60rlXFaEf8UT5PpA7Ao=
 =/KF6
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-03-25

We've added 38 non-merge commits during the last 13 day(s) which contain
a total of 50 files changed, 867 insertions(+), 274 deletions(-).

The main changes are:

1) Add the ability to specify and retrieve BPF cookie also for raw
   tracepoint programs in order to ease migration from classic to raw
   tracepoints, from Andrii Nakryiko.

2) Allow the use of bpf_get_{ns_,}current_pid_tgid() helper for all
   program types and add additional BPF selftests, from Yonghong Song.

3) Several improvements to bpftool and its build, for example, enabling
   libbpf logs when loading pid_iter in debug mode, from Quentin Monnet.

4) Check the return code of all BPF-related set_memory_*() functions during
   load and bail out in case they fail, from Christophe Leroy.

5) Avoid a goto in regs_refine_cond_op() such that the verifier can
   be better integrated into Agni tool which doesn't support backedges
   yet, from Harishankar Vishwanathan.

6) Add a small BPF trie perf improvement by always inlining
   longest_prefix_match, from Jesper Dangaard Brouer.

7) Small BPF selftest refactor in bpf_tcp_ca.c to utilize start_server()
   helper instead of open-coding it, from Geliang Tang.

8) Improve test_tc_tunnel.sh BPF selftest to prevent client connect
   before the server bind, from Alessandro Carminati.

9) Fix BPF selftest benchmark for older glibc and use syscall(SYS_gettid)
   instead of gettid(), from Alan Maguire.

10) Implement a backward-compatible method for struct_ops types with
    additional fields which are not present in older kernels,
    from Kui-Feng Lee.

11) Add a small helper to check if an instruction is addr_space_cast
    from as(0) to as(1) and utilize it in x86-64 JIT, from Puranjay Mohan.

12) Small cleanup to remove unnecessary error check in
    bpf_struct_ops_map_update_elem, from Martin KaFai Lau.

13) Improvements to libbpf fd validity checks for BPF map/programs,
    from Mykyta Yatsenko.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (38 commits)
  selftests/bpf: Fix flaky test btf_map_in_map/lookup_update
  bpf: implement insn_is_cast_user() helper for JITs
  bpf: Avoid get_kernel_nofault() to fetch kprobe entry IP
  selftests/bpf: Use start_server in bpf_tcp_ca
  bpf: Sync uapi bpf.h to tools directory
  libbpf: Add new sec_def "sk_skb/verdict"
  selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
  selftests/bpf: Use syscall(SYS_gettid) instead of gettid() wrapper in bench
  bpf-next: Avoid goto in regs_refine_cond_op()
  bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool
  selftests/bpf: scale benchmark counting by using per-CPU counters
  bpftool: Remove unnecessary source files from bootstrap version
  bpftool: Enable libbpf logs when loading pid_iter in debug mode
  selftests/bpf: add raw_tp/tp_btf BPF cookie subtests
  libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  bpf: pass whole link instead of prog when triggering raw tracepoint
  bpf: flatten bpf_probe_register call chain
  selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh
  selftests/bpf: Add a sk_msg prog bpf_get_ns_current_pid_tgid() test
  ...
====================

Link: https://lore.kernel.org/r/20240325233940.7154-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-27 07:52:34 -07:00
Yonghong Song
476a5e9291 bpf: Sync uapi bpf.h to tools directory
There is a difference between kernel uapi bpf.h and tools
uapi bpf.h. There is no functionality difference, but let
us sync properly to make it easy for later bpf.h update.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev
2024-03-25 16:36:56 +01:00
Linus Torvalds
1e3cd03c54 LoongArch changes for v6.9
1, Add objtool support for LoongArch;
 2, Add ORC stack unwinder support for LoongArch;
 3, Add kernel livepatching support for LoongArch;
 4, Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig;
 5, Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig;
 6, Some bug fixes and other small changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmX69KsWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImerjFD/9rAzm1+G4VxFvFzlOiJXEqquNJ
 +Vz2fAZLU3lJhBlx0uUKXFVijvLe8s/DnoLrM9e/M6gk4ivT9eszy3DnqT3NjGDX
 njYFkPUWZhZGACmbkoVk9St80R8sPIdZrwXtW3q7g3T0bC7LXUXrJw52Sh4gmbYx
 RqLsE6GoEWGY0zhhWqeeAM9LkKDuLxxyjH4fYE4g623EhQt7A7hP5okyaC+xHzp+
 qp/4dPFLu61LeqIfeBUKK7nQ6uzno3EWLiME2eHEHiuelYfzmh+BtNMcX9Ugb/En
 j0vLGNsoDGmEYw7xGa6OSRaCR/nCwVJz4SvuH33wbbbHhVAiUKUBVNFR3gmAtLlc
 BSa2dDZbKhHkiWSUCM9K2ihr7WiQNuraTK1kKHwBgfa+RbEVOTu1q8yokAB9XCaT
 T7lijJ8MKQmzHpMvgev7nN41baDB6V5bPIni0Ueh+NhQJKZ2/IxtYA3XzV5D0UgL
 TBovVgYB/VNThS9gzOrlenKuDX9hT+kCQgyudErXaoIo645P6dsPFowOZRQxCEIv
 WnLskZatLTCA8xWl1XyC1bqtGxhp34Gbhg0ZcvUqlNE20luaK/qi8wtW9Mv1Utp+
 aXFO3i7d93z99oAcUT0oc1N83T0x0M/p69Z+rL/2+L0sYQgBf1cwUEiDNRW4OCdI
 h15289rRTxjeL7NZPw==
 =lSkY
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Add objtool support for LoongArch

 - Add ORC stack unwinder support for LoongArch

 - Add kernel livepatching support for LoongArch

 - Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig

 - Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig

 - Some bug fixes and other small changes

* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch/crypto: Clean up useless assignment operations
  LoongArch: Define the __io_aw() hook as mmiowb()
  LoongArch: Remove superfluous flush_dcache_page() definition
  LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
  LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
  LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
  LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
  LoongArch: Add kernel livepatching support
  LoongArch: Add ORC stack unwinder support
  objtool: Check local label in read_unwind_hints()
  objtool: Check local label in add_dead_ends()
  objtool/LoongArch: Enable orc to be built
  objtool/x86: Separate arch-specific and generic parts
  objtool/LoongArch: Implement instruction decoder
  objtool/LoongArch: Enable objtool to be built
2024-03-22 10:22:45 -07:00
Jiri Olsa
af8d27bf15 selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
Some distros seem to enable the -fcf-protection=branch by default,
which breaks our setup on first instruction of uprobe trigger
functions and place there endbr64 instruction.

Marking them with nocf_check attribute to skip that.

Ignoring unknown attribute warning in gcc for bench objects, because
nocf_check can be used only when -fcf-protection=branch is enabled,
otherwise we get a warning and break compilation.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240322134936.1075395-1-jolsa@kernel.org
2024-03-22 10:01:17 -07:00
Arnaldo Carvalho de Melo
4962e19496 perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf
It is only used to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/vhost.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 20:44:35 -03:00
Arnaldo Carvalho de Melo
2316ef5891 perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument
It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.

Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.

  $ tools/perf/trace/beauty/clone.sh | head -5
  static const char *clone_flags[] = {
  	[ilog2(0x00000100) + 1] = "VM",
  	[ilog2(0x00000200) + 1] = "FS",
  	[ilog2(0x00000400) + 1] = "FILES",
  	[ilog2(0x00000800) + 1] = "SIGHAND",
  $

Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:29 -03:00
Arnaldo Carvalho de Melo
a672af9139 tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays most the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.

For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN
add them conditionally.

Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-6-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
8a1ad44135 tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays all the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.

Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
eb01fe7abb perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used
just for scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
7050e33e86 perf beauty: Move uapi/sound/asound.h copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
44512bd613 perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
ab3316119f perf beauty: Move uapi/linux/mount.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/mount.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo
faf7217a39 perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.

The only case where it was being used to build was in
tools/perf/trace/beauty/sync_file_range.c, because some older systems
doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the
system's linux/fs.h header instead, defining it if not available.

This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.

No other tools/ living code uses it, just <linux/fs.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:27 -03:00
Andrii Nakryiko
68ca5d4eeb bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to part w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-19 23:05:34 -07:00
Linus Torvalds
0ea680eda6 slab changes for 6.9
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmXwH0wACgkQu+CwddJF
 iJq3HAf6A/0m0pSr0QDcwjM8D7TVYQJ+Z/jPC6Mj+HfTcF8Otrgk8c0M6EsHGIGF
 GQNnYJRKmBla3mpVFvDtsVZuiakEtRLCpoP5n23s8p8gY9ibJcl6bpn9NaMVMKrq
 kBnhQ9VdLAgKVcTH8wz6jJqdWiZ7W4jGH5NWO+nr+r0H7vay7jfB0+tur1NO8J09
 HE5I76XE6ArRvaKYxvsZmOx1pihSmsJ7CerXN6Y8U5qcuxNXdUO/9rf+uv5llDIV
 gl54UAU79koZ9k88t5AiSKO2IZVhBgC/j66ds9MRRAFCf/ldxUtJIlsHTOnumfmy
 FApqwtR0MYNPeMPZpzogQbv58oOcNw==
 =XDxn
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Freelist loading optimization (Chengming Zhou)

   When the per-cpu slab is depleted and a new one loaded from the cpu
   partial list, optimize the loading to avoid an irq enable/disable
   cycle. This results in a 3.5% performance improvement on the "perf
   bench sched messaging" test.

 - Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)

   Due to two different main slab implementations we've had boot
   parameters prefixed either slab_ and slub_ with some later becoming
   an alias as both implementations gained the same functionality (i.e.
   slab_nomerge vs slub_nomerge). In order to eventually get rid of the
   implementation-specific names, the canonical and documented
   parameters are now all prefixed slab_ and the slub_ variants become
   deprecated but still working aliases.

 - SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka)

   The flags had hardcoded #define values which became tedious and
   error-prone when adding new ones. Assign the values via an enum that
   takes care of providing unique bit numbers. Also deprecate
   SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
   SLAB removal. Assign it an explicit zero value. The removals of the
   flag usage are handled independently in the respective subsystems,
   with a final removal of any leftover usage planned for the next
   release.

 - Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian)

   Includes removal of unused code or function parameters and a fix of a
   memleak.

* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: remove PARTIAL_NODE slab_state
  mm, slab: remove memcg_from_slab_obj()
  mm, slab: remove the corner case of inc_slabs_node()
  mm/slab: Fix a kmemleak in kmem_cache_destroy()
  mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
  mm, slab: use an enum to define SLAB_ cache creation flags
  mm, slab: deprecate SLAB_MEM_SPREAD flag
  mm, slab: fix the comment of cpu partial list
  mm, slab: remove unused object_size parameter in kmem_cache_flags()
  mm/slub: remove parameter 'flags' in create_kmalloc_caches()
  mm/slub: remove unused parameter in next_freelist_entry()
  mm/slub: remove full list manipulation for non-debug slab
  mm/slub: directly load freelist from cpu partial slab in the likely case
  mm/slub: make the description of slab_min_objects helpful in doc
  mm/slub: replace slub_$params with slab_$params in slub.rst
  mm/slub: unify all sl[au]b parameters with "slab_$param"
  Documentation: kernel-parameters: remove noaliencache
2024-03-12 20:14:54 -07:00
Jakub Kicinski
5f20e6ab1f for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v
 bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro
 CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D
 iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi
 zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte
 aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN
 a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj
 WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU
 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM
 dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo
 zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv
 yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE=
 =eFgA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2024-03-11

We've added 59 non-merge commits during the last 9 day(s) which contain
a total of 88 files changed, 4181 insertions(+), 590 deletions(-).

The main changes are:

1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
   VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena,
   from Alexei.

2) Introduce bpf_arena which is sparse shared memory region between bpf
   program and user space where structures inside the arena can have
   pointers to other areas of the arena, and pointers work seamlessly for
   both user-space programs and bpf programs, from Alexei and Andrii.

3) Introduce may_goto instruction that is a contract between the verifier
   and the program. The verifier allows the program to loop assuming it's
   behaving well, but reserves the right to terminate it, from Alexei.

4) Use IETF format for field definitions in the BPF standard
   document, from Dave.

5) Extend struct_ops libbpf APIs to allow specify version suffixes for
   stuct_ops map types, share the same BPF program between several map
   definitions, and other improvements, from Eduard.

6) Enable struct_ops support for more than one page in trampolines,
   from Kui-Feng.

7) Support kCFI + BPF on riscv64, from Puranjay.

8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay.

9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke.
====================

Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 18:06:04 -07:00
Alexei Starovoitov
667a86ad9b bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
317460317a bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff https://github.com/llvm/llvm-project/pull/84410 enables LLVM BPF
backend generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
2024-03-11 15:37:23 -07:00
Tiezhu Yang
b2d23158e6 objtool/LoongArch: Implement instruction decoder
Only copy the minimal definitions of instruction opcodes and formats
in inst.h from arch/loongarch to tools/arch/loongarch, and also copy
the definition of sign_extend64() to tools/include/linux/bitops.h to
decode the following kinds of instructions:

(1) stack pointer related instructions
addi.d, ld.d, st.d, ldptr.d and stptr.d

(2) branch and jump related instructions
beq, bne, blt, bge, bltu, bgeu, beqz, bnez, bceqz, bcnez, b, bl and jirl

(3) other instructions
break, nop and ertn

See more info about instructions in LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11 22:23:47 +08:00
Jakub Kicinski
92f8b1f5ca netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:26 -08:00
Jakub Kicinski
ab63a2387c netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:25 -08:00
Alexei Starovoitov
011832b97b bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C the later patch introduces 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have induction variable that verifiers sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier see 'i' as precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.

Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06 15:17:31 -08:00
Chengming Zhou
fae1b01293 slab: remove PARTIAL_NODE slab_state
The PARTIAL_NODE slab_state has gone with SLAB removed, so just
remove it.

Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-03-05 11:12:47 +01:00
Jakub Kicinski
4b2765ae41 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
 g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
 D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
 =mTAj
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-02-29

We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).

The main changes are:

1) Extend the BPF verifier to enable static subprog calls in spin lock
   critical sections, from Kumar Kartikeya Dwivedi.

2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
   in BPF global subprogs, from Andrii Nakryiko.

3) Larger batch of riscv BPF JIT improvements and enabling inlining
   of the bpf_kptr_xchg() for RV64, from Pu Lehui.

4) Allow skeleton users to change the values of the fields in struct_ops
   maps at runtime, from Kui-Feng Lee.

5) Extend the verifier's capabilities of tracking scalars when they
   are spilled to stack, especially when the spill or fill is narrowing,
   from Maxim Mikityanskiy & Eduard Zingerman.

6) Various BPF selftest improvements to fix errors under gcc BPF backend,
   from Jose E. Marchesi.

7) Avoid module loading failure when the module trying to register
   a struct_ops has its BTF section stripped, from Geliang Tang.

8) Annotate all kfuncs in .BTF_ids section which eventually allows
   for automatic kfunc prototype generation from bpftool, from Daniel Xu.

9) Several updates to the instruction-set.rst IETF standardization
   document, from Dave Thaler.

10) Shrink the size of struct bpf_map resp. bpf_array,
    from Alexei Starovoitov.

11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
    from Benjamin Tissoires.

12) Fix bpftool to be more portable to musl libc by using POSIX's
    basename(), from Arnaldo Carvalho de Melo.

13) Add libbpf support to gcc in CORE macro definitions,
    from Cupertino Miranda.

14) Remove a duplicate type check in perf_event_bpf_event,
    from Florian Lehner.

15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
    with notrace correctly, from Yonghong Song.

16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
    array to fix build warnings, from Kees Cook.

17) Fix resolve_btfids cross-compilation to non host-native endianness,
    from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  selftests/bpf: Test if shadow types work correctly.
  bpftool: Add an example for struct_ops map and shadow type.
  bpftool: Generated shadow variables for struct_ops maps.
  libbpf: Convert st_ops->data to shadow type.
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
  bpf, arm64: use bpf_prog_pack for memory management
  arm64: patching: implement text_poke API
  bpf, arm64: support exceptions
  arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
  bpf: add is_async_callback_calling_insn() helper
  bpf: introduce in_sleepable() helper
  bpf: allow more maps in sleepable bpf programs
  selftests/bpf: Test case for lacking CFI stub functions.
  bpf: Check cfi_stubs before registering a struct_ops type.
  bpf: Clarify batch lookup/lookup_and_delete semantics
  bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
  bpf, docs: Fix typos in instruction-set.rst
  selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
  bpf: Shrink size of struct bpf_map/bpf_array.
  ...
====================

Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-02 20:50:59 -08:00
Kees Cook
896880ff30 bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

  UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
  index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:

	struct egress_gw_policy_key {
	        struct bpf_lpm_trie_key lpm_key;
	        __u32 saddr;
	        __u32 daddr;
	};

While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:

        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29 22:52:43 +01:00
Martin Kelly
58fd62e0aa bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters,
in_batch and out_batch, to facilitate iterative
lookup/lookup_and_deletion operations for supported maps. Except NULL
for in_batch at the start of these two batch operations, both parameters
need to point to memory equal or larger than the respective map key
size, except for various hashmaps (hash, percpu_hash, lru_hash,
lru_percpu_hash) where the in_batch/out_batch memory size should be
at least 4 bytes.

Document these semantics to clarify the API.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-22 10:24:38 -08:00
Jakub Kicinski
73be9a3aab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/dev.c
  9f30831390 ("net: add rcu safety to rtnl_prop_list_size()")
  723de3ebef ("net: free altname using an RCU callback")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  25236c91b5 ("af_unix: Fix task hung while purging oob_skb in GC.")

drivers/net/ethernet/renesas/ravb_main.c
  ed4adc0720 ("net: ravb: Count packets instead of descriptors in GbEth RX path"
)
  c2da940857 ("ravb: Add Rx checksum offload support for GbEth")

net/mptcp/protocol.c
  bdd70eb689 ("mptcp: drop the push_pending field")
  28e5c13805 ("mptcp: annotate lockless accesses around read-mostly fields")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 16:20:04 -08:00
Linus Torvalds
4356e9f841 work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 15:57:48 -08:00
Jakub Kicinski
3be042cf46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/common.h
  38cc3c6dcc ("net: stmmac: protect updates of 64-bit statistics counters")
  fd5a6a7131 ("net: stmmac: est: Per Tx-queue error count for HLBF")
  c5c3e1bfc9 ("net: stmmac: Offload queueMaxSDU from tc-taprio")

drivers/net/wireless/microchip/wilc1000/netdev.c
  c901388028 ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000")
  328efda22a ("wifi: wilc1000: do not realloc workqueue everytime an interface is added")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  1279f9d9de ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08 15:30:33 -08:00
Viktor Malik
9707ac4fe2 tools/resolve_btfids: Refactor set sorting with types from btf_ids.h
Instead of using magic offsets to access BTF ID set data, leverage types
from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
layout of the data. Thanks to this change, set sorting should also
continue working if the layout changes.

This requires to sync the definition of 'struct btf_id_set8' from
include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
the rest of the file at the moment, b/c that would require to also sync
multiple dependent headers and we don't need any other defs from
btf_ids.h.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
2024-02-07 15:56:18 -08:00
Aahil Awatramani
240fd40552 bonding: Add independent control state machine
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.

Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.

Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.

Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.

Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 13:17:54 +01:00
Arnaldo Carvalho de Melo
fdd0ae72b3 perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
To pick up the changes in:

  1ab33c0314 ("asm-generic: make sparse happy with odd-sized put_unaligned_*()")

Addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/asm-generic/unaligned.h include/asm-generic/unaligned.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zbp9I7rmFj1Owhug@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-31 14:02:41 -03:00
Arnaldo Carvalho de Melo
1f8c43b09e tools include UAPI: Sync linux/mount.h copy with the kernel sources
To pick the changes from:

  35e27a5744 ("fs: keep struct mnt_id_req extensible")
  b4c2bea8ce ("add listmount(2) syscall")
  46eae99ef7 ("add statmount(2) syscall")

That doesn't change anything in tools this time as nothing that is
harvested by the beauty scripts got changed:

  $ ls -1 tools/perf/trace/beauty/*mount*sh
  tools/perf/trace/beauty/fsmount.sh
  tools/perf/trace/beauty/mount_flags.sh
  tools/perf/trace/beauty/move_mount_flags.sh
  $

This addresses this perf build warning.

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbkMiB7ZcOsLP2V5@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-30 11:49:54 -03:00
Arnaldo Carvalho de Melo
21fdd8dd37 tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
To pick the changes in these csets:

  d8b0f54650 ("wire up syscalls for statmount/listmount")
  5f42375904 ("LSM: wireup Linux Security Module syscalls")

Used in some architectures to create syscall tables.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h

Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbfMuAlUMRO9Hqa6@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-29 13:05:44 -03:00
Arnaldo Carvalho de Melo
e30dca91e5 tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  a5d3df8ae1 ("KVM: remove deprecated UAPIs")
  6d72283526 ("KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT")
  89ea60c2c7 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
  8dd2eee9d5 ("KVM: x86/mmu: Handle page fault for private memory")
  a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
  5a475554db ("KVM: Introduce per-page memory attributes")
  16f95f3b95 ("KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace")
  bb58b90b1a ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2")
  3f9cd0ca84 ("KVM: arm64: Allow userspace to get the writable masks for feature ID registers")

That automatically adds support for some new ioctls and remove a bunch
of deprecated ones.

This ends up making the new binary to forget about the deprecated one,
so when used in an older system it will not be able to resolve those
codes to strings.

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2024-01-27 14:48:16.523014020 -0300
  +++ after	2024-01-27 14:48:24.183932866 -0300
  @@ -14,6 +14,7 @@
   	[0x46] = "SET_USER_MEMORY_REGION",
   	[0x47] = "SET_TSS_ADDR",
   	[0x48] = "SET_IDENTITY_MAP_ADDR",
  +	[0x49] = "SET_USER_MEMORY_REGION2",
   	[0x60] = "CREATE_IRQCHIP",
   	[0x61] = "IRQ_LINE",
   	[0x62] = "GET_IRQCHIP",
  @@ -22,14 +23,8 @@
   	[0x65] = "GET_PIT",
   	[0x66] = "SET_PIT",
   	[0x67] = "IRQ_LINE_STATUS",
  -	[0x69] = "ASSIGN_PCI_DEVICE",
   	[0x6a] = "SET_GSI_ROUTING",
  -	[0x70] = "ASSIGN_DEV_IRQ",
   	[0x71] = "REINJECT_CONTROL",
  -	[0x72] = "DEASSIGN_PCI_DEVICE",
  -	[0x73] = "ASSIGN_SET_MSIX_NR",
  -	[0x74] = "ASSIGN_SET_MSIX_ENTRY",
  -	[0x75] = "DEASSIGN_DEV_IRQ",
   	[0x76] = "IRQFD",
   	[0x77] = "CREATE_PIT2",
   	[0x78] = "SET_BOOT_CPU_ID",
  @@ -66,7 +61,6 @@
   	[0x9f] = "GET_VCPU_EVENTS",
   	[0xa0] = "SET_VCPU_EVENTS",
   	[0xa3] = "ENABLE_CAP",
  -	[0xa4] = "ASSIGN_SET_INTX_MASK",
   	[0xa5] = "SIGNAL_MSI",
   	[0xa6] = "GET_XCRS",
   	[0xa7] = "SET_XCRS",
  @@ -97,6 +91,8 @@
   	[0xcd] = "SET_SREGS2",
   	[0xce] = "GET_STATS_FD",
   	[0xd0] = "XEN_HVM_EVTCHN_SEND",
  +	[0xd2] = "SET_MEMORY_ATTRIBUTES",
  +	[0xd4] = "CREATE_GUEST_MEMFD",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

This silences these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZbVLbkngp4oq13qN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-27 15:29:47 -03:00
Jakub Kicinski
92046e83c0 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZbQV+gAKCRDbK58LschI
 g2OeAP0VvhZS9SPiS+/AMAFuw2W1BkMrFNbfBTc3nzRnyJSmNAD+NG4CLLJvsKI9
 olu7VC20B8pLTGLUGIUSwqnjOC+Kkgc=
 =wVMl
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-26

We've added 107 non-merge commits during the last 4 day(s) which contain
a total of 101 files changed, 6009 insertions(+), 1260 deletions(-).

The main changes are:

1) Add BPF token support to delegate a subset of BPF subsystem
   functionality from privileged system-wide daemons such as systemd
   through special mount options for userns-bound BPF fs to a trusted
   & unprivileged application. With addressed changes from Christian
   and Linus' reviews, from Andrii Nakryiko.

2) Support registration of struct_ops types from modules which helps
   projects like fuse-bpf that seeks to implement a new struct_ops type,
   from Kui-Feng Lee.

3) Add support for retrieval of cookies for perf/kprobe multi links,
   from Jiri Olsa.

4) Bigger batch of prep-work for the BPF verifier to eventually support
   preserving boundaries and tracking scalars on narrowing fills,
   from Maxim Mikityanskiy.

5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help
   with the scenario of SYN floods, from Kuniyuki Iwashima.

6) Add code generation to inline the bpf_kptr_xchg() helper which
   improves performance when stashing/popping the allocated BPF objects,
   from Hou Tao.

7) Extend BPF verifier to track aligned ST stores as imprecise spilled
   registers, from Yonghong Song.

8) Several fixes to BPF selftests around inline asm constraints and
   unsupported VLA code generation, from Jose E. Marchesi.

9) Various updates to the BPF IETF instruction set draft document such
   as the introduction of conformance groups for instructions,
   from Dave Thaler.

10) Fix BPF verifier to make infinite loop detection in is_state_visited()
    exact to catch some too lax spill/fill corner cases,
    from Eduard Zingerman.

11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly
    instead of implicitly for various register types, from Hao Sun.

12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness
    in neighbor advertisement at setup time, from Martin KaFai Lau.

13) Change BPF selftests to skip callback tests for the case when the
    JIT is disabled, from Tiezhu Yang.

14) Add a small extension to libbpf which allows to auto create
    a map-in-map's inner map, from Andrey Grafin.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits)
  selftests/bpf: Add missing line break in test_verifier
  bpf, docs: Clarify definitions of various instructions
  bpf: Fix error checks against bpf_get_btf_vmlinux().
  bpf: One more maintainer for libbpf and BPF selftests
  selftests/bpf: Incorporate LSM policy to token-based tests
  selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
  selftests/bpf: Add tests for BPF object load with implicit token
  selftests/bpf: Add BPF object loading tests with explicit token passing
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Move feature detection code into its own file
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Split feature detectors definitions from cached results
  selftests/bpf: Utilize string values for delegate_xxx mount options
  bpf: Support symbolic BPF FS delegation mount options
  bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  bpf,selinux: Allocate bpf_security_struct per BPF token
  selftests/bpf: Add BPF token-enabled tests
  libbpf: Add BPF token support to bpf_prog_load() API
  ...
====================

Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26 21:08:22 -08:00
Arnaldo Carvalho de Melo
1233d1d54b tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
Picking the changes from:

  8570c27932 ("drm/syncobj: Add deadline support for syncobj waits")
  9724ed6c1b ("drm: Introduce DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT")
  e4d983acff ("drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP")
  d208d87566 ("drm: introduce CLOSEFB IOCTL")
  afa5cf3175 ("drm/i915/uapi: fix typos/spellos and punctuation")

Addressing these perf build warnings:

  Warning: Kernel ABI header differences:

Now 'perf trace' and other code that might use the
tools/perf/trace/beauty autogenerated tables will be able to translate
this new ioctl command into a string:

  $ tools/perf/trace/beauty/drm_ioctl.sh > before
  $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
  $ tools/perf/trace/beauty/drm_ioctl.sh > after
  $ diff -u before after
  --- before	2024-01-26 10:54:23.486381862 -0300
  +++ after	2024-01-26 10:54:35.767902442 -0300
  @@ -109,6 +109,7 @@
   	[0xCD] = "SYNCOBJ_TIMELINE_SIGNAL",
   	[0xCE] = "MODE_GETFB2",
   	[0xCF] = "SYNCOBJ_EVENTFD",
  +	[0xD0] = "MODE_CLOSEFB",
   	[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
   	[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
   	[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Simon Ser <contact@emersion.fr>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/lkml/ZbPIN9Dcc5AM0uxo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-26 11:57:34 -03:00
Arnaldo Carvalho de Melo
b0dc992155 tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:

  8a924db2d7 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")

That don't add anything that is handled by existing hard coded tables or
table generation scripts.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/lkml/ZbJv9fGF_k2xXEdr@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-26 10:51:48 -03:00
Arnaldo Carvalho de Melo
690811f012 tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_MNT_ID_UNIQUE
To pick the changes from:

  98d2b43081 ("add unique mount ID")

That add STATX_MNT_ID_UNIQUE that was manually added to
tools/perf/trace/beauty/statx.c, at some point this should move to the
shell based automated way.

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJq08s19890WDo-@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-26 10:51:48 -03:00
Andrii Nakryiko
caf8f28e03 bpf: Add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag
should be set in prog_flags field when providing prog_token_fd.

Wire through a set of allowed BPF program types and attach types,
derived from BPF FS at BPF token creation time. Then make sure we
perform bpf_token_capable() checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Andrii Nakryiko
9ea7c4bf17 bpf: Add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified
when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags
field before, we also add btf_flags field.

BTF loading is a pretty straightforward operation, so as long as BPF
token is created with allow_cmds granting BPF_BTF_LOAD command, kernel
proceeds to parsing BTF data and creating BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Andrii Nakryiko
a177fc2bf6 bpf: Add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.
New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD
for BPF_MAP_CREATE command.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Andrii Nakryiko
35f96de041 bpf: Introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token. Also creating BPF token in init user namespace is
currently not supported, given BPF token doesn't have any effect in init
user namespace anyways.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Kui-Feng Lee
fcc2c1fb06 bpf: pass attached BTF to the bpf_struct_ops subsystem
Pass the fd of a btf from the userspace to the bpf() syscall, and then
convert the fd into a btf. The btf is generated from the module that
defines the target BPF struct_ops type.

In order to inform the kernel about the module that defines the target
struct_ops type, the userspace program needs to provide a btf fd for the
respective module's btf. This btf contains essential information on the
types defined within the module, including the target struct_ops type.

A btf fd must be provided to the kernel for struct_ops maps and for the bpf
programs attached to those maps.

In the case of the bpf programs, the attach_btf_obj_fd parameter is passed
as part of the bpf_attr and is converted into a btf. This btf is then
stored in the prog->aux->attach_btf field. Here, it just let the verifier
access attach_btf directly.

In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd
of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a
btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added
for map_flags to indicate that the value of value_type_btf_obj_fd is set.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23 16:37:44 -08:00
Kui-Feng Lee
1338b93346 bpf: pass btf object id in bpf_map_info.
Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex:
bpftools struct_ops dump) know the correct btf from the kernel to look up
type information of struct_ops types.

Since struct_ops types can be defined and registered in a module. The
type information of a struct_ops type are defined in the btf of the
module defining it.  The userspace tools need to know which btf is for
the module defining a struct_ops type.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23 16:37:44 -08:00
Jiri Olsa
9fd112b1f8 bpf: Store cookies in kprobe_multi bpf_link_info data
Storing cookies in kprobe_multi bpf_link_info data. The cookies
field is optional and if provided it needs to be an array of
__u64 with kprobe_multi.count length.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 16:05:27 -08:00
Jiri Olsa
d5c16492c6 bpf: Add cookie to perf_event bpf_link_info records
At the moment we don't store cookie for perf_event probes,
while we do that for the rest of the probes.

Adding cookie fields to struct bpf_link_info perf event
probe records:

  perf_event.uprobe
  perf_event.kprobe
  perf_event.tracepoint
  perf_event.perf_event

And the code to store that in bpf_link_info struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 16:05:27 -08:00
Daniel Borkmann
091f2bf60d bpf: Sync uapi bpf.h header for the tooling infra
Both commit 91051f0039 ("tcp: Dump bound-only sockets in inet_diag.")
and commit 985b8ea9ec7e ("bpf, docs: Fix bpf_redirect_peer header doc")
missed the tooling header sync. Fix it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:43:12 -08:00
Linus Torvalds
9d64bf433c perf tools improvements and fixes for v6.8:
- Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns processing
   patches, switching roles from perf-tools to perf-tools-next at each Linux
   release.
 
 Data profiling:
 
 - Associate samples that identify loads and stores with data structures. This
   uses events available on Intel, AMD and others and DWARF info:
 
     # To get memory access samples in kernel for 1 second (on Intel)
     $ perf mem record -a -K --ldlat=4 -- sleep 1
 
     # Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
     $ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1
 
   Then, amongst several modes of post processing, one can do things like:
 
     $ perf report -s type,typeoff --hierarchy --group --stdio
     ...
     #
     # Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
     # Event count (approx.): 602758064
     #
     #                    Overhead  Data Type / Data Type Offset
     # ...........................  ............................
     #
         26.09%   3.28%   0.00%     long unsigned int
            26.09%   3.28%   0.00%     long unsigned int +0 (no field)
         18.48%   0.73%   0.00%     struct page
            10.83%   0.02%   0.00%     struct page +8 (lru.next)
             3.90%   0.28%   0.00%     struct page +0 (flags)
             3.45%   0.06%   0.00%     struct page +24 (mapping)
             0.25%   0.28%   0.00%     struct page +48 (_mapcount.counter)
             0.02%   0.06%   0.00%     struct page +32 (index)
             0.02%   0.00%   0.00%     struct page +52 (_refcount.counter)
             0.02%   0.01%   0.00%     struct page +56 (memcg_data)
             0.00%   0.01%   0.00%     struct page +16 (lru.prev)
         15.37%  17.54%   0.00%     (stack operation)
            15.37%  17.54%   0.00%     (stack operation) +0 (no field)
         11.71%  50.27%   0.00%     (unknown)
            11.71%  50.27%   0.00%     (unknown) +0 (no field)
 
     $ perf annotate --data-type
     ...
     Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
     ============================================================================
         samples     offset       size  field
              13          0        640  struct cfs_rq         {
               2          0         16      struct load_weight       load {
               2          0          8          unsigned long        weight;
               0          8          4          u32  inv_weight;
                                            };
               0         16          8      unsigned long    runnable_weight;
               0         24          4      unsigned int     nr_running;
               1         28          4      unsigned int     h_nr_running;
     ...
 
     $ perf annotate --data-type=page --group
     Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
      event[0] = cpu/mem-loads,ldlat=4/P
      event[1] = cpu/mem-stores/P
      event[2] = dummy:u
     ===================================================================================
              samples  offset  size  field
     447  33        0       0    64  struct page     {
     108   8        0       0     8	 long unsigned int  flags;
     319  13        0       8    40	 union       {
     319  13        0       8    40          struct          {
     236   2        0       8    16              union       {
     236   2        0       8    16                  struct list_head       lru {
     236   1        0       8     8                      struct list_head*  next;
       0   1        0      16     8                      struct list_head*  prev;
                                                     };
     236   2        0       8    16                  struct          {
     236   1        0       8     8                      void*      __filler;
       0   1        0      16     4                      unsigned int       mlock_count;
                                                     };
     236   2        0       8    16                  struct list_head       buddy_list {
     236   1        0       8     8                      struct list_head*  next;
       0   1        0      16     8                      struct list_head*  prev;
                                                     };
     236   2        0       8    16                  struct list_head       pcp_list {
     236   1        0       8     8                      struct list_head*  next;
       0   1        0      16     8                      struct list_head*  prev;
                                                     };
                                                 };
      82   4        0      24     8              struct address_space*      mapping;
       1   7        0      32     8              union       {
       1   7        0      32     8                  long unsigned int      index;
       1   7        0      32     8                  long unsigned int      share;
                                                 };
       0   0        0      40     8              long unsigned int  private;
                                                               };
 
   This uses the existing annotate code, calling objdump to do the disassembly,
   with improvements to avoid having this take too long, but longer term a
   switch to a disassembler library, possibly reusing code in the kernel will
   be pursued.
 
   This is the initial implementation, please use it and report impressions and
   bugs. Make sure the kernel-debuginfo packages match the running kernel. The
   'perf report' phase for non short perf.data files may take a while.
 
   There is a great article about it on LWN:
 
   https://lwn.net/Articles/955709/ - "Data-type profiling for perf"
 
   One last test I did while writing this text, on a AMD Ryzen 5950X, using a distro
   kernel, while doing a simple 'find /' on an otherwise idle system resulted in:
 
   # uname -r
   6.6.9-100.fc38.x86_64
   # perf -vv | grep BPF_
                    bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
          bpf_skeletons: [ on  ]  # HAVE_BPF_SKEL
   # rpm -qa | grep kernel-debuginfo
   kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
   kernel-debuginfo-6.6.9-100.fc38.x86_64
   #
   # perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
   ^C[ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
   #
   # ls -la perf.data
   -rw-------. 1 root root 2346486 Jan  9 18:36 perf.data
   # perf evlist
   ibs_op//
   dummy:u
   # perf evlist -v
   ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
   dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
   #
   # perf report -s type,typeoff --hierarchy --group --stdio
   # Total Lost Samples: 0
   #
   # Samples: 2K of events 'ibs_op//, dummy:u'
   # Event count (approx.): 1904553038
   #
   #            Overhead  Data Type / Data Type Offset
   # ...................  ............................
   #
       73.70%   0.00%     (unknown)
          73.70%   0.00%     (unknown) +0 (no field)
        3.01%   0.00%     long unsigned int
           3.00%   0.00%     long unsigned int +0 (no field)
           0.01%   0.00%     long unsigned int +2 (no field)
        2.73%   0.00%     struct task_struct
           1.71%   0.00%     struct task_struct +52 (on_cpu)
           0.38%   0.00%     struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
           0.23%   0.00%     struct task_struct +2100 (rcu_read_lock_nesting)
           0.14%   0.00%     struct task_struct +2384 ()
           0.06%   0.00%     struct task_struct +3096 (signal)
           0.05%   0.00%     struct task_struct +3616 (cgroups)
           0.05%   0.00%     struct task_struct +2344 (active_mm)
           0.02%   0.00%     struct task_struct +46 (flags)
           0.02%   0.00%     struct task_struct +2096 (migration_disabled)
           0.01%   0.00%     struct task_struct +24 (__state)
           0.01%   0.00%     struct task_struct +3956 (mm_cid_active)
           0.01%   0.00%     struct task_struct +1048 (cpus_ptr)
           0.01%   0.00%     struct task_struct +184 (se.group_node.next)
           0.01%   0.00%     struct task_struct +20 (thread_info.cpu)
           0.00%   0.00%     struct task_struct +104 (on_rq)
           0.00%   0.00%     struct task_struct +2456 (pid)
        1.36%   0.00%     struct module
           0.59%   0.00%     struct module +952 (kallsyms)
           0.42%   0.00%     struct module +0 (state)
           0.23%   0.00%     struct module +8 (list.next)
           0.12%   0.00%     struct module +216 (syms)
        0.95%   0.00%     struct inode
           0.41%   0.00%     struct inode +40 (i_sb)
           0.22%   0.00%     struct inode +0 (i_mode)
           0.06%   0.00%     struct inode +76 (i_rdev)
           0.06%   0.00%     struct inode +56 (i_security)
   <SNIP>
 
 perf top/report:
 
 - Don't ignore job control, allowing control+Z + bg to work.
 
 - Add s390 raw data interpretation for PAI (Processor Activity Instrumentation)
   counters.
 
 perf archive:
 
 - Add new option '--all' to pack perf.data with DSOs.
 
 - Add new option '--unpack' to expand tarballs.
 
 Initialization speedups:
 
 - Lazily initialize zstd streams to save memory when not using it.
 
 - Lazily allocate/size mmap event copy.
 
 - Lazy load kernel symbols in 'perf record'.
 
 - Be lazier in allocating lost samples buffer in 'perf record'.
 
 - Don't synthesize BPF events when disabled via the command line (perf record --no-bpf-event).
 
 Assorted improvements:
 
 - Show note on AMD systems that the :p, :pp, :ppp and :P are all the same, as
   IBS (Instruction Based Sampling) is used and it is inherentely precise, not
   having levels of precision like in Intel systems.
 
 - When 'cycles' isn't available, fall back to the "task-clock" event when not
   system wide, not to 'cpu-clock'.
 
 - Add --debug-file option to redirect debug output, e.g.:
 
     $ perf --debug-file /tmp/perf.log record -v true
 
 - Shrink 'struct map' to under one cacheline by avoiding function pointers for
   selecting if addresses are identity or DSO relative, and using just a byte for
   some boolean struct members.
 
 - Resolve the arch specific strerrno just once to use in perf_env__arch_strerrno().
 
 - Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.
 
 Assorted fixes:
 
 - Fix the default 'perf top' usage on Intel hybrid systems, now it starts with
   a browser showing the number of samples for Efficiency (cpu_atom/cycles/P) and
   Performance (cpu_core/cycles/P). This behaviour is similar on ARM64, with its
   respective set of big.LITTLE processors.
 
 - Fix segfault on build_mem_topology() error path.
 
 - Fix 'perf mem' error on hybrid related to availability of mem event in a PMU.
 
 - Fix missing reference count gets (map, maps) in the db-export code.
 
 - Avoid recursively taking env->bpf_progs.lock in the 'perf_env' code.
 
 - Use the newly introduced maps__for_each_map() to add missing locking around
   iteration of 'struct map' entries.
 
 - Parse NOTE segments until the build id is found, don't stop on the first one,
   ELF files may have several such NOTE segments.
 
 - Remove 'egrep' usage, its deprecated, use 'grep -E' instead.
 
 - Warn first about missing libelf, not libbpf, that depends on libelf.
 
 - Use alternative to 'find ... -printf' as this isn't supported in busybox.
 
 - Address python 3.6 DeprecationWarning for string scapes.
 
 - Fix memory leak in uniq() in libsubcmd.
 
 - Fix man page formatting for 'perf lock'
 
 - Fix some spelling mistakes.
 
 perf tests:
 
 - Fail shell tests that needs some symbol in perf itself if it is stripped.
   These tests check if a symbol is resolved, if some hot function is indeed
   detected by profiling, etc.
 
 - The 'perf test sigtrap' test is currently failing on PREEMPT_RT, skip it if
   sleeping spinlocks are detected (using BTF) and point to the mailing list
   discussion about it. This test is also being skipped on several architectures
   (powerpc, s390x, arm and aarch64) due to other pending issues with intruction
   breakpoints.
 
 - Adjust test case perf record offcpu profiling tests for s390.
 
 - Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest, addressing
   issues caused by the fallback from cycles to task-clock done in this release.
 
 - Fix mask for VG register in the user-regs test.
 
 - Use shellcheck on 'perf test' shell scripts automatically to make sure changes
   don't introduce things it flags as problematic.
 
 - Add option to change objdump binary and allow it to be set via 'perf config'.
 
 - Add basic 'perf script', 'perf list --json" and 'perf diff' tests.
 
 - Basic branch counter support.
 
 - Make DSO tests a suite rather than individual.
 
 - Remove atomics from test_loop to avoid test failures.
 
 - Fix call chain match on powerpc for the record+probe_libc_inet_pton test.
 
 - Improve Intel hybrid tests.
 
 Vendor event files (JSON):
 
 powerpc:
 
 - Update datasource event name to fix duplicate events on IBM's Power10.
 
 - Add PVN for HX-C2000 CPU with Power8 Architecture.
 
 Intel:
 
 - Alderlake/rocketlake metric fixes.
 
 - Update emeraldrapids events to v1.02.
 
 - Update icelakex events to v1.23.
 
 - Update sapphirerapids events to v1.17.
 
 - Add skx, clx, icx and spr upi bandwidth metric.
 
 AMD:
 
 - Add Zen 4 memory controller events.
 
 RISC-V:
 
 - Add StarFive Dubhe-80 and Dubhe-90 JSON files.
   https://www.starfivetech.com/en/site/cpu-u
 
 - Add T-HEAD C9xx JSON file.
   https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md
 
 ARM64:
 
 - Remove UTF-8 characters from cmn.json, that were causing build failure in some
   distros.
 
 - Add core PMU events and metrics for Ampere One X.
 
 - Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT
 
 libperf:
 
 - Rename several perf_cpu_map constructor names to clarify what they really do.
 
 - Ditto for some other methods, coping with some issues in their semantics,
   like perf_cpu_map__empty() -> perf_cpu_map__has_any_cpu_or_is_empty().
 
 - Document perf_cpu_map__nr()'s behavior
 
 perf stat:
 
 - Exit if parse groups fails.
 
 - Combine the -A/--no-aggr and --no-merge options.
 
 - Fix help message for --metric-no-threshold option.
 
 Hardware tracing:
 
 ARM64 CoreSight:
 
 - Bump minimum OpenCSD version to ensure a bugfix is present.
 
 - Add 'T' itrace option for timestamp trace
 
 - Set start vm addr of exectable file to 0 and don't ignore first sample on the
   arm-cs-trace-disasm.py 'perf script'.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZZ3FpgAKCRCyPKLppCJ+
 Jz21AQDB93J4X05bwHJlRloN3KuA3LuwzvAQkwFoJSfFFMDnzgEAgbAMF1sANirP
 5UcGxVgqoXWdrp9pkMcGlcFc7jsz5gA=
 =SM26
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns
  processing patches, switching roles from perf-tools to perf-tools-next
  at each Linux release.

  Data profiling:

   - Associate samples that identify loads and stores with data
     structures. This uses events available on Intel, AMD and others and
     DWARF info:

       # To get memory access samples in kernel for 1 second (on Intel)
       $ perf mem record -a -K --ldlat=4 -- sleep 1

       # Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
       $ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1

     Then, amongst several modes of post processing, one can do things like:

       $ perf report -s type,typeoff --hierarchy --group --stdio
       ...
       #
       # Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
       # Event count (approx.): 602758064
       #
       #                    Overhead  Data Type / Data Type Offset
       # ...........................  ............................
       #
           26.09%   3.28%   0.00%     long unsigned int
              26.09%   3.28%   0.00%     long unsigned int +0 (no field)
           18.48%   0.73%   0.00%     struct page
              10.83%   0.02%   0.00%     struct page +8 (lru.next)
               3.90%   0.28%   0.00%     struct page +0 (flags)
               3.45%   0.06%   0.00%     struct page +24 (mapping)
               0.25%   0.28%   0.00%     struct page +48 (_mapcount.counter)
               0.02%   0.06%   0.00%     struct page +32 (index)
               0.02%   0.00%   0.00%     struct page +52 (_refcount.counter)
               0.02%   0.01%   0.00%     struct page +56 (memcg_data)
               0.00%   0.01%   0.00%     struct page +16 (lru.prev)
           15.37%  17.54%   0.00%     (stack operation)
              15.37%  17.54%   0.00%     (stack operation) +0 (no field)
           11.71%  50.27%   0.00%     (unknown)
              11.71%  50.27%   0.00%     (unknown) +0 (no field)

       $ perf annotate --data-type
       ...
       Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
       ============================================================================
           samples     offset       size  field
                13          0        640  struct cfs_rq         {
                 2          0         16      struct load_weight       load {
                 2          0          8          unsigned long        weight;
                 0          8          4          u32  inv_weight;
                                              };
                 0         16          8      unsigned long    runnable_weight;
                 0         24          4      unsigned int     nr_running;
                 1         28          4      unsigned int     h_nr_running;
       ...

       $ perf annotate --data-type=page --group
       Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
        event[0] = cpu/mem-loads,ldlat=4/P
        event[1] = cpu/mem-stores/P
        event[2] = dummy:u
       ===================================================================================
                samples  offset  size  field
       447  33        0       0    64  struct page     {
       108   8        0       0     8	 long unsigned int  flags;
       319  13        0       8    40	 union       {
       319  13        0       8    40          struct          {
       236   2        0       8    16              union       {
       236   2        0       8    16                  struct list_head       lru {
       236   1        0       8     8                      struct list_head*  next;
         0   1        0      16     8                      struct list_head*  prev;
                                                       };
       236   2        0       8    16                  struct          {
       236   1        0       8     8                      void*      __filler;
         0   1        0      16     4                      unsigned int       mlock_count;
                                                       };
       236   2        0       8    16                  struct list_head       buddy_list {
       236   1        0       8     8                      struct list_head*  next;
         0   1        0      16     8                      struct list_head*  prev;
                                                       };
       236   2        0       8    16                  struct list_head       pcp_list {
       236   1        0       8     8                      struct list_head*  next;
         0   1        0      16     8                      struct list_head*  prev;
                                                       };
                                                   };
        82   4        0      24     8              struct address_space*      mapping;
         1   7        0      32     8              union       {
         1   7        0      32     8                  long unsigned int      index;
         1   7        0      32     8                  long unsigned int      share;
                                                   };
         0   0        0      40     8              long unsigned int  private;
                                                                 };

     This uses the existing annotate code, calling objdump to do the
     disassembly, with improvements to avoid having this take too long,
     but longer term a switch to a disassembler library, possibly
     reusing code in the kernel will be pursued.

     This is the initial implementation, please use it and report
     impressions and bugs. Make sure the kernel-debuginfo packages match
     the running kernel. The 'perf report' phase for non short perf.data
     files may take a while.

     There is a great article about it on LWN:

       https://lwn.net/Articles/955709/ - "Data-type profiling for perf"

     One last test I did while writing this text, on a AMD Ryzen 5950X,
     using a distro kernel, while doing a simple 'find /' on an
     otherwise idle system resulted in:

     # uname -r
     6.6.9-100.fc38.x86_64
     # perf -vv | grep BPF_
                      bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
            bpf_skeletons: [ on  ]  # HAVE_BPF_SKEL
     # rpm -qa | grep kernel-debuginfo
     kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
     kernel-debuginfo-6.6.9-100.fc38.x86_64
     #
     # perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
     ^C[ perf record: Woken up 1 times to write data ]
     [ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
     #
     # ls -la perf.data
     -rw-------. 1 root root 2346486 Jan  9 18:36 perf.data
     # perf evlist
     ibs_op//
     dummy:u
     # perf evlist -v
     ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
     dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
     #
     # perf report -s type,typeoff --hierarchy --group --stdio
     # Total Lost Samples: 0
     #
     # Samples: 2K of events 'ibs_op//, dummy:u'
     # Event count (approx.): 1904553038
     #
     #            Overhead  Data Type / Data Type Offset
     # ...................  ............................
     #
         73.70%   0.00%     (unknown)
            73.70%   0.00%     (unknown) +0 (no field)
          3.01%   0.00%     long unsigned int
             3.00%   0.00%     long unsigned int +0 (no field)
             0.01%   0.00%     long unsigned int +2 (no field)
          2.73%   0.00%     struct task_struct
             1.71%   0.00%     struct task_struct +52 (on_cpu)
             0.38%   0.00%     struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
             0.23%   0.00%     struct task_struct +2100 (rcu_read_lock_nesting)
             0.14%   0.00%     struct task_struct +2384 ()
             0.06%   0.00%     struct task_struct +3096 (signal)
             0.05%   0.00%     struct task_struct +3616 (cgroups)
             0.05%   0.00%     struct task_struct +2344 (active_mm)
             0.02%   0.00%     struct task_struct +46 (flags)
             0.02%   0.00%     struct task_struct +2096 (migration_disabled)
             0.01%   0.00%     struct task_struct +24 (__state)
             0.01%   0.00%     struct task_struct +3956 (mm_cid_active)
             0.01%   0.00%     struct task_struct +1048 (cpus_ptr)
             0.01%   0.00%     struct task_struct +184 (se.group_node.next)
             0.01%   0.00%     struct task_struct +20 (thread_info.cpu)
             0.00%   0.00%     struct task_struct +104 (on_rq)
             0.00%   0.00%     struct task_struct +2456 (pid)
          1.36%   0.00%     struct module
             0.59%   0.00%     struct module +952 (kallsyms)
             0.42%   0.00%     struct module +0 (state)
             0.23%   0.00%     struct module +8 (list.next)
             0.12%   0.00%     struct module +216 (syms)
          0.95%   0.00%     struct inode
             0.41%   0.00%     struct inode +40 (i_sb)
             0.22%   0.00%     struct inode +0 (i_mode)
             0.06%   0.00%     struct inode +76 (i_rdev)
             0.06%   0.00%     struct inode +56 (i_security)
     <SNIP>

  perf top/report:

   - Don't ignore job control, allowing control+Z + bg to work.

   - Add s390 raw data interpretation for PAI (Processor Activity
     Instrumentation) counters.

  perf archive:

   - Add new option '--all' to pack perf.data with DSOs.

   - Add new option '--unpack' to expand tarballs.

  Initialization speedups:

   - Lazily initialize zstd streams to save memory when not using it.

   - Lazily allocate/size mmap event copy.

   - Lazy load kernel symbols in 'perf record'.

   - Be lazier in allocating lost samples buffer in 'perf record'.

   - Don't synthesize BPF events when disabled via the command line
     (perf record --no-bpf-event).

  Assorted improvements:

   - Show note on AMD systems that the :p, :pp, :ppp and :P are all the
     same, as IBS (Instruction Based Sampling) is used and it is
     inherentely precise, not having levels of precision like in Intel
     systems.

   - When 'cycles' isn't available, fall back to the "task-clock" event
     when not system wide, not to 'cpu-clock'.

   - Add --debug-file option to redirect debug output, e.g.:

       $ perf --debug-file /tmp/perf.log record -v true

   - Shrink 'struct map' to under one cacheline by avoiding function
     pointers for selecting if addresses are identity or DSO relative,
     and using just a byte for some boolean struct members.

   - Resolve the arch specific strerrno just once to use in
     perf_env__arch_strerrno().

   - Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.

  Assorted fixes:

   - Fix the default 'perf top' usage on Intel hybrid systems, now it
     starts with a browser showing the number of samples for Efficiency
     (cpu_atom/cycles/P) and Performance (cpu_core/cycles/P). This
     behaviour is similar on ARM64, with its respective set of
     big.LITTLE processors.

   - Fix segfault on build_mem_topology() error path.

   - Fix 'perf mem' error on hybrid related to availability of mem event
     in a PMU.

   - Fix missing reference count gets (map, maps) in the db-export code.

   - Avoid recursively taking env->bpf_progs.lock in the 'perf_env'
     code.

   - Use the newly introduced maps__for_each_map() to add missing
     locking around iteration of 'struct map' entries.

   - Parse NOTE segments until the build id is found, don't stop on the
     first one, ELF files may have several such NOTE segments.

   - Remove 'egrep' usage, its deprecated, use 'grep -E' instead.

   - Warn first about missing libelf, not libbpf, that depends on
     libelf.

   - Use alternative to 'find ... -printf' as this isn't supported in
     busybox.

   - Address python 3.6 DeprecationWarning for string scapes.

   - Fix memory leak in uniq() in libsubcmd.

   - Fix man page formatting for 'perf lock'

   - Fix some spelling mistakes.

  perf tests:

   - Fail shell tests that needs some symbol in perf itself if it is
     stripped. These tests check if a symbol is resolved, if some hot
     function is indeed detected by profiling, etc.

   - The 'perf test sigtrap' test is currently failing on PREEMPT_RT,
     skip it if sleeping spinlocks are detected (using BTF) and point to
     the mailing list discussion about it. This test is also being
     skipped on several architectures (powerpc, s390x, arm and aarch64)
     due to other pending issues with intruction breakpoints.

   - Adjust test case perf record offcpu profiling tests for s390.

   - Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest,
     addressing issues caused by the fallback from cycles to task-clock
     done in this release.

   - Fix mask for VG register in the user-regs test.

   - Use shellcheck on 'perf test' shell scripts automatically to make
     sure changes don't introduce things it flags as problematic.

   - Add option to change objdump binary and allow it to be set via
     'perf config'.

   - Add basic 'perf script', 'perf list --json" and 'perf diff' tests.

   - Basic branch counter support.

   - Make DSO tests a suite rather than individual.

   - Remove atomics from test_loop to avoid test failures.

   - Fix call chain match on powerpc for the record+probe_libc_inet_pton
     test.

   - Improve Intel hybrid tests.

  Vendor event files (JSON):

  powerpc:

   - Update datasource event name to fix duplicate events on IBM's
     Power10.

   - Add PVN for HX-C2000 CPU with Power8 Architecture.

  Intel:

   - Alderlake/rocketlake metric fixes.

   - Update emeraldrapids events to v1.02.

   - Update icelakex events to v1.23.

   - Update sapphirerapids events to v1.17.

   - Add skx, clx, icx and spr upi bandwidth metric.

  AMD:

   - Add Zen 4 memory controller events.

  RISC-V:

   - Add StarFive Dubhe-80 and Dubhe-90 JSON files.
       https://www.starfivetech.com/en/site/cpu-u

   - Add T-HEAD C9xx JSON file.
       https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md

  ARM64:

   - Remove UTF-8 characters from cmn.json, that were causing build
     failure in some distros.

   - Add core PMU events and metrics for Ampere One X.

   - Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT

  libperf:

   - Rename several perf_cpu_map constructor names to clarify what they
     really do.

   - Ditto for some other methods, coping with some issues in their
     semantics, like perf_cpu_map__empty() ->
     perf_cpu_map__has_any_cpu_or_is_empty().

   - Document perf_cpu_map__nr()'s behavior

  perf stat:

   - Exit if parse groups fails.

   - Combine the -A/--no-aggr and --no-merge options.

   - Fix help message for --metric-no-threshold option.

  Hardware tracing:

  ARM64 CoreSight:

   - Bump minimum OpenCSD version to ensure a bugfix is present.

   - Add 'T' itrace option for timestamp trace

   - Set start vm addr of exectable file to 0 and don't ignore first
     sample on the arm-cs-trace-disasm.py 'perf script'"

* tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits)
  MAINTAINERS: Add Namhyung as tools/perf/ co-maintainer
  perf test: test case 'Setup struct perf_event_attr' fails on s390 on z/vm
  perf db-export: Fix missing reference count get in call_path_from_sample()
  perf tests: Add perf script test
  libsubcmd: Fix memory leak in uniq()
  perf TUI: Don't ignore job control
  perf vendor events intel: Update sapphirerapids events to v1.17
  perf vendor events intel: Update icelakex events to v1.23
  perf vendor events intel: Update emeraldrapids events to v1.02
  perf vendor events intel: Alderlake/rocketlake metric fixes
  perf x86 test: Add hybrid test for conflicting legacy/sysfs event
  perf x86 test: Update hybrid expectations
  perf vendor events amd: Add Zen 4 memory controller events
  perf stat: Fix hard coded LL miss units
  perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event
  perf env: Avoid recursively taking env->bpf_progs.lock
  perf annotate: Add --insn-stat option for debugging
  perf annotate: Add --type-stat option for debugging
  perf annotate: Support event group display
  perf annotate: Add --data-type option
  ...
2024-01-19 14:25:23 -08:00
Linus Torvalds
7912a6391f sound updates for 6.8-rc1
It was a clam development cycle.  There were an ALSA core extension
 for subformat PCM bits and a few ASoC core changes to support N:M
 mappings, while the most of remaining changes are driver-specific.
 
 Core:
 - API extensions for properly limiting PCM format bits via subformat
 - Enhanced support for N:M CPU:CODEC mappings in the core and in
   audio-graph-card2
 
 ASoC:
 - Lots of SOF updates: fallback support to older IPC versions,
   notification on control changes with IPC4.
   Also supports for ACPI parse for the ES83xx driver that reduces
   quirks.
 - Device tree support for describing parts of the card which can be
   active over suspend (for very low power playback or wake word use
   cases)
 - Support for more AMD and Intel systems, NXP i.MX8m MICFIL, Qualcomm
   SM8250, SM8550, SM8650 and X1E80100
 - Drop of Freescale MPC8610 code that is no longer supported
 
 HD-audio:
 - More CS35L41 codec extensions for Dell, HP and Lenovo models
 - TAS2781 codec extensions for Lenovo and co
 - New PCM subformat supports
 
 Others:
 - More enhancement for Scarlett2 USB mixer support
 - Various kselftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmWfzJgOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/NHg//VT0JRZDKqAZMUIaFZswUIKgoUy9fY9WaePO+
 db38uFzjuTC1ZN1PIDFnU0bQ2uL7IYF/xnceqitk+G76+Wt/pcDP/1hX+F9UhsVd
 wgRrihBeNRtSmPku8cwKu+duD/46439JyWo34PdHm8FxNoyilBx6fIApk6JqB57V
 HYU3ZFtQE84s8TmPHefasLeewY5thFtWYXLAaEZR5oohyXUTN0Np7h7vG8nchh1F
 zZOwoQ+nBo607PwUoXd0BFYkcuXwHlK4vLVmAM05KPaH1Q/kesEecYMhIVbnDHOj
 a4caMz+/tMbNbw3/GRsg8HgIeiyp1NoyC4LAqufa+Pj6BgPREVra5j8XoxAnCxXr
 8X8EDtBds6frMzqtQyNvTkyRCf3Iki8fhz60Re5nEkoXLcv34E3kleQDLG/FnHqC
 qeH0J3FEed84Gf6KrnpjkPHLFRx5ZKyahOHZ7Xc76fUYMCwvczkc5CKeG2EoivE4
 koEkhlQU1gnNyjNTTi4JchWis+EZG/oNA91eud1lMDm320lFJmxdZ5z31xZubVvs
 WTcMStgHCDPIKOeSBuwBCYFwugMtV/o/ejE567E4bxVC5ZA/zbxyvpxU9iDAjUNK
 T0JdPf/KKy1YJiNe9xuJn9/1ZpV6BXFCl7b7wILV+ZbGduOczoMCEH5T7dwAmZqq
 /lQtT/8=
 =yVxc
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "It was a calm development cycle. There were an ALSA core extension for
  subformat PCM bits and a few ASoC core changes to support N:M
  mappings, while the most of remaining changes are driver-specific.

  Core:
   - API extensions for properly limiting PCM format bits via subformat
   - Enhanced support for N:M CPU:CODEC mappings in the core and in
     audio-graph-card2

  ASoC:
   - Lots of SOF updates: fallback support to older IPC versions,
     notification on control changes with IPC4. Also supports for ACPI
     parse for the ES83xx driver that reduces quirks.
   - Device tree support for describing parts of the card which can be
     active over suspend (for very low power playback or wake word use
     cases)
   - Support for more AMD and Intel systems, NXP i.MX8m MICFIL, Qualcomm
     SM8250, SM8550, SM8650 and X1E80100
   - Drop of Freescale MPC8610 code that is no longer supported

  HD-audio:
   - More CS35L41 codec extensions for Dell, HP and Lenovo models
   - TAS2781 codec extensions for Lenovo and co
   - New PCM subformat supports

  Others:
   - More enhancement for Scarlett2 USB mixer support
   - Various kselftest fixes"

* tag 'sound-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (337 commits)
  kselftest/alsa - conf: Stringify the printed errno in sysfs_get()
  kselftest/alsa - mixer-test: Fix the print format specifier warning
  kselftest/alsa - mixer-test: Fix the print format specifier warning
  kselftest/alsa - mixer-test: fix the number of parameters to ksft_exit_fail_msg()
  ALSA: hda/tas2781: annotate calibration data endianness
  ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 13-ay0xxx
  ALSA: hda/conexant: Fix headset auto detect fail in cx8070 and SN6140
  ALSA: ac97: fix build regression
  ALSA: hda: cs35l41: Support more HP models without _DSD
  ALSA: hda/tas2781: add fixup for Lenovo 14ARB7
  ALSA: hda/tas2781: add TAS2563 support for 14ARB7
  ALSA: hda/tas2781: add configurable global i2c address
  ALSA: hda/tas2781: add ptrs to calibration functions
  ALSA: hda: Add driver properties for cs35l41 for Lenovo Legion Slim 7 Gen 8 serie
  ALSA: hda/realtek: enable SND_PCI_QUIRK for Lenovo Legion Slim 7 Gen 8 (2023) serie
  ALSA: hda/tas2781: configure the amp after firmware load
  ALSA: mark all struct bus_type as const
  ASoC: pxa: sspa: Don't select SND_ARM
  ASoC: rt5663: cancel the work when system suspends
  ALSA: scarlett2: Add PCM Input Switch for Solo Gen 4
  ...
2024-01-12 11:55:28 -08:00
Linus Torvalds
3e7aeb78ab Networking changes for 6.8.
Core & protocols
 ----------------
 
  - Analyze and reorganize core networking structs (socks, netdev,
    netns, mibs) to optimize cacheline consumption and set up
    build time warnings to safeguard against future header changes.
    This improves TCP performances with many concurrent connections
    up to 40%.
 
  - Add page-pool netlink-based introspection, exposing the
    memory usage and recycling stats. This helps indentify
    bad PP users and possible leaks.
 
  - Refine TCP/DCCP source port selection to no longer favor even
    source port at connect() time when IP_LOCAL_PORT_RANGE is set.
    This lowers the time taken by connect() for hosts having
    many active connections to the same destination.
 
  - Refactor the TCP bind conflict code, shrinking related socket
    structs.
 
  - Refactor TCP SYN-Cookie handling, as a preparation step to
    allow arbitrary SYN-Cookie processing via eBPF.
 
  - Tune optmem_max for 0-copy usage, increasing the default value
    to 128KB and namespecifying it.
 
  - Allow coalescing for cloned skbs coming from page pools, improving
    RX performances with some common configurations.
 
  - Reduce extension header parsing overhead at GRO time.
 
  - Add bridge MDB bulk deletion support, allowing user-space to
    request the deletion of matching entries.
 
  - Reorder nftables struct members, to keep data accessed by the
    datapath first.
 
  - Introduce TC block ports tracking and use. This allows supporting
    multicast-like behavior at the TC layer.
 
  - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
    classifiers (RSVP and tcindex).
 
  - More data-race annotations.
 
  - Extend the diag interface to dump TCP bound-only sockets.
 
  - Conditional notification of events for TC qdisc class and actions.
 
  - Support for WPAN dynamic associations with nearby devices, to form
    a sub-network using a specific PAN ID.
 
  - Implement SMCv2.1 virtual ISM device support.
 
  - Add support for Batman-avd mulicast packet type.
 
 BPF
 ---
 
  - Tons of verifier improvements:
    - BPF register bounds logic and range support along with a large
      test suite
    - log improvements
    - complete precision tracking support for register spills
    - track aligned STACK_ZERO cases as imprecise spilled registers. It
      improves the verifier "instructions processed" metric from single
      digit to 50-60% for some programs
    - support for user's global BPF subprogram arguments with few
      commonly requested annotations for a better developer experience
    - support tracking of BPF_JNE which helps cases when the compiler
      transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
      like
    - several fixes
 
  - Add initial TX metadata implementation for AF_XDP with support in
    mlx5 and stmmac drivers. Two types of offloads are supported right
    now, that is, TX timestamp and TX checksum offload.
 
  - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
    kernel and from kernel into BPF work with CFI enabled. This allows
    BPF to work with CONFIG_FINEIBT=y.
 
  - Change BPF verifier logic to validate global subprograms lazily
    instead of unconditionally before the main program, so they can be
    guarded using BPF CO-RE techniques.
 
  - Support uid/gid options when mounting bpffs.
 
  - Add a new kfunc which acquires the associated cgroup of a task
    within a specific cgroup v1 hierarchy where the latter is identified
    by its id.
 
  - Extend verifier to allow bpf_refcount_acquire() of a map value field
    obtained via direct load which is a use-case needed in sched_ext.
 
  - Add BPF link_info support for uprobe multi link along with bpftool
    integration for the latter.
 
  - Support for VLAN tag in XDP hints.
 
  - Remove deprecated bpfilter kernel leftovers given the project
    is developed in user-space (https://github.com/facebook/bpfilter).
 
 Misc
 ----
 
  - Support for parellel TC self-tests execution.
 
  - Increase MPTCP self-tests coverage.
 
  - Updated the bridge documentation, including several so-far
    undocumented features.
 
  - Convert all the net self-tests to run in unique netns, to
    avoid random failures due to conflict and allow concurrent
    runs.
 
  - Add TCP-AO self-tests.
 
  - Add kunit tests for both cfg80211 and mac80211.
 
  - Autogenerate Netlink families documentation from YAML spec.
 
  - Add yml-gen support for fixed headers and recursive nests, the
    tool can now generate user-space code for all genetlink families
    for which we have specs.
 
  - A bunch of additional module descriptions fixes.
 
  - Catch incorrect freeing of pages belonging to a page pool.
 
 Driver API
 ----------
 
  - Rust abstractions for network PHY drivers; do not cover yet the
    full C API, but already allow implementing functional PHY drivers
    in rust.
 
  - Introduce queue and NAPI support in the netdev Netlink interface,
    allowing complete access to the device <> NAPIs <> queues
    relationship.
 
  - Introduce notifications filtering for devlink to allow control
    application scale to thousands of instances.
 
  - Improve PHY validation, requesting rate matching information for
    each ethtool link mode supported by both the PHY and host.
 
  - Add support for ethtool symmetric-xor RSS hash.
 
  - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
    platform.
 
  - Expose pin fractional frequency offset value over new DPLL generic
    netlink attribute.
 
  - Convert older drivers to platform remove callback returning void.
 
  - Add support for PHY package MMD read/write.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Octeon CN10K devices
    - Broadcom 5760X P7
    - Qualcomm SM8550 SoC
    - Texas Instrument DP83TG720S PHY
 
  - Bluetooth:
    - IMC Networks Bluetooth radio
 
 Removed
 -------
 
  - WiFi:
    - libertas 16-bit PCMCIA support
    - Atmel at76c50x drivers
    - HostAP ISA/PCMCIA style 802.11b driver
    - zd1201 802.11b USB dongles
    - Orinoco ISA/PCMCIA 802.11b driver
    - Aviator/Raytheon driver
    - Planet WL3501 driver
    - RNDIS USB 802.11b driver
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Intel (100G, ice, idpf):
      - allow one by one port representors creation and removal
      - add temperature and clock information reporting
      - add get/set for ethtool's header split ringparam
      - add again FW logging
      - adds support switchdev hardware packet mirroring
      - iavf: implement symmetric-xor RSS hash
      - igc: add support for concurrent physical and free-running timers
      - i40e: increase the allowable descriptors
    - nVidia/Mellanox:
      - Preparation for Socket-Direct multi-dev netdev. That will allow
        in future releases combining multiple PFs devices attached to
        different NUMA nodes under the same netdev
    - Broadcom (bnxt):
      - TX completion handling improvements
      - add basic ntuple filter support
      - reduce MSIX vectors usage for MQPRIO offload
      - add VXLAN support, USO offload and TX coalesce completion for P7
    - Marvell Octeon EP:
      - xmit-more support
      - add PF-VF mailbox support and use it for FW notifications for VFs
    - Wangxun (ngbe/txgbe):
      - implement ethtool functions to operate pause param, ring param,
        coalesce channel number and msglevel
    - Netronome/Corigine (nfp):
      - add flow-steering support
      - support UDP segmentation offload
 
  - Ethernet NICs embedded, slower, virtual:
    - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver
    - stmmac: add support for HW-accelerated VLAN stripping
    - TI AM654x sw: add mqprio, frame preemption & coalescing
    - gve: add support for non-4k page sizes.
    - virtio-net: support dynamic coalescing moderation
 
  - nVidia/Mellanox Ethernet datacenter switches:
    - allow firmware upgrade without a reboot
    - more flexible support for bridge flooding via the compressed
      FID flooding mode
 
  - Ethernet embedded switches:
    - Microchip:
      - fine-tune flow control and speed configurations in KSZ8xxx
      - KSZ88X3: enable setting rmii reference
    - Renesas:
      - add jumbo frames support
    - Marvell:
      - 88E6xxx: add "eth-mac" and "rmon" stats support
 
  - Ethernet PHYs:
    - aquantia: add firmware load support
    - at803x: refactor the driver to simplify adding support for more
      chip variants
    - NXP C45 TJA11xx: Add MACsec offload support
 
  - Wifi:
    - MediaTek (mt76):
      - NVMEM EEPROM improvements
      - mt7996 Extremely High Throughput (EHT) improvements
      - mt7996 Wireless Ethernet Dispatcher (WED) support
      - mt7996 36-bit DMA support
    - Qualcomm (ath12k):
      - support for a single MSI vector
      - WCN7850: support AP mode
    - Intel (iwlwifi):
      - new debugfs file fw_dbg_clear
      - allow concurrent P2P operation on DFS channels
 
  - Bluetooth:
    - QCA2066: support HFP offload
    - ISO: more broadcast-related improvements
    - NXP: better recovery in case receiver/transmitter get out of sync
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk
 PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35
 98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ
 HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU
 Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf
 k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK
 yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy
 C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw
 q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo
 PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX
 pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF
 ucqPEcZB5cRE
 =1bW6
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most interesting thing is probably the networking structs
  reorganization and a significant amount of changes is around
  self-tests.

  Core & protocols:

   - Analyze and reorganize core networking structs (socks, netdev,
     netns, mibs) to optimize cacheline consumption and set up build
     time warnings to safeguard against future header changes

     This improves TCP performances with many concurrent connections up
     to 40%

   - Add page-pool netlink-based introspection, exposing the memory
     usage and recycling stats. This helps indentify bad PP users and
     possible leaks

   - Refine TCP/DCCP source port selection to no longer favor even
     source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
     lowers the time taken by connect() for hosts having many active
     connections to the same destination

   - Refactor the TCP bind conflict code, shrinking related socket
     structs

   - Refactor TCP SYN-Cookie handling, as a preparation step to allow
     arbitrary SYN-Cookie processing via eBPF

   - Tune optmem_max for 0-copy usage, increasing the default value to
     128KB and namespecifying it

   - Allow coalescing for cloned skbs coming from page pools, improving
     RX performances with some common configurations

   - Reduce extension header parsing overhead at GRO time

   - Add bridge MDB bulk deletion support, allowing user-space to
     request the deletion of matching entries

   - Reorder nftables struct members, to keep data accessed by the
     datapath first

   - Introduce TC block ports tracking and use. This allows supporting
     multicast-like behavior at the TC layer

   - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
     classifiers (RSVP and tcindex)

   - More data-race annotations

   - Extend the diag interface to dump TCP bound-only sockets

   - Conditional notification of events for TC qdisc class and actions

   - Support for WPAN dynamic associations with nearby devices, to form
     a sub-network using a specific PAN ID

   - Implement SMCv2.1 virtual ISM device support

   - Add support for Batman-avd mulicast packet type

  BPF:

   - Tons of verifier improvements:
       - BPF register bounds logic and range support along with a large
         test suite
       - log improvements
       - complete precision tracking support for register spills
       - track aligned STACK_ZERO cases as imprecise spilled registers.
         This improves the verifier "instructions processed" metric from
         single digit to 50-60% for some programs
       - support for user's global BPF subprogram arguments with few
         commonly requested annotations for a better developer
         experience
       - support tracking of BPF_JNE which helps cases when the compiler
         transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
         like
       - several fixes

   - Add initial TX metadata implementation for AF_XDP with support in
     mlx5 and stmmac drivers. Two types of offloads are supported right
     now, that is, TX timestamp and TX checksum offload

   - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
     kernel and from kernel into BPF work with CFI enabled. This allows
     BPF to work with CONFIG_FINEIBT=y

   - Change BPF verifier logic to validate global subprograms lazily
     instead of unconditionally before the main program, so they can be
     guarded using BPF CO-RE techniques

   - Support uid/gid options when mounting bpffs

   - Add a new kfunc which acquires the associated cgroup of a task
     within a specific cgroup v1 hierarchy where the latter is
     identified by its id

   - Extend verifier to allow bpf_refcount_acquire() of a map value
     field obtained via direct load which is a use-case needed in
     sched_ext

   - Add BPF link_info support for uprobe multi link along with bpftool
     integration for the latter

   - Support for VLAN tag in XDP hints

   - Remove deprecated bpfilter kernel leftovers given the project is
     developed in user-space (https://github.com/facebook/bpfilter)

  Misc:

   - Support for parellel TC self-tests execution

   - Increase MPTCP self-tests coverage

   - Updated the bridge documentation, including several so-far
     undocumented features

   - Convert all the net self-tests to run in unique netns, to avoid
     random failures due to conflict and allow concurrent runs

   - Add TCP-AO self-tests

   - Add kunit tests for both cfg80211 and mac80211

   - Autogenerate Netlink families documentation from YAML spec

   - Add yml-gen support for fixed headers and recursive nests, the tool
     can now generate user-space code for all genetlink families for
     which we have specs

   - A bunch of additional module descriptions fixes

   - Catch incorrect freeing of pages belonging to a page pool

  Driver API:

   - Rust abstractions for network PHY drivers; do not cover yet the
     full C API, but already allow implementing functional PHY drivers
     in rust

   - Introduce queue and NAPI support in the netdev Netlink interface,
     allowing complete access to the device <> NAPIs <> queues
     relationship

   - Introduce notifications filtering for devlink to allow control
     application scale to thousands of instances

   - Improve PHY validation, requesting rate matching information for
     each ethtool link mode supported by both the PHY and host

   - Add support for ethtool symmetric-xor RSS hash

   - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
     platform

   - Expose pin fractional frequency offset value over new DPLL generic
     netlink attribute

   - Convert older drivers to platform remove callback returning void

   - Add support for PHY package MMD read/write

  New hardware / drivers:

   - Ethernet:
       - Octeon CN10K devices
       - Broadcom 5760X P7
       - Qualcomm SM8550 SoC
       - Texas Instrument DP83TG720S PHY

   - Bluetooth:
       - IMC Networks Bluetooth radio

  Removed:

   - WiFi:
       - libertas 16-bit PCMCIA support
       - Atmel at76c50x drivers
       - HostAP ISA/PCMCIA style 802.11b driver
       - zd1201 802.11b USB dongles
       - Orinoco ISA/PCMCIA 802.11b driver
       - Aviator/Raytheon driver
       - Planet WL3501 driver
       - RNDIS USB 802.11b driver

  Driver updates:

   - Ethernet high-speed NICs:
       - Intel (100G, ice, idpf):
          - allow one by one port representors creation and removal
          - add temperature and clock information reporting
          - add get/set for ethtool's header split ringparam
          - add again FW logging
          - adds support switchdev hardware packet mirroring
          - iavf: implement symmetric-xor RSS hash
          - igc: add support for concurrent physical and free-running
            timers
          - i40e: increase the allowable descriptors
       - nVidia/Mellanox:
          - Preparation for Socket-Direct multi-dev netdev. That will
            allow in future releases combining multiple PFs devices
            attached to different NUMA nodes under the same netdev
       - Broadcom (bnxt):
          - TX completion handling improvements
          - add basic ntuple filter support
          - reduce MSIX vectors usage for MQPRIO offload
          - add VXLAN support, USO offload and TX coalesce completion
            for P7
       - Marvell Octeon EP:
          - xmit-more support
          - add PF-VF mailbox support and use it for FW notifications
            for VFs
       - Wangxun (ngbe/txgbe):
          - implement ethtool functions to operate pause param, ring
            param, coalesce channel number and msglevel
       - Netronome/Corigine (nfp):
          - add flow-steering support
          - support UDP segmentation offload

   - Ethernet NICs embedded, slower, virtual:
       - Xilinx AXI: remove duplicate DMA code adopting the dma engine
         driver
       - stmmac: add support for HW-accelerated VLAN stripping
       - TI AM654x sw: add mqprio, frame preemption & coalescing
       - gve: add support for non-4k page sizes.
       - virtio-net: support dynamic coalescing moderation

   - nVidia/Mellanox Ethernet datacenter switches:
       - allow firmware upgrade without a reboot
       - more flexible support for bridge flooding via the compressed
         FID flooding mode

   - Ethernet embedded switches:
       - Microchip:
          - fine-tune flow control and speed configurations in KSZ8xxx
          - KSZ88X3: enable setting rmii reference
       - Renesas:
          - add jumbo frames support
       - Marvell:
          - 88E6xxx: add "eth-mac" and "rmon" stats support

   - Ethernet PHYs:
       - aquantia: add firmware load support
       - at803x: refactor the driver to simplify adding support for more
         chip variants
       - NXP C45 TJA11xx: Add MACsec offload support

   - Wifi:
       - MediaTek (mt76):
          - NVMEM EEPROM improvements
          - mt7996 Extremely High Throughput (EHT) improvements
          - mt7996 Wireless Ethernet Dispatcher (WED) support
          - mt7996 36-bit DMA support
       - Qualcomm (ath12k):
          - support for a single MSI vector
          - WCN7850: support AP mode
       - Intel (iwlwifi):
          - new debugfs file fw_dbg_clear
          - allow concurrent P2P operation on DFS channels

   - Bluetooth:
       - QCA2066: support HFP offload
       - ISO: more broadcast-related improvements
       - NXP: better recovery in case receiver/transmitter get out of sync"

* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
  lan78xx: remove redundant statement in lan78xx_get_eee
  lan743x: remove redundant statement in lan743x_ethtool_get_eee
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
  bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
  tcp: Revert no longer abort SYN_SENT when receiving some ICMP
  Revert "mlx5 updates 2023-12-20"
  Revert "net: stmmac: Enable Per DMA Channel interrupt"
  ipvlan: Remove usage of the deprecated ida_simple_xx() API
  ipvlan: Fix a typo in a comment
  net/sched: Remove ipt action tests
  net: stmmac: Use interrupt mode INTM=1 for per channel irq
  net: stmmac: Add support for TX/RX channel interrupt
  net: stmmac: Make MSI interrupt routine generic
  dt-bindings: net: snps,dwmac: per channel irq
  net: phy: at803x: make read_status more generic
  net: phy: at803x: add support for cdt cross short test for qca808x
  net: phy: at803x: refactor qca808x cable test get status function
  net: phy: at803x: generalize cdt fault length function
  net: ethernet: cortina: Drop TSO support
  ...
2024-01-11 10:07:29 -08:00
Linus Torvalds
5d09f61e50 linux_kselftest-nolibc-6.8-rc1
This nolibc update for Linux 6.8-rc1 consists of:
 
 * Support for PIC mode on MIPS.
 * Support for getrlimit()/setrlimit().
 * Replace some custom declarations with UAPI includes.
 * A new script "run-tests.sh" to run the testsuite over different architectures
   and configurations.
 * A few non-functional code cleanups.
 * Minor improvements to nolibc-test, primarily to support the test script.
 
 There are no urgent fixes available at this time.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmWckk8ACgkQCwJExA0N
 QxyHRQ//eXSdYmn3VkNepc3iFn75ntzH8KAFN29ZKCtuTu7+kVjSx+swpRjY4NYE
 jE3n9V8YXw7+R4VNj/AmJlSsnZXsx/PrRa9DNtjHeAza7jFYhWowM9LSSJUUyl78
 bOh6EvdRhoKuz3zz9A68OTDYSUwA3LaZ0vin8f+WtLH05NfSdafmX1pHLRB9LHzj
 J235WktJHoSXOwSAkPZ6NHdtkyeqxy7QomHkuxmmxeVxHnI5SIEDexfa+1FNffGa
 9n5TXGZtcgKPE/m1EqBvW02GbIpflpu6H2fAzssaDb9QxhOXEw2wySn06i5q3hGD
 6gwTsNqBUfPxZCj2tF6FH/7TxxPmNqLqrJVag/e4pO1rDZzrcTL+Dd6HP5TagJtV
 O6/L6UJvqzogIjZD9lk/rWyKfXW0TKk5zBGczduZj/W/McjQ9BDfjR9EjRD/F57Y
 fTB3kHd4TFL4DJyN+AHEdzpm1gwc+0NeGE9CJcrMkKzvqjafo2MNMrlYD9GSxKLy
 aPlWExE7KIBbLIyrwDNxQbt42RYVfkNFGNVX274TighQ9nGBRjXybflPioTUpMw1
 Qyi7qrIDA7QGIrsEgCr6pPeA+LVkWoSyvpXGi/RULQUeg94V9TPXpO4jmf4VNPDQ
 NtYwI/D3UcroYDYY2K2M7KUpv0J4vM9kgQ1wqQ4n/6aKth6kek4=
 =gQNC
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull nolibc updates from Shuah Khan:

 - Support for PIC mode on MIPS

 - Support for getrlimit()/setrlimit()

 - Replace some custom declarations with UAPI includes

 - A new script "run-tests.sh" to run the testsuite over different
   architectures and configurations

 - A few non-functional code cleanups

 - Minor improvements to nolibc-test, primarily to support the test
   script

* tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (22 commits)
  selftests/nolibc: disable coredump via setrlimit
  tools/nolibc: add support for getrlimit/setrlimit
  tools/nolibc: drop custom definition of struct rusage
  tools/nolibc: drop duplicated testcase ioctl_tiocinq
  tools/nolibc: annotate va_list printf formats
  selftests/nolibc: make result alignment more robust
  tools/nolibc: mips: add support for PIC
  selftests/nolibc: run-tests.sh: enable testing via qemu-user
  selftests/nolibc: introduce QEMU_ARCH_USER
  selftests/nolibc: fix testcase status alignment
  selftests/nolibc: add configuration for mipso32be
  selftests/nolibc: extraconfig support
  selftests/nolibc: explicitly specify ABI for MIPS
  selftests/nolibc: use XARCH for MIPS
  tools/nolibc: move MIPS ABI validation into arch-mips.h
  tools/nolibc: error out on unsupported architecture
  selftests/nolibc: add script to run testsuite
  selftests/nolibc: support out-of-tree builds
  selftests/nolibc: anchor paths in $(srcdir) if possible
  selftests/nolibc: use EFI -bios for LoongArch qemu
  ...
2024-01-09 17:14:32 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintainance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov to permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ dorectory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintanance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against eh buffer_head code int he series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY'a alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
 jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
 =0NHs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintainance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov to permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ dorectory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintanance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against eh buffer_head code int he series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY'a alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Linus Torvalds
ab5f3fcb7c arm64 updates for 6.8
* for-next/cpufeature
 
   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde
     Thunder-X machines.
   - Avoid mapping KPTI trampoline when it is not required.
   - Make CPU capability API more robust during early initialisation.
 
 * for-next/early-idreg-overrides
 
   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped.
 
 * for-next/fpsimd
 
   - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd
     code sequences in the kernel with pre-emption enabled.
 
 * for-next/kbuild
 
   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y.
   - Makefile cleanups.
 
 * for-next/lpa2-prep
 
   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).
 
 * for-next/misc
 
   - Remove dead code and fix a typo.
 
 * for-next/mm
 
   - Pass NUMA node information for IRQ stack allocations.
 
 * for-next/perf
 
   - Add perf support for the Synopsys DesignWare PCIe PMU.
   - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced
     in Armv8.8.
   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
   - Minor PMU driver fixes and optimisations.
 
 * for-next/rip-vpipt
 
   - Remove what support we had for the obsolete VPIPT I-cache policy.
 
 * for-next/selftests
 
   - Improvements to the SVE and SME selftests.
 
 * for-next/stacktrace
 
   - Refactor kernel unwind logic so that it can used by BPF unwinding
     and, eventually, reliable backtracing.
 
 * for-next/sysregs
 
   - Update a bunch of register definitions based on the latest XML drop
     from Arm.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmWWvKYQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNIiTB/9agZBkEhZjP2sNDGyE4UFwawweWHkt2r8h
 WyvdwP91Z/AIsYSsGYu36J0l4pOnMKp/i6t+rt031SK4j+Q8hJYhSfDt3RvVbc0/
 Pz9D18V6cLrfq+Yxycqq9ufVdjs+m+CQ5WeLaRGmNIyEzJ/Jv/qrAN+2r603EeLP
 nq08qMZhDIQd2ZzbigCnGaNrTsVSafFfBFv1GsgDvnMZAjs1G6457A6zu+NatNUc
 +TMSG+3EawutHZZ2noXl0Ra7VOfIbVZFiUssxRPenKQByHHHR+QB2c/O1blri+dm
 XLMutvqO2/WvYGIfXO5koqZqvpVeR3zXxPwmGi5hQBsmOjtXzKd+
 =U4mo
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "CPU features:

   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
     olde Thunder-X machines

   - Avoid mapping KPTI trampoline when it is not required

   - Make CPU capability API more robust during early initialisation

  Early idreg overrides:

   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped

  FPsimd:

   - Restore kernel-mode fpsimd context lazily, allowing us to run
     fpsimd code sequences in the kernel with pre-emption enabled

  KBuild:

   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y

   - Makefile cleanups

  LPA2 prep:

   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).

  Misc:

   - Remove dead code and fix a typo

  MM:

   - Pass NUMA node information for IRQ stack allocations

  Perf:

   - Add perf support for the Synopsys DesignWare PCIe PMU

   - Add support for event counting thresholds (FEAT_PMUv3_TH)
     introduced in Armv8.8

   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.

   - Minor PMU driver fixes and optimisations

  RIP VPIPT:

   - Remove what support we had for the obsolete VPIPT I-cache policy

  Selftests:

   - Improvements to the SVE and SME selftests

  Stacktrace:

   - Refactor kernel unwind logic so that it can used by BPF unwinding
     and, eventually, reliable backtracing

  Sysregs:

   - Update a bunch of register definitions based on the latest XML drop
     from Arm"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  kselftest/arm64: Don't probe the current VL for unsupported vector types
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64: memory: remove duplicated include
  arm: perf: Fix ARCH=arm build with GCC
  arm64: Align boot cpucap handling with system cpucap handling
  arm64: Cleanup system cpucap handling
  MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
  drivers/perf: add DesignWare PCIe PMU driver
  PCI: Move pci_clear_and_set_dword() helper to PCI header
  PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  arm64: irq: set the correct node for shadow call stack
  Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  ...
2024-01-08 16:32:09 -08:00
Jamal Hadi Salim
33241dca48 net/sched: Remove uapi support for CBQ qdisc
Commit 051d442098 ("net/sched: Retire CBQ qdisc") retired the CBQ qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
26cc8714fc net/sched: Remove uapi support for ATM qdisc
Commit fb38306ceb ("net/sched: Retire ATM qdisc") retired the ATM qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
fe3b739a54 net/sched: Remove uapi support for dsmark qdisc
Commit bbe77c14ee ("net/sched: Retire dsmark qdisc") retired the dsmark
classifier. Remove UAPI support for it.
Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
82b2545ed9 net/sched: Remove uapi support for tcindex classifier
commit 8c710f7525 ("net/sched: Retire tcindex classifier") retired the TC
tcindex classifier.
Remove UAPI for it.  Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
41bc3e8fc1 net/sched: Remove uapi support for rsvp classifier
commit 265b4da82d ("net/sched: Retire rsvp classifier") retired the TC RSVP
classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Andrii Nakryiko
d17aff807f Revert BPF token-related functionality
This patch includes the following revert (one  conflicting BPF FS
patch and three token patch sets, represented by merge commits):
  - revert 0f5d5454c7 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
  - revert 750e785796 "bpf: Support uid and gid when mounting bpffs";
  - revert 733763285a "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
  - revert c35919dcce "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-19 08:23:03 -08:00
Jakub Kicinski
c49b292d03 netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v
 bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN
 oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7
 uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1
 TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH
 NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R
 VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO
 BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR
 SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861
 xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS
 CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt
 l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw=
 =stU2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00
Arnaldo Carvalho de Melo
ab1c247094 Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes that went thru perf-tools for v6.7 and to get in sync
with upstream to check for drift in the copies of headers, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-18 21:37:07 -03:00
Larysa Zaremba
e6795330f8 xdp: Add VLAN tag hint
Implement functionality that enables drivers to expose VLAN tag
to XDP code.

VLAN tag is represented by 2 variables:
- protocol ID, which is passed to bpf code in BE
- VLAN TCI, in host byte order

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
James Clark
c7b98bf0fc KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h
Now that ARMV8_PMU_PMCR_N is made with GENMASK, update usages to treat
it as a pre-shifted mask.

Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-9-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 09:46:22 +00:00
Thomas Weißschuh
a0bb5f88fc tools/nolibc: add support for getrlimit/setrlimit
The implementation uses the prlimit64 systemcall as that is available on
all architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/20231123-nolibc-rlimit-v1-2-a428b131de2a@weissschuh.net/
Acked-by: Willy Tarreau <w@1wt.eu>
2023-12-11 22:38:35 +01:00
Thomas Weißschuh
7b20478b77 tools/nolibc: drop custom definition of struct rusage
A future commit will include linux/resource.h, which will conflict with
the private definition of struct rusage in nolibc.
Avoid the conflict by dropping the private definition and use the one
from the UAPI headers.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/20231123-nolibc-rlimit-v1-1-a428b131de2a@weissschuh.net/
Acked-by: Willy Tarreau <w@1wt.eu>
2023-12-11 22:38:34 +01:00
Thomas Weißschuh
dece8476d6 tools/nolibc: annotate va_list printf formats
__attribute__(format(printf)) can also be used for functions that take a
va_list argument.

As per the GCC docs:

    For functions where the arguments are not available to be checked
    (such as vprintf), specify the third parameter as zero.

Link: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-12-11 22:38:32 +01:00
Thomas Weißschuh
544102458a tools/nolibc: mips: add support for PIC
MIPS requires some extra instructions to set up the $gp register for the
with a pointer to the global data area.

This isn't needed for non-PIC builds, but this patch enables the code
unconditionally to prevent bitrot.

Also enable PIC in one of the test configurations for ongoing
validation.

Link: https://lore.kernel.org/r/20231108-nolibc-pic-v2-1-4fb0d6284757@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-12-11 22:38:30 +01:00
Thomas Weißschuh
aa68a5a83a tools/nolibc: move MIPS ABI validation into arch-mips.h
When installing nolibc to a sysroot arch.h is not used so its ABI check
is bypassed. This makes is possible to compile nolibc with a non O32 ABI
which may build but can not run.

Move the check into arch-mips.h so it will always be evaluated.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2023-12-11 22:38:21 +01:00
Thomas Weißschuh
48946c5aa7 tools/nolibc: error out on unsupported architecture
When an architecture is unsupported arch.h would silently continue.
This leads to a lot of followup errors because my_syscallX() is not
defined and the startup code is missing.

Avoid these confusing errors and fail the build early with a clear
error message and location.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2023-12-11 22:38:20 +01:00
Mark Brown
bb6ec2e9fd tools/nolibc: Use linux/wait.h rather than duplicating it
Linux defines a few custom flags for waitpid() which aren't currently
provided by nolibc, make them available to nolibc based programs by just
including linux/wait.h where they are defined instead of defining our
own copy of the flags.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-12-11 22:38:08 +01:00
Andrei Vagin
e6a9a2cbc1 fs/proc/task_mmu: report SOFT_DIRTY bits through the PAGEMAP_SCAN ioctl
The PAGEMAP_SCAN ioctl returns information regarding page table entries. 
It is more efficient compared to reading pagemap files.  CRIU can start to
utilize this ioctl, but it needs info about soft-dirty bits to track
memory changes.

We are aware of a new method for tracking memory changes implemented in
the PAGEMAP_SCAN ioctl.  For CRIU, the primary advantage of this method is
its usability by unprivileged users.  However, it is not feasible to
transparently replace the soft-dirty tracker with the new one.  The main
problem here is userfault descriptors that have to be preserved between
pre-dump iterations.  It means criu continues supporting the soft-dirty
method to avoid breakage for current users.  The new method will be
implemented as a separate feature.

[avagin@google.com: update tools/include/uapi/linux/fs.h]
  Link: https://lkml.kernel.org/r/20231107164139.576046-1-avagin@google.com
Link: https://lkml.kernel.org/r/20231106220959.296568-1-avagin@google.com
Signed-off-by: Andrei Vagin <avagin@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10 16:51:35 -08:00
Peng Zhang
446e1867e6 maple_tree: update check_forking() and bench_forking()
Updated check_forking() and bench_forking() to use __mt_dup() to duplicate
maple tree.

Link: https://lkml.kernel.org/r/20231027033845.90608-9-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10 16:51:33 -08:00
Peng Zhang
b2472efe43 maple_tree: introduce {mtree,mas}_lock_nested()
In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is
introduced.  For example, when duplicating maple tree, we need to hold the
locks of two trees, in which case nested locks are needed.

At the same time, add the definition of spin_lock_nested() in tools for
testing.

Link: https://lkml.kernel.org/r/20231027033845.90608-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10 16:51:31 -08:00
Jakub Kicinski
2483e7f04c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/stmicro/stmmac/dwmac5.c
drivers/net/ethernet/stmicro/stmmac/dwmac5.h
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
drivers/net/ethernet/stmicro/stmmac/hwif.h
  37e4b8df27 ("net: stmmac: fix FPE events losing")
  c3f3b97238 ("net: stmmac: Refactor EST implementation")
https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/

Adjacent changes:

net/ipv4/tcp_ao.c
  9396c4ee93 ("net/tcp: Don't store TCP-AO maclen on reqsk")
  7b0f570f87 ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07 17:53:17 -08:00
Andrii Nakryiko
7065eefb38 bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
To stay consistent with the naming pattern used for similar cases in BPF
UAPI (__MAX_BPF_ATTACH_TYPE, etc), rename MAX_BPF_LINK_TYPE into
__MAX_BPF_LINK_TYPE.

Also similar to MAX_BPF_ATTACH_TYPE and MAX_BPF_REG, add:

  #define MAX_BPF_LINK_TYPE __MAX_BPF_LINK_TYPE

Not all __MAX_xxx enums have such #define, so I'm not sure if we should
add it or not, but I figured I'll start with a completely backwards
compatible way, and we can drop that, if necessary.

Also adjust a selftest that used MAX_BPF_LINK_TYPE enum.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206190920.1651226-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 14:41:16 -08:00
Andrii Nakryiko
e1cef620f5 bpf: add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from BPF FS at BPF
token creation time. Then make sure we perform bpf_token_capable()
checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 10:02:59 -08:00
Andrii Nakryiko
ee54b1a910 bpf: add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BTF loading is a pretty straightforward
operation, so as long as BPF token is created with allow_cmds granting
BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating
BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 10:02:59 -08:00
Andrii Nakryiko
688b7270b3 bpf: add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 10:02:59 -08:00
Andrii Nakryiko
4527358b76 bpf: introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 10:02:59 -08:00
Amritha Nambiar
8481a249a0 netdev-genl: spec: Add PID in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for PID of the
NAPI thread. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147335301.5260.11872351477120434501.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:04:06 -08:00
Amritha Nambiar
5a5131d66f netdev-genl: spec: Add irq in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for interrupt number
among the NAPI attributes. Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147334210.5260.18178387869057516983.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:04:06 -08:00
Amritha Nambiar
ff9991499f netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
Add support in netlink spec(netdev.yaml) for napi related information.
Add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147333119.5260.7050639053080529108.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:04:05 -08:00
Amritha Nambiar
bc87795627 netdev-genl: spec: Extend netdev netlink spec in YAML for queue
Add support in netlink spec(netdev.yaml) for queue information.
Add code generated from the spec.

Note: The "queue-type" attribute takes values 0 and 1 for rx
and tx queue type respectively.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147330963.5260.2576294626647300472.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:04:05 -08:00
Jakub Kicinski
753c8608f3 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZWiCPAAKCRDbK58LschI
 g4djAQC1FdqCRIFkhbiIRNHTgHjnfQShELQbd9ofJqzylLqmmgD+JI1E7D9SXagm
 pIXQ26EGmq8/VcCT3VLncA8EsC76Gg4=
 =Xowm
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in mlx5
   and stmmac drivers. Two types of offloads are supported right now, that
   is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
   stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
   of unconditionally before the main program, so they can be guarded using
   BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
   integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine ld flags which is
   in particular needed for linking statically, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
  bpf/tests: Remove duplicate JSGT tests
  selftests/bpf: Add TX side to xdp_hw_metadata
  selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
  selftests/bpf: Add TX side to xdp_metadata
  selftests/bpf: Add csum helpers
  selftests/xsk: Support tx_metadata_len
  xsk: Add option to calculate TX checksum in SW
  xsk: Validate xsk_tx_metadata flags
  xsk: Document tx_metadata_len layout
  net: stmmac: Add Tx HWTS support to XDP ZC
  net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
  tools: ynl: Print xsk-features from the sample
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Support tx_metadata_len
  selftests/bpf: Use pkg-config for libelf
  selftests/bpf: Override PKG_CONFIG for static builds
  selftests/bpf: Choose pkg-config for the target
  bpftool: Add support to display uprobe_multi links
  selftests/bpf: Add link_info test for uprobe_multi link
  selftests/bpf: Use bpf_link__destroy in fill_link_info tests
  ...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
  839ff60df3 ("net: page_pool: add nlspec for basic access to page pools")
  48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it also regen, tree is dirty after:
  48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30 16:58:42 -08:00
Stanislav Fomichev
11614723af xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29 14:59:40 -08:00
Stanislav Fomichev
48eb03dd26 xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
  offloads are implemented; if the callback is implemented - the offload
  is supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29 14:59:40 -08:00
Stanislav Fomichev
341ac980ea xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte
  timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29 14:59:40 -08:00
Jiri Olsa
e56fdbfb06 bpf: Add link_info support for uprobe multi link
Adding support to get uprobe_link details through bpf_link_info
interface.

Adding new struct uprobe_multi to struct bpf_link_info to carry
the uprobe_multi link details.

The uprobe_multi.count is passed from user space to denote size
of array fields (offsets/ref_ctr_offsets/cookies). The actual
array size is stored back to uprobe_multi.count (allowing user
to find out the actual array size) and array fields are populated
up to the user passed size.

All the non-array fields (path/count/flags/pid) are always set.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org
2023-11-28 21:50:09 -08:00
Jakub Kicinski
637567e4a3 tools: ynl: add sample for getting page-pool information
Regenerate the tools/ code after netdev spec changes.

Add sample to query page-pool info in a concise fashion:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28 15:48:39 +01:00
Jaroslav Kysela
2112aa0349 ALSA: pcm: Introduce MSBITS subformat interface
Improve granularity of format selection for S32/U32 formats by adding
constants representing 20, 24 and MAX most significant bits.

The MAX means the maximum number of significant bits which can
the physical format hold. For 32-bit formats, MAX is related
to 32 bits. For 8-bit formats, MAX is related to 8 bits etc.

As there is only one user currently (format S32_LE), subformat is
represented by a simple u32 and stores flags only for that one user
alone. The approach of subformat being part of struct snd_pcm_hardware
is a compromise between ALSA and ASoC allowing for
hw_params-intersection code to be alloc/free-less while not adding any
new responsibilities to ASoC runtime structures.

Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Co-developed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20231117120610.1755254-2-cezary.rojewski@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-11-27 17:24:26 +01:00
Arnaldo Carvalho de Melo
57686a72da tools: Disable __packed attribute compiler warning due to -Werror=attributes
Noticed on several perf tools cross build test containers:

  [perfbuilder@five ~]$ grep FAIL ~/dm.log/summary
    19    10.18 debian:experimental-x-mips    : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
    20    11.21 debian:experimental-x-mips64  : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
    21    11.30 debian:experimental-x-mipsel  : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
    37    12.07 ubuntu:18.04-x-arm            : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
    42    11.91 ubuntu:18.04-x-riscv64        : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
    44    13.17 ubuntu:18.04-x-sh4            : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
    45    12.09 ubuntu:18.04-x-sparc64        : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
  [perfbuilder@five ~]$

  In file included from util/intel-pt-decoder/intel-pt-pkt-decoder.c:10:
  /tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h: In function 'get_unaligned_le16':
  /tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h:13:29: error: packed attribute causes inefficient alignment for 'x' [-Werror=attributes]
     13 |         const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);      \
        |                             ^
  /tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h:27:28: note: in expansion of macro '__get_unaligned_t'
     27 |         return le16_to_cpu(__get_unaligned_t(__le16, p));
        |                            ^~~~~~~~~~~~~~~~~

This comes from the kernel, where the -Wattributes and -Wpacked isn't
used, -Wpacked is already disabled, do it for the attributes as well.

Fixes: a91c987254 ("perf tools: Add get_unaligned_leNN()")
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/7c5b626c-1de9-4c12-a781-e44985b4a797@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-11-22 12:23:27 -08:00
Namhyung Kim
91c97b36bd tools headers UAPI: Update tools's copy of unistd.h header
tldr; Just FYI, I'm carrying this on the perf tools tree.

Full explanation:

There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.

The way these headers are used in perf are not restricted to just
including them to compile something.

There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.

E.g.:

  $ ls -1 tools/perf/trace/beauty/*.sh | head -5
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/fsmount.sh
  $
  $ tools/perf/trace/beauty/fadvise.sh
  static const char *fadvise_advices[] = {
        [0] = "NORMAL",
        [1] = "RANDOM",
        [2] = "SEQUENTIAL",
        [3] = "WILLNEED",
        [4] = "DONTNEED",
        [5] = "NOREUSE",
  };
  $

The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.

So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231121225650.390246-6-namhyung@kernel.org
2023-11-22 10:57:46 -08:00