mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-09-04 20:19:47 +08:00
9cb1547891
2229 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
568901e709 |
tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources
And arch syscall tables to pick up changes from: |
||
|
|
8ec9497d3e |
tools/include: Sync uapi/linux/perf.h with the kernel sources
To pick up changes from:
|
||
|
|
a625df3995 |
tools/include: Sync uapi/linux/kvm.h with the kernel sources
And other arch-specific UAPI headers to pick up changes from: |
||
|
|
aef21f6b6a |
tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
To pick up changes from:
|
||
|
|
fbc05142cc |
perf tools: Add tools/include/uapi/README
Write down the reason why we keep a copy of headers to the README file instead of adding it to every commit messages. Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Original-by: Arnaldo Carvalho de Melo <acme@kernel.org> Original-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
9e3d665384 |
memblock test: fix implicit declaration of function 'strscpy'
Commit
|
||
|
|
a88cde5769 |
memblock test: fix implicit declaration of function 'memparse'
Commit
|
||
|
|
9f76c2ade3 |
memblock test: add the definition of __setup()
Commit
|
||
|
|
39f64e402f |
memblock test: fix implicit declaration of function 'virt_to_phys'
Commit
|
||
|
|
e2ae9cf39f |
tools/testing: abstract two init.h into common include directory
Currently we have two test suits define its own init.h. This is a little redundant. Let's create a init.h in common include directory and merge these two into it. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Mike Rapoport <rppt@kernel.org> CC: Liam R. Howlett <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/r/20240712035138.24674-3-richard.weiyang@gmail.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
||
|
|
d68c08173b |
memblock tests: include export.h in linkage.h as kernel dose
In kernel code, linkage.h includes export.h. Let's sync with kernel. This is a preparation for move init.h in common include directory. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Mike Rapoport <rppt@kernel.org> Link: https://lore.kernel.org/r/20240712035138.24674-2-richard.weiyang@gmail.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
||
|
|
06dd4c5a56 |
perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse function to parse disassembled line. Example snippet from objdump: objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux> c0000000010224b4: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset. In powerpc, the approach for data type profiling uses raw instruction instead of result from objdump to identify the instruction category and extract the source/target registers. Example: 38 01 81 e8 ld r4,312(r1) Here "38 01 81 e8" is the raw instruction representation. Add function "disasm_line__parse_powerpc" to handle parsing of raw instruction. Also update "struct disasm_line" to save the binary code/ With the change, function captures: line -> "38 01 81 e8 ld r4,312(r1)" raw instruction "38 01 81 e8" Raw instruction is used later to extract the reg/offset fields. Macros are added to extract opcode and register fields. "struct disasm_line" is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw code (raw). Function "disasm_line__parse_powerpc fills the raw instruction hex value and can use macros to get opcode. There is no changes in existing code paths, which parses the disassembled code. The size of raw instruction depends on architecture. In case of powerpc, the parsing the disasm line needs to handle cases for reading binary code directly from DSO as well as parsing the objdump result. Hence adding the logic into separate function instead of updating "disasm_line__parse". The architecture using the instruction name and present approach is not altered. Since this approach targets powerpc, the macro implementation is added for powerpc as of now. Since the disasm_line__parse is used in other cases (perf annotate) and not only data tye profiling, the powerpc callback includes changes to work with binary code as well as mnemonic representation. Also in case if the DSO read fails and libcapstone is not supported, the approach fallback to use objdump as option. Hence as option, patch has changes to ensure objdump option also works well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com [ Add check for strndup() result ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ae1f550efc |
tools/nolibc: add stdbool.h header
stdbool.h is very simple. Provide an implementation for the user convenience. Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20240725-nolibc-stdbool-v1-1-a6ee2c80bcde@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
6ea2987c9a |
tools/nolibc: include arch.h from string.h
string.h tests for the macros NOLIBC_ARCH_HAS_$FUNC to use the architecture-optimized function variants. However if string.h is included before arch.h header then that check does not work, leading to duplicate function definitions. Fixes: |
||
|
|
51c4767503 |
bitmap-6.11-rc1
Random fixes for v6.11. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmahKbIACgkQsUSA/Tof vsh8zQwAvguyeNubDFqdMe3E/Vp1J3WqXsBFzbE1rGLCyI2S0cgJFL5BlW51zY47 70wLt9EmroEobwj1qHSQlzejNp31kSBQ1Sqq25oivfJqEF1elDT5PQxYqBbU1C9Y kVWnxtb+oKaoFd5jiBK8+iTl8dXjT6H2RoV0zpPab/JPcqsjwFfkUvtENt/Kpo5c aRrGTFwshdp5eT4sEZQv57VKroBcwZOvv2//qrklFHrJHl4pjMT8eaX3twcQysoy umTVt+TK6NErLnht+VRQJ2/L02FKi7b+bHePVgNzaT+1FSDMT4FltmZd96Xwbzah hSkwWtqy0N2gaTcqie9nwdEiCJGjF39M7k2wangUS91CeDsbIUSsJgDCESUCm+zK hRqleGOnoeg4+jZBci7M53lKa/pADlmLhnU8iAc3BSKozsaioltkT+hHn8vAkstk h/kHlbfkzasufUWAhduBpIn384gWWEY6RACffgCsOuvbT+ZyDKUJpKYaEwVx+Pri l72j0hs9 =RbET -----END PGP SIGNATURE----- Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux Pull bitmap updates from Yury Norov: "Random fixes" * tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux: riscv: Remove unnecessary int cast in variable_fls() radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c bitops: Add a comment explaining the double underscore macros lib: bitmap: add missing MODULE_DESCRIPTION() macros cpumask: introduce assign_cpu() macro |
||
|
|
1722389b0d |
A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR. Including fixes from bpf
and netfilter.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaibxAACgkQMUZtbf5S
IruuIRAAu96TiN/urPwmKznyb/Sk8x7p8iUzn6OvPS/TUlFUkURQtOh6M9uvbpN4
x/L//EWkMR0hY4SkBegoiXfb1GS0PjBdWTWUiROm5X9nVHqp5KRZAxWXhjFiS1BO
BIYOT+JfCl7mQiPs90Mys/cEtYOggMBsCZQVIGw/iYoJLFREqxFSONwa0dG+tGMX
jn9WNu4yCVDhJ/jtl2MaTsCNtYUaBUgYrKHJBfNGfJ2Lz/7rH9yFui2WSMlmOd/U
QGeCb1DWURlShlCqY37wNinbFsxWkI5JN00ukTtwFAXLIaqc+zgHcIjrDjTJwK43
F4tKbJT3+bmehMU/h3Uo3c7DhXl7n9zDGiDtbCxnkykp0sFGJpjhDrWydo51c+YB
qW5HaNrII2LiDicOVN8L29ylvKp7AEkClxgivEhZVGGk2f/szJRXfp9u3WBn5kAx
3paH55YN0DEsKbYbb1ZENEI1Vnc/4ff4PxZJCUNKwzcS8wCn1awqwcriK9TjS/cp
fjilNFT4J3/uFrodHWTkx0jJT6UJFT0aF03qPLUH/J5kG+EVukOf1jBPInNdf1si
1j47SpblHUe86HiHphFMt32KZ210lJzWxh8uGma57Y2sB9makdLiK4etrFjkiMJJ
Z8A3kGp3KpFjbuK4tHY25rp+5oxLNNOBNpay29lQrWtCL/NDcaQ=
=9OsH
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
tun: add missing verification for short frame
tap: add missing verification for short frame
mISDN: Fix a use after free in hfcmulti_tx()
gve: Fix an edge case for TSO skb validity check
bnxt_en: update xdp_rxq_info in queue restart logic
tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
MAINTAINERS: make Breno the netconsole maintainer
MAINTAINERS: Update bonding entry
net: nexthop: Initialize all fields in dumped nexthops
net: stmmac: Correct byte order of perfect_match
selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
tipc: Return non-zero value from tipc_udp_addr2str() on error
netfilter: nft_set_pipapo_avx2: disable softinterrupts
ice: Fix recipe read procedure
ice: Add a per-VF limit on number of FDIR filters
net: bonding: correctly annotate RCU in bond_should_notify_peers()
...
|
||
|
|
f7578df913 |
bpf-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE= =vovf -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
9b9969c40b |
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
This flag is now required to use tx_metadata_len.
Fixes:
|
||
|
|
7a3fad30fd |
Random number generator updates for Linux 6.11-rc1.
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmaarzgACgkQSfxwEqXe A66ZWBAAlhXx8bve0uKlDRK8fffWHgruho/fOY4lZJ137AKwA9JCtmOyqdfL4Dmk VxFe7pEQJlQhcA/6kH54uO7SBXwfKlKZJth6SYnaCRMUIbFifHjjIQ0QqldjEKi0 rP90Hu4FVsbwQC7u9i9lQj9n2P36zb6pn83BzpZQ/2PtoVCSCrdSJUe0Rxa3H3GN 0+nNkDSXQt5otCByLaeE3x7KJgXLWL9+G2eFSFLTZ8rSVfMx1CdOIAG37WlLGdWm BaFYPDKMyBTVvVJBNgAe9YSqtrsZ5nlmLz+Z9wAe/hTL7RlL03kWUu34/Udcpull zzMDH0WMntiGK3eFQ2gOYSWqypvAjwHgn3BzqNmjUb69+89mZsdU1slcvnxWsUwU D3vphrscaqarF629tfsXti3jc5PoXwUTjROZVcCyeFPBhyAZgzK8xUvPpJO+RT+K EuUABob9cpA6FCpW/QeolDmMDhXlNT8QgsZu1juokZac2xP3Ly3REyEvT7HLbU2W ZJjbEqm1ppp3RmGELUOJbyhwsLrnbt+OMDO7iEWoG8aSFK4diBK/ZM6WvLMkr8Oi 7ioXGIsYkCy3c47wpZKTrAapOPJp5keqNAiHSEbXw8mozp6429QAEZxNOcczgHKC Ea2JzRkctqutcIT+Slw/uUe//i1iSsIHXbE81fp5udcQTJcUByo= =P8aI -----END PGP SIGNATURE----- Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings |
||
|
|
fbc90c042c |
- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ... |
||
|
|
2c9b351240 |
ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement * Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol * FPSIMD/SVE support for nested, including merged trap configuration and exception routing * New command-line parameter to control the WFx trap behavior under KVM * Introduce kCFI hardening in the EL2 hypervisor * Fixes + cleanups for handling presence/absence of FEAT_TCRX * Miscellaneous fixes + documentation updates LoongArch: * Add paravirt steal time support. * Add support for KVM_DIRTY_LOG_INITIALLY_SET. * Add perf kvm-stat support for loongarch. RISC-V: * Redirect AMO load/store access fault traps to guest * perf kvm stat support * Use guest files for IMSIC virtualization, when available ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA extensions is coming through the RISC-V tree. s390: * Assortment of tiny fixes which are not time critical x86: * Fixes for Xen emulation. * Add a global struct to consolidate tracking of host values, e.g. EFER * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. * Print the name of the APICv/AVIC inhibits in the relevant tracepoint. * Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. * Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop. * Update to the newfangled Intel CPU FMS infrastructure. * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored. * Misc cleanups x86 - MMU: * Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support. * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs. * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages. * Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards. x86 - AMD: * Make per-CPU save_area allocations NUMA-aware. * Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code. * Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges. This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification. There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data. x86 - Intel: * Remove an unnecessary EPT TLB flush when enabling hardware. * Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1). * KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation. Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed. See commit |
||
|
|
4920a2590e |
selftests/vDSO: add tests for vgetrandom
This adds two tests for vgetrandom. The first one, vdso_test_chacha, simply checks that the assembly implementation of chacha20 matches that of libsodium, a basic sanity check that should catch most errors. The second, vdso_test_getrandom, is a full "libc-like" implementation of the userspace side of vgetrandom() support. It's meant to be used also as example code for libcs that might be integrating this. Cc: linux-kselftest@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
||
|
|
9651fcedf7 |
mm: add MAP_DROPPABLE for designating always lazily freeable mappings
The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:
- It shouldn't be written to core dumps.
* Easy: VM_DONTDUMP.
- It should be zeroed on fork.
* Easy: VM_WIPEONFORK.
- It shouldn't be written to swap.
* Uh-oh: mlock is rlimited.
* Uh-oh: mlock isn't inherited by forks.
- It shouldn't reserve actual memory, but it also shouldn't crash when
page faulting in memory if none is available
* Uh-oh: VM_NORESERVE means segfaults.
It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:
1) Due to being wiped during fork(), the vDSO code is already robust to
having the contents of the pages it reads zeroed out midway through
the function's execution.
2) In the absolute worst case of whatever contingency we're coding for,
we have the option to fallback to the getrandom() syscall, and
everything is fine.
3) The buffers the function uses are only ever useful for a maximum of
60 seconds -- a sort of cache, rather than a long term allocation.
These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:
a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
and no signal is sent.
This way, allocations used by vDSO getrandom() can use:
VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.
In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.
Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.
Finally, the provided self test ensures that this is working as desired.
Cc: linux-mm@kvack.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
||
|
|
76d9b92e68 |
slab updates for 6.11
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmaXl0kACgkQu+CwddJF
iJrOlgf+N/G7BmgoW2CBF7mKsvCYs+pX3xeBuxPtsuq4FD386nsPFMN8gWAYLG3q
ZU1z1S+0M8LhTg6/G9jMYLHt2Y7WhYbhFTjTHmULJkuhMDTUP9CRYy4XZ+hdPtHF
30ezSdJQF9x/XxCSaaRVK1s+SMVHFg5xAOHKpfkNSamcMz9g+ZkYyPBr10/VoKd0
JqwhW7r6hrlvWAiqY3QKCOvohIWglgvBUnNjUGMh1cUkOE2aYLYHklhRwICKgA6z
p/2BUXiAEWUtgBkUrizwm/pdhJXLs0pOeYarVZP1v83tQMxyrc6XLNnqhvxP3DPW
31thF5Rf9I8WaWTczXhxsAwFjqO3KQ==
=4uf9
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"The most prominent change this time is the kmem_buckets based
hardening of kmalloc() allocations from Kees Cook.
We have also extended the kmalloc() alignment guarantees for
non-power-of-two sizes in a way that benefits rust.
The rest are various cleanups and non-critical fixups.
- Dedicated bucket allocator (Kees Cook)
This series [1] enhances the probabilistic defense against heap
spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.
kmalloc() users that are known to be useful for exploits can get
completely separate set of kmalloc caches that can't be shared with
other users. The first converted users are alloc_msg() and
memdup_user().
The hardening is enabled by CONFIG_SLAB_BUCKETS.
- Extended kmalloc() alignment guarantees (Vlastimil Babka)
For years now we have guaranteed natural alignment for power-of-two
allocations, but nothing was defined for other sizes (in practice,
we have two such buckets, kmalloc-96 and kmalloc-192).
To avoid unnecessary padding in the rust layer due to its alignment
rules, extend the guarantee so that the alignment is at least the
largest power-of-two divisor of the requested size.
This fits what rust needs, is a superset of the existing
power-of-two guarantee, and does not in practice change the layout
(and thus does not add overhead due to padding) of the kmalloc-96
and kmalloc-192 caches, unless slab debugging is enabled for them.
- Cleanups and non-critical fixups (Chengming Zhou, Suren
Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka)
Various tweaks related to the new alloc profiling code, folio
conversion, debugging and more leftovers after SLAB"
Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1]
* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/memcg: alignment memcg_data define condition
mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
mm, slab: move allocation tagging code in the alloc path into a hook
mm/util: Use dedicated slab buckets for memdup_user()
ipc, msg: Use dedicated slab buckets for alloc_msg()
mm/slab: Introduce kmem_buckets_create() and family
mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
mm/slab: Introduce kmem_buckets typedef
slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
slab: delete useless RED_INACTIVE and RED_ACTIVE
slab: don't put freepointer outside of object if only orig_size
slab: make check_object() more consistent
mm: Reduce the number of slab->folio casts
mm, slab: don't wrap internal functions with alloc_hooks()
|
||
|
|
b2fc97c186 |
memblock: updates for 6.11-rc1
* reserve_mem command line parameter to allow creation of named memory reservation at boot time. The driving use-case is to improve the ability of pstore to retain ramoops data across reboots. * cleaunps and small improvements in memblock and mm_init * new tests cases in memblock test suite -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmaXfoIQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kU5mCAC23vIrB8FRlORczMYj+V3VFss3OjKT92lS fHGwq2oxHW+rdDpHXFObHU0D3k8d2l5jyrENRAAyA02qR0L6Pv8Na6pGxQua1eic VIdw0PFQMsizD1AIj84Y6skkyyF/tvZHpmX0B12D5+Ur65DC/Z867Cm/lE33/fHv /1+QB0JlG7W+FzxVstYyebY5/DVkH+bC7/A57FE2oB4BRXjEd8v9tTHBS4kRSvrE zE2KFxeGajN749LHztIpIprPKehu8Gc3oLrYLNJO+uLFVCV8ey3OqVj0RXMG2wLl hmVYqhbZM/Uz59D/P8pULD49f1Thjv/5A/MvUZ3SxM6zpWlsincf =xrZd -----END PGP SIGNATURE----- Merge tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - 'reserve_mem' command line parameter to allow creation of named memory reservation at boot time. The driving use-case is to improve the ability of pstore to retain ramoops data across reboots. - cleanups and small improvements in memblock and mm_init - new tests cases in memblock test suite * tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock tests: fix implicit declaration of function 'numa_valid_node' memblock: Move late alloc warning down to phys alloc pstore/ramoops: Add ramoops.mem_name= command line option mm/memblock: Add "reserve_mem" to reserved named memory at boot up mm/mm_init.c: don't initialize page->lru again mm/mm_init.c: not always search next deferred_init_pfn from very beginning mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition mm/mm_init.c: get the highest zone directly mm/mm_init.c: move nr_initialised reset down a bit mm/memblock: fix a typo in description of for_each_mem_region() mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap mm/memblock: return true directly on finding overlap region memblock tests: add memblock_overlaps_region_checks mm/memblock: fix comment for memblock_isolate_range() memblock tests: add memblock_reserve_many_may_conflict_check() memblock tests: add memblock_reserve_all_locations_check() mm/memblock: remove empty dummy entry |
||
|
|
91bd008d4e |
Probes updates for v6.11:
Uprobes:
- x86/shstk: Make return uprobe work with shadow stack.
- Add uretprobe syscall which speeds up the uretprobe 10-30% faster. This
syscall is automatically used from user-space trampolines which are
generated by the uretprobe. If this syscall is used by normal
user program, it will cause SIGILL. Note that this is currently only
implemented on x86_64.
(This also has 2 fixes for adjusting the syscall number to avoid conflict
with new *attrat syscalls.)
- uprobes/perf: fix user stack traces in the presence of pending uretprobe.
This corrects the uretprobe's trampoline address in the stacktrace with
correct return address.
- selftests/x86: Add a return uprobe with shadow stack test.
- selftests/bpf: Add uretprobe syscall related tests.
. test case for register integrity check.
. test case with register changing case.
. test case for uretprobe syscall without uprobes (expected to be failed).
. test case for uretprobe with shadow stack.
- selftests/bpf: add test validating uprobe/uretprobe stack traces
- MAINTAINERS: Add uprobes entry. This does not specify the tree but to
clarify who maintains and reviews the uprobes.
Kprobes:
- tracing/kprobes: Test case cleanups. Replace redundant WARN_ON_ONCE() +
pr_warn() with WARN_ONCE() and remove unnecessary code from selftest.
- tracing/kprobes: Add symbol counting check when module loads. This
checks the uniqueness of the probed symbol on modules. The same check
has already done for kernel symbols.
(This also has a fix for build error with CONFIG_MODULES=n)
Cleanup:
- Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmaWYxwbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bsUgH/3JcSzDZujQWCZ1f4fJn
QecvTFSYcCl6ck8+/3wm4EsgeCXIFOyPnoPc7k2Gm+l6Dlk1DKGV6wV4tuKFUq9X
9mplcwoVA0Ln+EX9zv9v4s99yUGxcU9xjgC9XT7J52SvqYncPIi6dR0Z9wlJBmyd
Bx3cZk+wSzCYaoqYngI2fKlzsEcYgDIP999fQPRi0HGzNZujc4xeJyjCTC/48yWO
9kreRQq6wFdgRQTwMcR/fKPDKIGZQCU8jkXv5crVV5K3rNaBcwBmCJJMP8PzPU0V
UQ0+8RZK+Qk8SBwXcMNVRqm/efTderob4IYxP8OBe5wjAIE7+vu8r6sqwxRIS54M
Cyg=
=DRSr
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Uprobes:
- x86/shstk: Make return uprobe work with shadow stack
- Add uretprobe syscall which speeds up the uretprobe 10-30% faster.
This syscall is automatically used from user-space trampolines
which are generated by the uretprobe. If this syscall is used by
normal user program, it will cause SIGILL. Note that this is
currently only implemented on x86_64.
(This also has two fixes for adjusting the syscall number to avoid
conflict with new *attrat syscalls.)
- uprobes/perf: fix user stack traces in the presence of pending
uretprobe. This corrects the uretprobe's trampoline address in the
stacktrace with correct return address
- selftests/x86: Add a return uprobe with shadow stack test
- selftests/bpf: Add uretprobe syscall related tests.
- test case for register integrity check
- test case with register changing case
- test case for uretprobe syscall without uprobes (expected to fail)
- test case for uretprobe with shadow stack
- selftests/bpf: add test validating uprobe/uretprobe stack traces
- MAINTAINERS: Add uprobes entry. This does not specify the tree but
to clarify who maintains and reviews the uprobes
Kprobes:
- tracing/kprobes: Test case cleanups.
Replace redundant WARN_ON_ONCE() + pr_warn() with WARN_ONCE() and
remove unnecessary code from selftest
- tracing/kprobes: Add symbol counting check when module loads.
This checks the uniqueness of the probed symbol on modules. The
same check has already done for kernel symbols
(This also has a fix for build error with CONFIG_MODULES=n)
Cleanup:
- Add MODULE_DESCRIPTION() macros for fprobe and kprobe examples"
* tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: Add uprobes entry
selftests/bpf: Change uretprobe syscall number in uprobe_syscall test
uprobe: Change uretprobe syscall scope and number
tracing/kprobes: Fix build error when find_module() is not available
tracing/kprobes: Add symbol counting check when module loads
selftests/bpf: add test validating uprobe/uretprobe stack traces
perf,uprobes: fix user stack traces in the presence of pending uretprobes
tracing/kprobe: Remove cleanup code unrelated to selftest
tracing/kprobe: Integrate test warnings into WARN_ONCE
selftests/bpf: Add uretprobe shadow stack test
selftests/bpf: Add uretprobe syscall call from user space test
selftests/bpf: Add uretprobe syscall test for regs changes
selftests/bpf: Add uretprobe syscall test for regs integrity
selftests/x86: Add return uprobe shadow stack test
uprobe: Add uretprobe syscall to speed up return probe
uprobe: Wire up uretprobe system call
x86/shstk: Make return uprobe work with shadow stack
samples: kprobes: add missing MODULE_DESCRIPTION() macros
fprobe: add missing MODULE_DESCRIPTION() macro
|
||
|
|
51835949dd |
Networking changes for 6.11. Not much excitement - a handful of large
patchsets (devmem among them) did not make it in time.
Core & protocols
----------------
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT.
- Use flex array for netdevice priv area, ensure its cache alignment.
- Add a sysctl knob to allow user to specify a default rto_min at socket
init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful.
- Support scheduling transmission of packets based on CLOCK_TAI.
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off
using cpusets.
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address.
- Allow configuration of multipath hash seed, to both allow synchronizing
hashing of two routers, and preventing partial accidental sync.
- Improve TCP compliance with RFC 9293 for simultaneous connect().
- Support sending NAT keepalives in IPsec ESP in UDP states. Userspace
IKE daemon had to do this before, but the kernel can better keep
track of it.
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled.
- Introduce IPPROTO_SMC for selecting SMC when socket is created.
- Allow UDP GSO transmit from devices with no checksum offload.
- openvswitch: add packet sampling via psample, separating the sampled
traffic from "upcall" packets sent to user space for forwarding.
- nf_tables: shrink memory consumption for transaction objects.
Things we sprinkled into general kernel code
--------------------------------------------
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver
for QCA6390).
- Add IRQ information in sysfs for auxiliary bus.
- Introduce guard definition for local_lock.
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures.
BPF
---
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered.
- Add new kfuncs for a generic, open-coded bits iterator.
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head.
- Support resilient split BTF which cuts down on duplication and makes
BTF as compact as possible WRT BTF from modules.
- Add support for dumping kfunc prototypes from BTF which enables both
detecting as well as dumping compilable prototypes for kfuncs.
- riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter.
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs.
Driver API
----------
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose.
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits.
- Track additional RSS contexts in the core, make sure configuration
changes don't break them.
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP
data paths.
- Support updating firmware on SFP modules.
Tests and tooling
-----------------
- mptcp: use net/lib.sh to manage netns.
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints.
- openvswitch: make test self-contained (don't depend on OvS CLI tools).
Drivers
-------
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead
and skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps
to catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead of
in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaWjBwACgkQMUZtbf5S
IrvuSRAAkJuEzTRqgURBCe4eNEQde6mJJig7l2CKHwCbFiHZpRkFHf8qKbcGWbL6
uLW33SWnKtJVDhxVKWHLq635XW7BAa80YhqGw21GDi+mIEhWXZglHj3xbXNxsMfE
4eg/kG4BkfYWFmHaXOwVWV/mr7nXf6j7WmXNeXEi32ufE1j0OL+YlQenKnMj8yP2
j9JmYa2Chwppng1SblHmcjmGkdNVwFhStKeCG+2K7v06wdDH/QYBlbgUv9gw/cxp
NlW//wgiaeX40U4O3kDwt9C+LDoh+0VrDDeVdQ+IsScLtY3PhAzEoKolFYTq2HSr
I1JpoaHNnyNsJq3DZrACQ5WlH4yDn6C2EUB6dxNnFaI9F1ZPsi+7MTl6Sei1AklD
TuQTj/lxOACBwW2Q77NU72uoxiIUauesGPHcnrAFuoCIEhZF0mso7k59BvrXhsOP
QwcLbQdc1YHNkqv/Vc7NBY+ruMsYB+5Ubbhhj2p27dp/CWFIwxI29fze4dn2uhO6
ejHN3mbqwPdSzg12YJtM6Iq61Cnwo2eVSvhTxl+ZVSZtI4nu2arzR+y7QTYmNrXP
6tkgVN9UsWeLl2xJ8wyyqL5mcvNHP2rPXWZ2X56iTaa26m+UlleeQ7YRaYtQAAr0
Ec/vlDMX64SwHhd+qwE99DXGQf2g+KklHKSLsnajJUVrWFTlRI0=
=opz8
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to allow user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of multipath hash seed, to both allow
synchronizing hashing of two routers, and preventing partial
accidental sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP in UDP states.
Userspace IKE daemon had to do this before, but the kernel can
better keep track of it
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when socket is created
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce guard definition for local_lock
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
Driver API:
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
|
||
|
|
d80f2996b8 |
asm-generic updates for 6.11
Most of this is part of my ongoing work to clean up the system call tables. In this bit, all of the newer architectures are converted to use the machine readable syscall.tbl format instead in place of complex macros in include/uapi/asm-generic/unistd.h. This follows an earlier series that fixed various API mismatches and in turn is used as the base for planned simplifications. The other two patches are dead code removal and a warning fix. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmaVB1cACgkQYKtH/8kJ UicMqxAAnYKOxfjoMIhYYK6bl126wg/vIcDcjIR9cNWH21Nhn3qxn11ZXau3S7xv 3l/HreEhyEQr4gC2a70IlXyHUadYOlrk+83OURrunWk1oKPmZlMKcfPVbtp8GL7x PUNXQfwM1XZLveKwufY24hoZdwKC+Y/5WLc1t0ReznJuAqgeO2rM9W5dnV5bAfCp he3F5hFcr196Dz3/GJjJIWrY+cbwfmZWsNtj1vFTL5/r/LuCu8HTkqhsGj8tE5BJ NGVEEXbp5eaVTCIGqJWhnuZcsnKN9kM51M7CtdwWf8OTckUVuJap5OsDVKQkWkGl bLPbd2jhDltph0sah51hAIvv4WdkThW76u9FRW7KR3fo7ra67eF7l5j7wc1lE2JB GwLJ1X56Bxe1GhvvNTlDmb7DrnlP/DMPuRv3Z6xyH6l8iZ2pMGlnAxuw6Bs1s6Y5 WSs36ZpnS0ctgjfx37ZITsZSvbKFPpQFJP4siwS8aRNv/NFALNNdFyOCY5lNzspZ 0dxwjn6/7UpHE4MKh6/hvCg2QwupXXBTRytibw+75/rOsR+EYlmtuONtyq2sLUHe ktJ5pg+8XuZm27+wLffuluzmY7sv2F8OU4cTYeM60Ynmc6pRzwUY6/VhG52S1/mU Ua4VgYIpzOtlLrYmz5QTWIZpdSFSVbIc/3pLriD6hn4Mvg+BwdA= =XOhL -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Most of this is part of my ongoing work to clean up the system call tables. In this bit, all of the newer architectures are converted to use the machine readable syscall.tbl format instead in place of complex macros in include/uapi/asm-generic/unistd.h. This follows an earlier series that fixed various API mismatches and in turn is used as the base for planned simplifications. The other two patches are dead code removal and a warning fix" * tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: vmlinux.lds.h: catch .bss..L* sections into BSS") fixmap: Remove unused set_fixmap_offset_io() riscv: convert to generic syscall table openrisc: convert to generic syscall table nios2: convert to generic syscall table loongarch: convert to generic syscall table hexagon: use new system call table csky: convert to generic syscall table arm64: rework compat syscall macros arm64: generate 64-bit syscall.tbl arm64: convert unistd_32.h to syscall.tbl format arc: convert to generic syscall table clone3: drop __ARCH_WANT_SYS_CLONE3 macro kbuild: add syscall table generation to scripts/Makefile.asm-headers kbuild: verify asm-generic header list loongarch: avoid generating extra header files um: don't generate asm/bpf_perf_event.h csky: drop asm/gpio.h wrapper syscalls: add generic scripts/syscall.tbl |
||
|
|
436381eaf2 |
Merge branch 'slab/for-6.11/buckets' into slab/for-next
Merge all the slab patches previously collected on top of v6.10-rc1, over cleanups/fixes that had to be based on rc6. |
||
|
|
77179b6f30 |
tools: sync uapi/linux/fs.h header into tools subdir
We need this UAPI header in tools/include subdirectory for using it from BPF selftests. Link: https://lkml.kernel.org/r/20240627170900.1672542-6-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
c8b8b8190a |
LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch. -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmaOS6UWHGNoZW5odWFj YWlAa2VybmVsLm9yZwAKCRAChivD8uImehejD/9pACGe3h3krXLcFVWXOFIu5Hpc 5kQLP0lSPJ/o5Xs8t/oPLrnDX70z90wXI1LOmltc7h32MSwFa2l8COQh+sN5eJBQ PNyt7u7bMipp0yJS4Gl3LQQ5vklcGOSpQc/gbeXnVx8J/tz+Mo9YGGLIXVRXRM6W Ri8D2VVFiwzQQYeTpPo1u1Ob8C6mA4KOppwvhscMTM3vj4NMbsinBzRnR0lG0Tdw meFhxDPly1Ksxsbnj9UGO6UnEY0A2SLONs6MiO4y4DtoqoDlw/lbqFJuYo4vvbx1 pxtjyirD/PX/wjslQFWUOuU0hMfAodera+JupZ5BZWfcG8FltA4DQfDsm/U9RjK/ 7gGNnr8Xk2/tp6+4AVV+HU2iTgRvq+mXCL72zSy2Y4r7ElBAANDfk4n+Zn/PWisn U9wwV8Ue7tVB15BRpRsg77NzBidiCFEe/6flWYiX2y24ke71gwDJBGUy8hMdKt6t 4Cq8atsU0MvDAzfYMsK9JjskJp4UFq6wb1tXbbuADM4TDhnzlK6s6h3vM+pFlh/f my7fDH8/2qsCWhBDM4pmsJskVp+I1GOk/80RjTQISwx7iHktJWvxNYTaisK2fvD5 Qs1IUWfNFbDX0Lr0QpN6j6X4rZkghR4R6XoFkd4nkicwi+UHVn3oK9GSqv24QJn9 7+Ev3dfRTUYLd6mC4Q== =DpIK -----END PGP SIGNATURE----- Merge tag 'loongarch-kvm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.11 1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch. |
||
|
|
9ff0e37c68 |
KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
Add a test case to exercise KVM_PRE_FAULT_MEMORY and run the guest to access the pre-populated area. It tests KVM_PRE_FAULT_MEMORY ioctl for KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-ID: <32427791ef42e5efaafb05d2ac37fa4372715f47.1712785629.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
692a68ee9c |
radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
In tools/ directory, function bitmap_clear() is currently only used in object file tools/testing/radix-tree/xarray.o. But instead of keeping a bitmap.c with only bitmap_clear() definition in radix-tree's own directory, it would be more proper to put it in common directory lib/. Sync the kernel definition and link some related libs, no functional change is expected. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Matthew Wilcox <willy@infradead.org> CC: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> |
||
|
|
505d66d1ab |
clone3: drop __ARCH_WANT_SYS_CLONE3 macro
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
|
|
7b769adc26 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZoxN0AAKCRDbK58LschI g0c5AQDa3ZV9gfbN42y1zSDoM1uOgO60fb+ydxyOYh8l3+OiQQD/fLfpTY3gBFSY 9yi/pZhw/QdNzQskHNIBrHFGtJbMxgs= =p1Zz -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-08 The following pull-request contains BPF updates for your *net-next* tree. We've added 102 non-merge commits during the last 28 day(s) which contain a total of 127 files changed, 4606 insertions(+), 980 deletions(-). The main changes are: 1) Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman. 2) Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs, from Daniel Xu. 3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement support for BPF exceptions, from Ilya Leoshkevich. 4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui. 5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option for skbs and add coverage along with it, from Vadim Fedorenko. 6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan. 7) Extend the BPF verifier to track the delta between linked registers in order to better deal with recent LLVM code optimizations, from Alexei Starovoitov. 8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should have been a pointer to the map value, from Benjamin Tissoires. 9) Extend BPF selftests to add regular expression support for test output matching and adjust some of the selftest when compiled under gcc, from Cupertino Miranda. 10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always iterates exactly once anyway, from Dan Carpenter. 11) Add the capability to offload the netfilter flowtable in XDP layer through kfuncs, from Florian Westphal & Lorenzo Bianconi. 12) Various cleanups in networking helpers in BPF selftests to shave off a few lines of open-coded functions on client/server handling, from Geliang Tang. 13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so that x86 JIT does not need to implement detection, from Leon Hwang. 14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an out-of-bounds memory access for dynpointers, from Matt Bobrowski. 15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as it might lead to problems on 32-bit archs, from Jiri Olsa. 16) Enhance traffic validation and dynamic batch size support in xsk selftests, from Tushar Vyavahare. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits) selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature bpf: helpers: fix bpf_wq_set_callback_impl signature libbpf: Add NULL checks to bpf_object__{prev_map,next_map} selftests/bpf: Remove exceptions tests from DENYLIST.s390x s390/bpf: Implement exceptions s390/bpf: Change seen_reg to a mask bpf: Remove unnecessary loop in task_file_seq_get_next() riscv, bpf: Optimize stack usage of trampoline bpf, devmap: Add .map_alloc_check selftests/bpf: Remove arena tests from DENYLIST.s390x selftests/bpf: Add UAF tests for arena atomics selftests/bpf: Introduce __arena_global s390/bpf: Support arena atomics s390/bpf: Enable arena s390/bpf: Support address space cast instruction s390/bpf: Support BPF_PROBE_MEM32 s390/bpf: Land on the next JITed instruction after exception s390/bpf: Introduce pre- and post- probe functions s390/bpf: Get rid of get_probe_mem_regno() ... ==================== Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
9364a7e40d |
memblock tests: fix implicit declaration of function 'numa_valid_node'
commit
|
||
|
|
c142850fbc |
selftests/mm: kvm, mdwe fixes to avoid requiring "make headers"
On Ubuntu 23.04, the kvm and mdwe selftests/mm build fails due to
missing a few items that are found in prctl.h. Here is an excerpt of the
build failures:
ksm_tests.c:252:13: error: use of undeclared identifier 'PR_SET_MEMORY_MERGE'
...
mdwe_test.c:26:18: error: use of undeclared identifier 'PR_SET_MDWE'
mdwe_test.c:38:18: error: use of undeclared identifier 'PR_GET_MDWE'
Fix these errors by adding a new tools/include/uapi/linux/prctl.h . This
file was created by running "make headers", and then copying a snapshot
over from ./usr/include/linux/prctl.h, as per the approach we settled on
in [1].
[1] commit
|
||
|
|
e20194725b |
selftests/mm: fix vm_util.c build failures: add snapshot of fs.h
On Ubuntu 23.04, on a clean git tree, the selftests/mm build fails due 10
or 20 missing items, all of which are found in fs.h, which is created via
"make headers". However, as per [1], the idea is to stop requiring "make
headers", and instead, take a snapshot of the files and check them in.
Here are a few of the build errors:
vm_util.c:34:21: error: variable has incomplete type 'struct pm_scan_arg'
struct pm_scan_arg arg;
...
vm_util.c:45:28: error: use of undeclared identifier 'PAGE_IS_WPALLOWED'
...
vm_util.c:55:21: error: variable has incomplete type 'struct page_region'
...
vm_util.c:105:20: error: use of undeclared identifier 'PAGE_IS_SOFT_DIRTY'
To fix this, add fs.h, taken from a snapshot of ./usr/include/linux/fs.h
after running "make headers".
[1] commit
|
||
|
|
d20d0b10f8 |
tools/nolibc: implement strerror()
strerror() is commonly used. For example in kselftest which currently needs to do an #ifdef NOLIBC to handle the lack of strerror(). Keep it simple and reuse the output format of perror() for strerror(). Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
0cf24d36bb |
tools/nolibc: implement strtol() and friends
The implementation always works on uintmax_t values. This is inefficient when only 32bit are needed. However for all functions this only happens for strtol() on 32bit platforms. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20240425-nolibc-strtol-v1-2-bfeef7846902@weissschuh.net |
||
|
|
8c3bd8bc40 |
tools/nolibc: add limits for {u,}intmax_t, ulong and {u,}llong
They are useful for users and necessary for strtol() and friends. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20240425-nolibc-strtol-v1-1-bfeef7846902@weissschuh.net |
||
|
|
4a24bbabc8 |
slab: delete useless RED_INACTIVE and RED_ACTIVE
These seem useless since we use the SLUB_RED_INACTIVE and SLUB_RED_ACTIVE, so just delete them, no functional change. Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
||
|
|
4c7d3d79c7 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
a3cfe84cca |
bpf: Add CHECKSUM_COMPLETE to bpf test progs
Add special flag to validate that TC BPF program properly updates checksum information in skb. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240606145851.229116-1-vadfed@meta.com |
||
|
|
3e8e25761a |
selftests/bpf: Add uretprobe syscall test for regs integrity
Add uretprobe syscall test that compares register values before and after the uretprobe is hit. It also compares the register values seen from attached bpf program. Link: https://lore.kernel.org/all/20240611112158.40795-6-jolsa@kernel.org/ Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
||
|
|
b8481381d4 |
perf tools fixes for v6.10: 2nd batch
- Update copies of kernel headers, which resulted in support for the new 'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC prctls, fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector, 'map_shadow_stack' syscall for x86-32. - Revert perf.data record memory allocation optimization that ended up causing a regression, work is being done to re-introduce it in the next merge window. - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when interrupting the build. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZmWw9QAKCRCyPKLppCJ+ J/bcAP4vftSGYZ0BDlfLT7KjEUDdc03nvgd5kHAkYM1OyVNgzAD/XRBe2c1f4pUK DWj5rjMAFpMfcGTv4TefQoopMZxYXgs= =cFei -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Update copies of kernel headers, which resulted in support for the new 'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC prctls, fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector, 'map_shadow_stack' syscall for x86-32. - Revert perf.data record memory allocation optimization that ended up causing a regression, work is being done to re-introduce it in the next merge window. - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when interrupting the build. * tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event" tools headers arm64: Sync arm64's cputype.h with the kernel sources tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL tools headers UAPI: Update i915_drm.h with the kernel sources tools headers UAPI: Sync kvm headers with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION perf beauty: Update copy of linux/socket.h with the kernel sources tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY tools headers UAPI: Sync linux/prctl.h with the kernel sources tools include UAPI: Sync linux/stat.h with the kernel sources |
||
|
|
b73f6b98bb |
mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
Leverage the macro PAGE_ALIGN_DOWN to get pgend. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Link: https://lore.kernel.org/r/20240507075833.6346-7-richard.weiyang@gmail.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> |
||
|
|
d6283b160a |
tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
To pick the changes from:
|
||
|
|
e19de2064f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_classifier.c |
||
|
|
13c7c941e7 |
netdev: add qstat for csum complete
Recent commit |
||
|
|
2f523f29d3 |
tools headers UAPI: Update i915_drm.h with the kernel sources
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
88e520512a |
tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in: |
||
|
|
4b3529edbb |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZlWtmQAKCRDbK58LschI g0TUAQDT76jx7Rq1DShCtZ3eqiBMNkYczK8b+GqNsSG8YGduaAEA1jn/GN+H65Rh atQZ/pYAfLZflMV04+XE0GyBr5q1uQg= =NczG -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-28 We've added 23 non-merge commits during the last 11 day(s) which contain a total of 45 files changed, 696 insertions(+), 277 deletions(-). The main changes are: 1) Rename skb's mono_delivery_time to tstamp_type for extensibility and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(), from Abhishek Chauhan. 2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary CT zones can be used from XDP/tc BPF netfilter CT helper functions, from Brad Cowie. 3) Several tweaks to the instruction-set.rst IETF doc to address the Last Call review comments, from Dave Thaler. 4) Small batch of riscv64 BPF JIT optimizations in order to emit more compressed instructions to the JITed image for better icache efficiency, from Xiao Wang. 5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h diffing and forcing more natural type definitions ordering, from Mykyta Yatsenko. 6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence a syzbot/KCSAN race report for the tx_errors counter, from Jiang Yunshui. 7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+ which started treating const structs as constants and thus breaking full BTF program name resolution, from Ivan Babrou. 8) Fix up BPF program numbers in test_sockmap selftest in order to reduce some of the test-internal array sizes, from Geliang Tang. 9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only pahole, from Alan Maguire. 10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless rebuilds in some corner cases, from Artem Savkov. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) bpf, net: Use DEV_STAT_INC() bpf, docs: Fix instruction.rst indentation bpf, docs: Clarify call local offset bpf, docs: Add table captions bpf, docs: clarify sign extension of 64-bit use of 32-bit imm bpf, docs: Use RFC 2119 language for ISA requirements bpf, docs: Move sentence about returning R0 to abi.rst bpf: constify member bpf_sysctl_kern:: Table riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT riscv, bpf: Use STACK_ALIGN macro for size rounding up riscv, bpf: Optimize zextw insn with Zba extension selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets net: Add additional bit to support clockid_t timestamp type net: Rename mono_delivery_time to tstamp_type for scalabilty selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs net: netfilter: Make ct zone opts configurable for bpf ct helpers selftests/bpf: Fix prog numbers in test_sockmap bpf: Remove unused variable "prev_state" bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer bpf: Fix order of args in call to bpf_map_kvcalloc ... ==================== Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
da42b5229b |
tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall
But also to wire up shadow stacks on 32-bit x86, picking up those changes from these csets: |
||
|
|
1693c5db6a |
net: Add additional bit to support clockid_t timestamp type
tstamp_type is now set based on actual clockid_t compressed into 2 bits. To make the design scalable for future needs this commit bring in the change to extend the tstamp_type:1 to tstamp_type:2 to support other clockid_t timestamp. We now support CLOCK_TAI as part of tstamp_type as part of this commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240509211834.3235191-3-quic_abchauha@quicinc.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
|
|
29c73fc794 |
perf tools fixes and improvements for v6.10:
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
- Add support for using the 'capstone' disassembler library in various tools,
such as 'perf script' and 'perf annotate'. This is an alternative for the
use of the 'xed' and 'objdump' disassemblers.
- Data-type profiling improvements:
Resolve types for a->b->c by backtracking the assignments until it finds
DWARF info for one of those members
Support for global variables, keeping a cache to speed up lookups.
Handle the 'call' instruction, dealing with effects on registers and handling
its return when tracking register data types.
Handle x86's segment based addressing like %gs:0x28, to support things like
per CPU variables, the stack canary, etc.
Data-type profiling got big speedups when using capstone for disassembling.
The objdump outoput parsing method is left as a fallback when capstone fails or
isn't available. There are patches posted for 6.11 that to use a LLVM
disassembler.
Support event group display in the TUI when annotating types with --data-type,
for instance to show memory load and store events for the data type fields.
Optimize the 'perf annotate' data structures, reducing memory usage.
Add a initial 'perf test' for 'perf annotate', checking that a target symbol
appears on the output, specifying objdump via the command line, etc.
- Integrate the shellcheck utility with the build of perf to allow catching
shell problems early in areas such as 'perf test', 'perf trace' scrape
scripts, etc.
- Add 'uretprobe' variant in the 'perf bench uprobe' tool.
- Add script to run instances of 'perf script' in parallel.
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
where those tracepoints aren't available.
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1.
- Add AMD's Zen 5 core and uncore events and metrics. Those come from the
"Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
document, with events that capture information on op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
Miscellaneous:
- Sync header copies with the kernel sources.
- Move some header copies used only for generating translation string tables
for ioctl cmds and other syscall integer arguments to a new directory under
tools/perf/beauty/, to separate from copies in tools/include/ that are used
to build the tools.
- Introduce scrape script for several syscall 'flags'/'mask' arguments.
- Improve cpumap utilization, fixing up pairing of refcounts, using the right
iterators (perf_cpu_map__for_each_cpu), etc.
- Give more details about raw event encodings in 'perf list', show tracepoint
encoding in the detailed output.
- Refactor the DSOs handling code, reducing memory usage.
- Document the BPF event modifier and add a 'perf test' for it.
- Improve the event parser, better error messages and add further 'perf test's
for it.
- Add reference count checking to 'struct comm_str' and 'struct mem_info'.
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
- Tweak the ARM64's Coresight 'perf test's.
- Improve ARM64's CoreSight ETM version detection and error reporting.
- Fix handling of symbols when using kcore.
- Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
machines in 'perf report'.
- Fix -g/--call-graph option failure in 'perf sched timehist'.
- Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
installed in non-standard directories, such as when doing cross builds.
- Various 'perf test' and 'perf bench' fixes.
- Improve 'perf probe' error message for long C++ probe names.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+
J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p
HQxKXehJRh90I90yn954mK2NuN+58Q0=
=sie+
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:
- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc
- Add 'uretprobe' variant in the 'perf bench uprobe' tool
- Add script to run instances of 'perf script' in parallel
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available
- Add Kan Liang to MAINTAINERS as a perf tools reviewer
- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers
Data-type profiling improvements:
- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members
- Support for global variables, keeping a cache to speed up lookups
- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types
- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc
- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler
- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields
- Optimize the 'perf annotate' data structures, reducing memory usage
- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1
- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX
Miscellaneous:
- Sync header copies with the kernel sources
- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools
- Introduce scrape script for several syscall 'flags'/'mask'
arguments
- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc
- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output
- Refactor the DSOs handling code, reducing memory usage
- Document the BPF event modifier and add a 'perf test' for it
- Improve the event parser, better error messages and add further
'perf test's for it
- Add reference count checking to 'struct comm_str' and 'struct
mem_info'
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust
- Tweak the ARM64's Coresight 'perf test's
- Improve ARM64's CoreSight ETM version detection and error reporting
- Fix handling of symbols when using kcore
- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'
- Fix -g/--call-graph option failure in 'perf sched timehist'
- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds
- Various 'perf test' and 'perf bench' fixes
- Improve 'perf probe' error message for long C++ probe names"
* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...
|
||
|
|
3eb3c33c1d |
asm-generic cleanups for 6.10
These are a few cross-architecture cleanup patches:
- Thomas Zimmermann works on separating fbdev support from the asm/video.h
contents that may be used by either the old fbdev drivers or the
newer drm display code.
- Thorsten Blum contributes cleanups for the generic bitops code
and asm-generic/bug.h
- I remove the orphaned include/asm-generic/page.h header that used to
included by long-removed mmu-less architectures.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZLvewACgkQYKtH/8kJ
UicUEQ//b5WVLOVXkFGlQvAaZkagOLEF8xSTnchA7aKrWQ/C6hSwLN6CQU6MAY7j
Fe54jYQtjwBwpVIj3jn20xiXP/pZbQp9aldkOx4v8YoGnjNF5UWLHm5510DV1ecE
0LF/2YIH25vIXGY6MVm6sFq+nkDgWZee6fBFNc3GsCu2y0biD1Gob9xH/ngCHjIj
tw9KS/j6MivPy/9vJ/Ml2YeutV6+pUA9hNmSrbSVlXSWFh3Wq6IZ+j6bNEftqtZY
xdnYwdVfReOCIayq6hSHhAgIp/uw8JOqLuE2JNwG/9sSF4zp4ZHLvTaMhqEoCpyB
3kZYd1qQTwV3eL5PyYtRcW03KvbhfZpMPzZT+wbl9SNPUljC2MSVeSFF30Uqatgb
yUJ9d/vlb1ynu1yQrFfTZ/kK+U0pPByydwLybcMtEIZ6Hrb1h/eRicvHhUx7bKUB
H9z/FN/TxGY+tPradx2lqm3J1wNu0ox8DUreXjtlJijKIUZQeAkJrGJgr6i6XLBz
crwgKzuQUClzEjBcoWzuTVUB7v19jaDuHMsaBBu8O9f1g5FnEIJlItqnXf1J0Dno
rJy68Mxsg4Dzt4YI3lpOJGDDDPhpOTBXfgsjkuru2MrdFMgZQh+DYLl3qOkJ4DJe
rdiEJb9PygBaGGQnoXO71oOLf5yQuenj+Fg5GIe9AQrci5fXwRQ=
=riCs
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"These are a few cross-architecture cleanup patches:
- separate out fbdev support from the asm/video.h contents that may
be used by either the old fbdev drivers or the newer drm display
code (Thomas Zimmermann)
- cleanups for the generic bitops code and asm-generic/bug.h
(Thorsten Blum)
- remove the orphaned include/asm-generic/page.h header that used to
be included by long-removed mmu-less architectures (me)"
* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: Fix name collision with ACPI's video.o
bug: Improve comment
asm-generic: remove unused asm-generic/page.h
arch: Rename fbdev header and source files
arch: Remove struct fb_info from video helpers
arch: Select fbdev helpers with CONFIG_VIDEO
bitops: Change function return types from long to int
|
||
|
|
eb6a9339ef |
Mainly singleton patches, documented in their respective changelogs.
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo: Clean
up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes
for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over macros.
The series is "codingstyle: avoid unused parameters for a function-like
macro".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkpLYQAKCRDdBJ7gKXxA
jo9NAQDctSD3TMXqxqCHLaEpCaYTYzi6TGAVHjgkqGzOt7tYjAD/ZIzgcmRwthjP
R7SSiSgZ7UnP9JRn16DQILmFeaoG1gs=
=lYhr
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
"Mainly singleton patches, documented in their respective changelogs.
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo:
Clean up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb:
Fixes for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over
macros. The series is "codingstyle: avoid unused parameters for a
function-like macro""
* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
fs/proc: fix softlockup in __read_vmcore
nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
scripts: checkpatch: check unused parameters for function-like macro
Documentation: coding-style: ask function-like macros to evaluate parameters
nilfs2: use __field_struct() for a bitwise field
selftests/kcmp: remove unused open mode
nilfs2: remove calls to folio_set_error() and folio_clear_error()
kernel/watchdog_perf.c: tidy up kerneldoc
watchdog: allow nmi watchdog to use raw perf event
watchdog: handle comma separated nmi_watchdog command line
nilfs2: make superblock data array index computation sparse friendly
squashfs: remove calls to set the folio error flag
squashfs: convert squashfs_symlink_read_folio to use folio APIs
scripts/gdb: fix detection of current CPU in KGDB
scripts/gdb: make get_thread_info accept pointers
scripts/gdb: fix parameter handling in $lx_per_cpu
scripts/gdb: fix failing KGDB detection during probe
kfifo: don't use "proxy" headers
media: stih-cec: add missing io.h
media: rc: add missing io.h
...
|
||
|
|
61307b7be4 |
The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
||
|
|
ff2632d7d0 |
powerpc updates for 6.10
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings via
prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
- Install powerpc selftests in sub-directories. Note this changes the way
run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove events.
- Other small features, cleanups and fixes.
Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
/4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
kAflTXzsIXvxBA==
=GoCV
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
|
||
|
|
1b294a1f35 |
Networking changes for 6.10.
Core & protocols
----------------
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd passing
functionality. New method based on Tarjan's Strongly Connected Components
algorithm should be both faster and remove a lot of workarounds
we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP packets
and forwarding them together. Useful for small switches / routers which
lack basic checksum offload in some scenarios (e.g. PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't
use NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory accounting,
RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked,
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states.
State can be used either for input or output packet processing.
Things we sprinkled into general kernel code
--------------------------------------------
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter
---------
- Use GFP_KERNEL to clone elements, to deal better with OOM situations
and avoid failures in the .commit step.
BPF
---
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. "Session mode" is a common use-case for tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw tracepoint
programs in order to ease migration from classic to raw tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V JITs.
This allows inlining functions which need to access per-CPU state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction.
Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor process-context
bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API
----------
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line) config.
- Initial bits of a queue control API, for now allowing a single queue
to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling
-----------------
- Remove the need to create a config file to run the net forwarding tests
so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test machine).
Add a few such tests.
- Create a shared code library for writing Python tests. Expose the YAML
Netlink library from tools/ to the tests for easy Netlink access.
- Move netfilter tests under net/, extend them, separate performance tests
from correctness tests, and iron out issues found by running them
"on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF info,
TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers
-------
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application rather
than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it messes up
TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the MII
bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
=EsC2
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
|
||
|
|
4b768bf062 |
linux_kselftest-nolibc-6.10-rc1
This nolibc update for Linux 6.10-rc1 - adds support for uname(2) - removes open-coded strnlen() - exports strlen() - adds tests for strlcat() and strlcpy() - fixes memory error in realloc() - fixes strlcat() return code and size usage - fixes strlcpy() return code and size usage -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmZCKZ8ACgkQCwJExA0N QxznfRAA17AXswINyhQEu/2lJIFFpQazZqMyvZ4mv19FPI8P3zhwPN24XsmLNqdo py6dUIbondrs7UgOOG0R0E8N89rxDctok6F8OEScsnfvd+KKEg3+YJscJj2X1vaK H3qU+i/K3oGFHJo2dsaR1iYl+lzo9dyzvicOe5bFzFkVQGI0a3p88CQrhoF3HkNf jc9oqnxIGJn3v3UX6jxFWFiuF17CopNv4/5DZG2QIU5t0JGiqEYxl5ayDwqQsgGB E88eiutsuhkVTqKI/9VXs+YIE0B0xbEQfGGSQ1zzGZKfy7MBSk88PzLXa591DFZH SXETzfxqbI4ACX7L36mUX/JUw/sC5ch67UABZqwrj8oxXb9Ssgxfzmt7rCwBlp2i GdYHuoSDY5TJnskceL5a94BLuvhI9EnYsMHKA2aHM95eBBkPZo7jGqqOU8hJbL4t MbIu0z3iCNUBFcNSBXdfv81Dc4CJO4Orj1jnMqe0U6dSasT+hS2ulZPu6EjKIfMY Wqgfs89pFQ4OH3e3Crx2ngz57iL2fDYZfccfjOJQtYfxTOmXRKHjcDXcZdydsxNF 6dXNDhtuyftvNf3h4o7B73Rv3xNbG1ObZmU2i4wxmY2fsYEsRhqR/UKhtqbTPsxF C/zL8KSraGzrgYuyOzQMDWAH/ybhZytn91kEd1QBWMO79TCC2JI= =2QlZ -----END PGP SIGNATURE----- Merge tag 'linux_kselftest-nolibc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull nolibc updates from Shuah Khan: - add support for uname(2) - remove open-coded strnlen() - export strlen() - add tests for strlcat() and strlcpy() - fix memory error in realloc() - fix strlcat() and strlcpy() return code and size usage * tag 'linux_kselftest-nolibc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: tools/nolibc: add support for uname(2) tools/nolibc/string: remove open-coded strnlen() selftests/nolibc: Add tests for strlcat() and strlcpy() tools/nolibc: Fix strlcpy() return code and size usage tools/nolibc: Fix strlcat() return code and size usage tools/nolibc/string: export strlen() tools/nolibc/stdlib: fix memory error in realloc() |
||
|
|
6e62702feb |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZkGcZAAKCRDbK58LschI g6o6APwLsqhrM2w71VUN5ciCxu4H5VDtZp6wkdqtVbxxU4qNxQEApKgYgKt8ZLF3 Kily5c7m+S4ZXhMX21rb8JhSAz0dfQk= =5Dk7 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-13 We've added 119 non-merge commits during the last 14 day(s) which contain a total of 134 files changed, 9462 insertions(+), 4742 deletions(-). The main changes are: 1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi. 2) Add BPF range computation improvements to the verifier in particular around XOR and OR operators, refactoring of checks for range computation and relaxing MUL range computation so that src_reg can also be an unknown scalar, from Cupertino Miranda. 3) Add support to attach kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. Session mode is a common use-case for tetragon and bpftrace, from Jiri Olsa. 4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf as well as BPF selftest's struct_ops handling, from Andrii Nakryiko. 5) Improvements to BPF selftests in context of BPF gcc backend, from Jose E. Marchesi & David Faust. 6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test- -style in order to retire the old test, run it in BPF CI and additionally expand test coverage, from Jordan Rife. 7) Big batch for BPF selftest refactoring in order to remove duplicate code around common network helpers, from Geliang Tang. 8) Another batch of improvements to BPF selftests to retire obsolete bpf_tcp_helpers.h as everything is available vmlinux.h, from Martin KaFai Lau. 9) Fix BPF map tear-down to not walk the map twice on free when both timer and wq is used, from Benjamin Tissoires. 10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL, from Alexei Starovoitov. 11) Change BTF build scripts to using --btf_features for pahole v1.26+, from Alan Maguire. 12) Small improvements to BPF reusing struct_size() and krealloc_array(), from Andy Shevchenko. 13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions, from Ilya Leoshkevich. 14) Extend TCP ->cong_control() callback in order to feed in ack and flag parameters and allow write-access to tp->snd_cwnd_stamp from BPF program, from Miao Xu. 15) Add support for internal-only per-CPU instructions to inline bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs, from Puranjay Mohan. 16) Follow-up to remove the redundant ethtool.h from tooling infrastructure, from Tushar Vyavahare. 17) Extend libbpf to support "module:<function>" syntax for tracing programs, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) bpf: make list_for_each_entry portable bpf: ignore expected GCC warning in test_global_func10.c bpf: disable strict aliasing in test_global_func9.c selftests/bpf: Free strdup memory in xdp_hw_metadata selftests/bpf: Fix a few tests for GCC related warnings. bpf: avoid gcc overflow warning in test_xdp_vlan.c tools: remove redundant ethtool.h from tooling infra selftests/bpf: Expand ATTACH_REJECT tests selftests/bpf: Expand getsockname and getpeername tests sefltests/bpf: Expand sockaddr hook deny tests selftests/bpf: Expand sockaddr program return value tests selftests/bpf: Retire test_sock_addr.(c|sh) selftests/bpf: Remove redundant sendmsg test cases selftests/bpf: Migrate ATTACH_REJECT test cases selftests/bpf: Migrate expected_attach_type tests selftests/bpf: Migrate wildcard destination rewrite test selftests/bpf: Migrate sendmsg6 v4 mapped address tests selftests/bpf: Migrate sendmsg deny test cases selftests/bpf: Migrate WILDCARD_IP test selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases ... ==================== Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
b56035101e |
netdev: Add queue stats for TX stop and wake
TX queue stop and wake are counted by some drivers. Support reporting these via netdev-genl queue stats. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
bbe91a9f68 |
tools: remove redundant ethtool.h from tooling infra
Remove the redundant ethtool.h header file from tools/include/uapi/linux. The file is unnecessary as the system uses the kernel's include/uapi/linux/ethtool.h directly. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240508104123.434769-1-tushar.vyavahare@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
bbed8b9ffe |
tools lib rbtree: pick some improvements from the kernel rbtree code
The tools/lib/rbtree.c code came from the kernel. Remove the
EXPORT_SYMBOL() that make sense only there. Unfortunately it is not being
checked with tools/perf/check_headers.sh. Will try to remedy this. Until
then pick the improvements from:
|
||
|
|
651d61bc8b |
KVM: PPC: Fix documentation for ppc mmu caps
The documentation mentions KVM_CAP_PPC_RADIX_MMU, but the defines in the
kvm headers spell it KVM_CAP_PPC_MMU_RADIX. Similarly with
KVM_CAP_PPC_MMU_HASH_V3.
Fixes:
|
||
|
|
9c313ccdfc
|
bitops: Change function return types from long to int
Change the return types of bitops functions (ffs, fls, and fns) from long to int. The expected return values are in the range [0, 64], for which int is sufficient. Additionally, int aligns well with the return types of the corresponding __builtin_* functions, potentially reducing overall type conversions. Many of the existing bitops functions already return an int and don't need to be changed. The bitops functions in arch/ should be considered separately. Adjust some return variables to match the function return types. With GCC 13 and defconfig, these changes reduced the size of a test kernel image by 5,432 bytes on arm64 and by 248 bytes on riscv; there were no changes in size on x86_64, powerpc, or m68k. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
|
|
535a3692ba |
bpf: Add support for kprobe session attach
Adding support to attach bpf program for entry and return probe of the same function. This is common use case which at the moment requires to create two kprobe multi links. Adding new BPF_TRACE_KPROBE_SESSION attach type that instructs kernel to attach single link program to both entry and exit probe. It's possible to control execution of the bpf program on return probe simply by returning zero or non zero from the entry bpf program execution to execute or not the bpf program on return probe respectively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240430112830.1184228-2-jolsa@kernel.org |
||
|
|
0cfe71f45f |
netdev: add queue stats
These stats are commonly. Support reporting those via netdev-genl queue stats. name: rx-hw-drops name: rx-hw-drop-overruns name: rx-csum-unnecessary name: rx-csum-none name: rx-csum-bad name: rx-hw-gro-packets name: rx-hw-gro-bytes name: rx-hw-gro-wire-packets name: rx-hw-gro-wire-bytes name: rx-hw-drop-ratelimits name: tx-hw-drops name: tx-hw-drop-errors name: tx-csum-none name: tx-needs-csum name: tx-hw-gso-packets name: tx-hw-gso-bytes name: tx-hw-gso-wire-packets name: tx-hw-gso-wire-bytes name: tx-hw-drop-ratelimits Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
89de2db193 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZi9+AAAKCRDbK58LschI g0nEAP487m7L0nLVriC2oIOWsi29tklW3etm6DO7gmGRGIHgrgEAnMyV1xBj3bGj v6jJwDcybCym1hLx+1x1JCZ4eoAFswE= =xbna -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-04-29 We've added 147 non-merge commits during the last 32 day(s) which contain a total of 158 files changed, 9400 insertions(+), 2213 deletions(-). The main changes are: 1) Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86 BPF JIT. This allows inlining per-CPU array and hashmap lookups and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko. 2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song. 3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction, from Alexei Starovoitov. 4) Add support for passing mark with bpf_fib_lookup helper, from Anton Protopopov. 5) Add a new bpf_wq API for deferring events and refactor sleepable bpf_timer code to keep common code where possible, from Benjamin Tissoires. 6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs to check when NULL is passed for non-NULLable parameters, from Eduard Zingerman. 7) Harden the BPF verifier's and/or/xor value tracking, from Harishankar Vishwanathan. 8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem, from Vadim Fedorenko. 9) Various improvements to the BPF instruction set standardization doc, from Dave Thaler. 10) Extend libbpf APIs to partially consume items from the BPF ringbuffer, from Andrea Righi. 11) Bigger batch of BPF selftests refactoring to use common network helpers and to drop duplicate code, from Geliang Tang. 12) Support bpf_tail_call_static() helper for BPF programs with GCC 13, from Jose E. Marchesi. 13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled, from Kumar Kartikeya Dwivedi. 14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs, from David Vernet. 15) Extend the BPF verifier to allow different input maps for a given bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu. 16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan. 17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison the stack memory, from Martin KaFai Lau. 18) Improve xsk selftest coverage with new tests on maximum and minimum hardware ring size configurations, from Tushar Vyavahare. 19) Various ReST man pages fixes as well as documentation and bash completion improvements for bpftool, from Rameez Rehman & Quentin Monnet. 20) Fix libbpf with regards to dumping subsequent char arrays, from Quentin Deslandes. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits) bpf, docs: Clarify PC use in instruction-set.rst bpf_helpers.h: Define bpf_tail_call_static when building with GCC bpf, docs: Add introduction for use in the ISA Internet Draft selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args selftests/bpf: dummy_st_ops should reject 0 for non-nullable params bpf: check bpf_dummy_struct_ops program params for test runs selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops selftests/bpf: adjust dummy_st_ops_success to detect additional error bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable selftests/bpf: Add ring_buffer__consume_n test. bpf: Add bpf_guard_preempt() convenience macro selftests: bpf: crypto: add benchmark for crypto functions selftests: bpf: crypto skcipher algo selftests bpf: crypto: add skcipher to bpf crypto bpf: make common crypto API for TC/XDP programs bpf: update the comment for BTF_FIELDS_MAX selftests/bpf: Fix wq test. selftests/bpf: Use make_sockaddr in test_sock_addr selftests/bpf: Use connect_to_addr in test_sock_addr ... ==================== Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
450f941ea9 |
tools headers: Synchronize linux/bits.h with the kernel sources
To pick up the changes in this cset:
|
||
|
|
cd88c11c6d |
tools lib rbtree: Pick some improvements from the kernel rbtree code
The tools/lib/rbtree.c code came from the kernel, removing the
EXPORT_SYMBOL() that make sense only there, unfortunately it is not
being checked with tools/perf/check_headers.sh, will try to remedy this,
till then pick the improvements from:
|
||
|
|
580ea358af |
selftests/mm: fix additional build errors for selftests
These build errors only occur if one fails to first run "make headers". However, that is a non-obvious and instrusive requirement, and so there was a discussion on how to get rid of it [1]. This uses that solution. These two files were created by taking a snapshot of the generated header files that are created via "make headers". These two files were copied from ./usr/include/linux/ to ./tools/include/uapi/linux/ . That fixes the selftests/mm build on today's Arch Linux (which required the userfaultfd.h) and Ubuntu 23.04 (which additionally required memfd.h). [1] https://lore.kernel.org/all/783a4178-1dec-4e30-989a-5174b8176b09@redhat.com/ Link: https://lkml.kernel.org/r/20240328033418.203790-3-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
48e2cd3e3d |
bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
Two important arguments in RTT estimation, mrtt and srtt, are passed to tcp_bpf_rtt(), so that bpf programs get more information about RTT computation in BPF_SOCK_OPS_RTT_CB. The difference between bpf_sock_ops->srtt_us and the srtt here is: the former is an old rtt before update, while srtt passed by tcp_bpf_rtt() is that after update. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240425161724.73707-2-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
|
|
2bd87951de |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c |
||
|
|
f1d0a2fbb0 |
tools: sync include/uapi/linux/bpf.h
cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-6-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
173b0b5b0e |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes sent via perf-tools, by Namhyung Kim. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
13a2e429f6 |
perf tools fixes for v6.9
A random set of small bug fixes: * Fix perf annotate TUI when used with data type profiling * Work around BPF verifier about sighand lock checking And a set of kernel header synchronization. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZiLYdwAKCRCMstVUGiXM g6kDAP9GYdWqmbYZhKgQQQwAVkgkB0j19vLu+6xphi6P3ul7pwD+Nr5mHEcqytU/ I1r0lrV/ybNmpUK3wFyRsyvRmnxe/Qw= =/181 -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.9-2024-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "A random set of small bug fixes: - Fix perf annotate TUI when used with data type profiling - Work around BPF verifier about sighand lock checking And a set of kernel header synchronization" * tag 'perf-tools-fixes-for-v6.9-2024-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: tools/include: Sync arm64 asm/cputype.h with the kernel sources tools/include: Sync asm-generic/bitops/fls.h with the kernel sources tools/include: Sync x86 asm/msr-index.h with the kernel sources tools/include: Sync x86 asm/irq_vectors.h with the kernel sources tools/include: Sync x86 CPU feature headers with the kernel sources tools/include: Sync uapi/sound/asound.h with the kernel sources tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources tools/include: Sync uapi/linux/fs.h with the kernel sources tools/include: Sync uapi/drm/i915_drm.h with the kernel sources perf lock contention: Add a missing NULL check perf annotate: Make sure to call symbol__annotate2() in TUI |
||
|
|
61ba075d99 |
Revert "tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines"
This reverts commit
|
||
|
|
0adab2b6b7 |
tools/nolibc: add support for uname(2)
All supported kernels are assumed to use struct new_utsname. This is validated in test_uname(). uname(2) can for example be used in ksft_min_kernel_version() from the kernels selftest framework. Link: https://lore.kernel.org/lkml/20240412123536.GA32444@redhat.com/ Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> |
||
|
|
94426ed213 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c |
||
|
|
99e4e1174a |
tools/include: Sync asm-generic/bitops/fls.h with the kernel sources
To pick up the changes from:
|
||
|
|
b7ce17f257 |
tools/include: Sync uapi/sound/asound.h with the kernel sources
To pick up the changes from:
|
||
|
|
bee3b820c6 |
tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
To pick up the changes from: |
||
|
|
4cfa8a873d |
tools/include: Sync uapi/linux/fs.h with the kernel sources
To pick up the changes from: |
||
|
|
3ef842a77e |
tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
To pick up changes from: |
||
|
|
699c23f02c |
bpf: Add bpf_link support for sk_msg and sk_skb progs
Add bpf_link support for sk_msg and sk_skb programs. We have an internal request to support bpf_link for sk_msg programs so user space can have a uniform handling with bpf_link based libbpf APIs. Using bpf_link based libbpf API also has a benefit which makes system robust by decoupling prog life cycle and attachment life cycle. Reviewed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240410043527.3737160-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
e93b912ecf |
tools/nolibc/string: remove open-coded strnlen()
The same header already defines an implementation of strnlen(), so use it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
fbffce819e |
tools/nolibc: Fix strlcpy() return code and size usage
The return code should always be strlen(src), and we should copy at most size-1 bytes. While we are there, make sure to null-terminate the dst buffer if we copied something. Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
34d232c39a |
tools/nolibc: Fix strlcat() return code and size usage
The return code should always be strlen(src) + strnlen(dst, size). Let's make sure to copy at most size-1 bytes from src and null-terminate the dst buffer if we did copied something. While we can use strnlen() and strncpy() to implement strlcat(), this is simple enough and results in shorter code when compiled. Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
689230b674 |
tools/nolibc/string: export strlen()
As with commit
|
||
|
|
791f464114 |
tools/nolibc/stdlib: fix memory error in realloc()
Pass user_p_len to memcpy() instead of heap->len to prevent realloc()
from copying an extra sizeof(heap) bytes from beyond the allocated
region.
Signed-off-by: Brennan Xavier McManus <bxmcmanus@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes:
|
||
|
|
4f0a8fe321 |
memblock tests: fix build errors
* add stubs to functions that calls to them were recently added to memblock
but they were missing in tests
* update gfp_types.h to include bits.h so that BIT() definitions won't
depend on other includes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmYTgQ4QHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kas7CACU1P6gzbtsz9w1YcJtElowyvQ22dhVO918
Iu9Y95QkFrJHnKVjrGiIlZVi3Mt0c3xvYm8D16ik3CRvBbLkvGdPGeo7kKeZrF8j
RvicmSjJduEsiPceCy4+bcFjQx0P3JQ6jKtYwJj21kJrV2j1T3Qob1KVjEmJt+h0
xcphdyXW1bjU+EgZyEka5ct0nYUDP+a9DwUETFxG+s/Cnp5SMprZHjsdXormHx4j
8QyU+P916wWmzEtWlLY4yLhe6849Rx9Cse8E8ViWqwxfIZELJ97A5qvHwD9dHCOX
8oZHzRgUslKoSJWutkjYHeOtB7kDhv1l+KZYOYbC4XE7WUooQs/O
=CVgG
-----END PGP SIGNATURE-----
Merge tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Fix build errors in memblock tests:
- add stubs to functions that calls to them were recently added to
memblock but they were missing in tests
- update gfp_types.h to include bits.h so that BIT() definitions
won't depend on other includes"
* tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: fix undefined reference to `BIT'
memblock tests: fix undefined reference to `panic'
memblock tests: fix undefined reference to `early_pfn_to_nid'
|
||
|
|
343ca8131c |
selftests/bpf: add fp-leaking precise subprog result tests
Add selftests validating that BPF verifier handles precision marking for SCALAR registers derived from r10 (fp) register correctly. Given `r0 = (s8)r10;` syntax is not supported by older Clang compilers, use the raw BPF instruction syntax to maximize compatibility. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240404214536.3551295-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
cf1ca1f66d |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c |
||
|
|
f917170072 |
bpf: Pack struct bpf_fib_lookup
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit |
||
|
|
e0f5a8e74b |
memblock tests: fix undefined reference to `panic'
commit
|
||
|
|
7d8ed162e6 |
memblock tests: fix undefined reference to `early_pfn_to_nid'
commit
|
||
|
|
7effe3fdc0 |
tools: Add ethtool.h header to tooling infra
This commit duplicates the ethtool.h file from the include/uapi/linux directory in the kernel source to the tools/include/uapi/linux directory. This action ensures that the ethtool.h file used in the tools directory is in sync with the kernel's version, maintaining consistency across the codebase. There are some checkpatch warnings in this file that could be cleaned up, but I preferred to move it over as-is for now to avoid disrupting the code. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-2-tushar.vyavahare@intel.com |
||
|
|
ca4ddc26f8 |
bpf: Fix typo in uapi doc comments
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in the doc comments. This adds the missing "P". Signed-off-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com |
||
|
|
a37fbe666c |
bitmap: introduce generic optimized bitmap_size()
The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
10a04ff09b |
tools: move alignment-related macros to new <linux/align.h>
Currently, tools have *ALIGN*() macros scattered across the unrelated headers, as there are only 3 of them and they were added separately each time on an as-needed basis. Anyway, let's make it more consistent with the kernel headers and allow using those macros outside of the mentioned headers. Create <linux/align.h> inside the tools/ folder and include it where needed. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
7d8296b250 |
bitops: make BYTES_TO_BITS() treewide-available
Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
62248b22d0 |
tools/resolve_btfids: fix build with musl libc
Include the header that defines u32.
This fixes build of 6.6.23 and 6.1.83 kernels for Alpine Linux, which
uses musl libc. I assume that GNU libc indirecly pulls in linux/types.h.
Fixes:
|
||
|
|
5311591fbb |
bpf: Add support for passing mark with bpf_fib_lookup
Extend the bpf_fib_lookup() helper by making it to utilize mark if the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the four bytes of struct bpf_fib_lookup are used, shared with the output-only smac/dmac fields. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
5e47fbe5ce |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2a702c2e57 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZgHylwAKCRDbK58LschI gzmaAPwKhDFFSU/DU08k22muJxLIXVR7Xx04baJ9mPiFrqZyyAEA8RFNamC7wZIB AnfwwoDjfDTP60rlXFaEf8UT5PpA7Ao= =/KF6 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-03-25 We've added 38 non-merge commits during the last 13 day(s) which contain a total of 50 files changed, 867 insertions(+), 274 deletions(-). The main changes are: 1) Add the ability to specify and retrieve BPF cookie also for raw tracepoint programs in order to ease migration from classic to raw tracepoints, from Andrii Nakryiko. 2) Allow the use of bpf_get_{ns_,}current_pid_tgid() helper for all program types and add additional BPF selftests, from Yonghong Song. 3) Several improvements to bpftool and its build, for example, enabling libbpf logs when loading pid_iter in debug mode, from Quentin Monnet. 4) Check the return code of all BPF-related set_memory_*() functions during load and bail out in case they fail, from Christophe Leroy. 5) Avoid a goto in regs_refine_cond_op() such that the verifier can be better integrated into Agni tool which doesn't support backedges yet, from Harishankar Vishwanathan. 6) Add a small BPF trie perf improvement by always inlining longest_prefix_match, from Jesper Dangaard Brouer. 7) Small BPF selftest refactor in bpf_tcp_ca.c to utilize start_server() helper instead of open-coding it, from Geliang Tang. 8) Improve test_tc_tunnel.sh BPF selftest to prevent client connect before the server bind, from Alessandro Carminati. 9) Fix BPF selftest benchmark for older glibc and use syscall(SYS_gettid) instead of gettid(), from Alan Maguire. 10) Implement a backward-compatible method for struct_ops types with additional fields which are not present in older kernels, from Kui-Feng Lee. 11) Add a small helper to check if an instruction is addr_space_cast from as(0) to as(1) and utilize it in x86-64 JIT, from Puranjay Mohan. 12) Small cleanup to remove unnecessary error check in bpf_struct_ops_map_update_elem, from Martin KaFai Lau. 13) Improvements to libbpf fd validity checks for BPF map/programs, from Mykyta Yatsenko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (38 commits) selftests/bpf: Fix flaky test btf_map_in_map/lookup_update bpf: implement insn_is_cast_user() helper for JITs bpf: Avoid get_kernel_nofault() to fetch kprobe entry IP selftests/bpf: Use start_server in bpf_tcp_ca bpf: Sync uapi bpf.h to tools directory libbpf: Add new sec_def "sk_skb/verdict" selftests/bpf: Mark uprobe trigger functions with nocf_check attribute selftests/bpf: Use syscall(SYS_gettid) instead of gettid() wrapper in bench bpf-next: Avoid goto in regs_refine_cond_op() bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool selftests/bpf: scale benchmark counting by using per-CPU counters bpftool: Remove unnecessary source files from bootstrap version bpftool: Enable libbpf logs when loading pid_iter in debug mode selftests/bpf: add raw_tp/tp_btf BPF cookie subtests libbpf: add support for BPF cookie for raw_tp/tp_btf programs bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs bpf: pass whole link instead of prog when triggering raw tracepoint bpf: flatten bpf_probe_register call chain selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh selftests/bpf: Add a sk_msg prog bpf_get_ns_current_pid_tgid() test ... ==================== Link: https://lore.kernel.org/r/20240325233940.7154-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
476a5e9291 |
bpf: Sync uapi bpf.h to tools directory
There is a difference between kernel uapi bpf.h and tools uapi bpf.h. There is no functionality difference, but let us sync properly to make it easy for later bpf.h update. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev |
||
|
|
1e3cd03c54 |
LoongArch changes for v6.9
1, Add objtool support for LoongArch;
2, Add ORC stack unwinder support for LoongArch;
3, Add kernel livepatching support for LoongArch;
4, Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig;
5, Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig;
6, Some bug fixes and other small changes.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmX69KsWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImerjFD/9rAzm1+G4VxFvFzlOiJXEqquNJ
+Vz2fAZLU3lJhBlx0uUKXFVijvLe8s/DnoLrM9e/M6gk4ivT9eszy3DnqT3NjGDX
njYFkPUWZhZGACmbkoVk9St80R8sPIdZrwXtW3q7g3T0bC7LXUXrJw52Sh4gmbYx
RqLsE6GoEWGY0zhhWqeeAM9LkKDuLxxyjH4fYE4g623EhQt7A7hP5okyaC+xHzp+
qp/4dPFLu61LeqIfeBUKK7nQ6uzno3EWLiME2eHEHiuelYfzmh+BtNMcX9Ugb/En
j0vLGNsoDGmEYw7xGa6OSRaCR/nCwVJz4SvuH33wbbbHhVAiUKUBVNFR3gmAtLlc
BSa2dDZbKhHkiWSUCM9K2ihr7WiQNuraTK1kKHwBgfa+RbEVOTu1q8yokAB9XCaT
T7lijJ8MKQmzHpMvgev7nN41baDB6V5bPIni0Ueh+NhQJKZ2/IxtYA3XzV5D0UgL
TBovVgYB/VNThS9gzOrlenKuDX9hT+kCQgyudErXaoIo645P6dsPFowOZRQxCEIv
WnLskZatLTCA8xWl1XyC1bqtGxhp34Gbhg0ZcvUqlNE20luaK/qi8wtW9Mv1Utp+
aXFO3i7d93z99oAcUT0oc1N83T0x0M/p69Z+rL/2+L0sYQgBf1cwUEiDNRW4OCdI
h15289rRTxjeL7NZPw==
=lSkY
-----END PGP SIGNATURE-----
Merge tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Add objtool support for LoongArch
- Add ORC stack unwinder support for LoongArch
- Add kernel livepatching support for LoongArch
- Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
- Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
- Some bug fixes and other small changes
* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch/crypto: Clean up useless assignment operations
LoongArch: Define the __io_aw() hook as mmiowb()
LoongArch: Remove superfluous flush_dcache_page() definition
LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
LoongArch: Add kernel livepatching support
LoongArch: Add ORC stack unwinder support
objtool: Check local label in read_unwind_hints()
objtool: Check local label in add_dead_ends()
objtool/LoongArch: Enable orc to be built
objtool/x86: Separate arch-specific and generic parts
objtool/LoongArch: Implement instruction decoder
objtool/LoongArch: Enable objtool to be built
|
||
|
|
af8d27bf15 |
selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
Some distros seem to enable the -fcf-protection=branch by default, which breaks our setup on first instruction of uprobe trigger functions and place there endbr64 instruction. Marking them with nocf_check attribute to skip that. Ignoring unknown attribute warning in gcc for bench objects, because nocf_check can be used only when -fcf-protection=branch is enabled, otherwise we get a warning and break compilation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240322134936.1075395-1-jolsa@kernel.org |
||
|
|
4962e19496 |
perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf
It is only used to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/vhost.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
2316ef5891 |
perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument
It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.
Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.
$ tools/perf/trace/beauty/clone.sh | head -5
static const char *clone_flags[] = {
[ilog2(0x00000100) + 1] = "VM",
[ilog2(0x00000200) + 1] = "FS",
[ilog2(0x00000400) + 1] = "FILES",
[ilog2(0x00000800) + 1] = "SIGHAND",
$
Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
a672af9139 |
tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays most the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.
For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN
add them conditionally.
Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-6-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
8a1ad44135 |
tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h
These were used to build perf to provide defines not available in older distros, but this was back in 2017, nowadays all the distros that are supported and I have build containers for work using just the system headers, so ditch them. Some of these older distros may not have things that are used in 'perf trace', but then they also don't have libtraceevent packages, so don't build 'perf trace'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
eb01fe7abb |
perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.
Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
7050e33e86 |
perf beauty: Move uapi/sound/asound.h copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
44512bd613 |
perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ab3316119f |
perf beauty: Move uapi/linux/mount.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/mount.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
faf7217a39 |
perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. The only case where it was being used to build was in tools/perf/trace/beauty/sync_file_range.c, because some older systems doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the system's linux/fs.h header instead, defining it if not available. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
68ca5d4eeb |
bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF aware variants). This brings them up to part w.r.t. BPF cookie usage with classic tracepoint and fentry/fexit programs. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-4-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
0ea680eda6 |
slab changes for 6.9
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmXwH0wACgkQu+CwddJF iJq3HAf6A/0m0pSr0QDcwjM8D7TVYQJ+Z/jPC6Mj+HfTcF8Otrgk8c0M6EsHGIGF GQNnYJRKmBla3mpVFvDtsVZuiakEtRLCpoP5n23s8p8gY9ibJcl6bpn9NaMVMKrq kBnhQ9VdLAgKVcTH8wz6jJqdWiZ7W4jGH5NWO+nr+r0H7vay7jfB0+tur1NO8J09 HE5I76XE6ArRvaKYxvsZmOx1pihSmsJ7CerXN6Y8U5qcuxNXdUO/9rf+uv5llDIV gl54UAU79koZ9k88t5AiSKO2IZVhBgC/j66ds9MRRAFCf/ldxUtJIlsHTOnumfmy FApqwtR0MYNPeMPZpzogQbv58oOcNw== =XDxn -----END PGP SIGNATURE----- Merge tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Freelist loading optimization (Chengming Zhou) When the per-cpu slab is depleted and a new one loaded from the cpu partial list, optimize the loading to avoid an irq enable/disable cycle. This results in a 3.5% performance improvement on the "perf bench sched messaging" test. - Kernel boot parameters cleanup after SLAB removal (Xiongwei Song) Due to two different main slab implementations we've had boot parameters prefixed either slab_ and slub_ with some later becoming an alias as both implementations gained the same functionality (i.e. slab_nomerge vs slub_nomerge). In order to eventually get rid of the implementation-specific names, the canonical and documented parameters are now all prefixed slab_ and the slub_ variants become deprecated but still working aliases. - SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka) The flags had hardcoded #define values which became tedious and error-prone when adding new ones. Assign the values via an enum that takes care of providing unique bit numbers. Also deprecate SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since SLAB removal. Assign it an explicit zero value. The removals of the flag usage are handled independently in the respective subsystems, with a final removal of any leftover usage planned for the next release. - Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian) Includes removal of unused code or function parameters and a fix of a memleak. * tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: remove PARTIAL_NODE slab_state mm, slab: remove memcg_from_slab_obj() mm, slab: remove the corner case of inc_slabs_node() mm/slab: Fix a kmemleak in kmem_cache_destroy() mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE mm, slab: use an enum to define SLAB_ cache creation flags mm, slab: deprecate SLAB_MEM_SPREAD flag mm, slab: fix the comment of cpu partial list mm, slab: remove unused object_size parameter in kmem_cache_flags() mm/slub: remove parameter 'flags' in create_kmalloc_caches() mm/slub: remove unused parameter in next_freelist_entry() mm/slub: remove full list manipulation for non-debug slab mm/slub: directly load freelist from cpu partial slab in the likely case mm/slub: make the description of slab_min_objects helpful in doc mm/slub: replace slub_$params with slab_$params in slub.rst mm/slub: unify all sl[au]b parameters with "slab_$param" Documentation: kernel-parameters: remove noaliencache |
||
|
|
5f20e6ab1f |
for-netdev
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE= =eFgA -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2024-03-11 We've added 59 non-merge commits during the last 9 day(s) which contain a total of 88 files changed, 4181 insertions(+), 590 deletions(-). The main changes are: 1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena, from Alexei. 2) Introduce bpf_arena which is sparse shared memory region between bpf program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and bpf programs, from Alexei and Andrii. 3) Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it, from Alexei. 4) Use IETF format for field definitions in the BPF standard document, from Dave. 5) Extend struct_ops libbpf APIs to allow specify version suffixes for stuct_ops map types, share the same BPF program between several map definitions, and other improvements, from Eduard. 6) Enable struct_ops support for more than one page in trampolines, from Kui-Feng. 7) Support kCFI + BPF on riscv64, from Puranjay. 8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay. 9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke. ==================== Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
667a86ad9b |
bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space) instruction when pointers in non-zero address space are used by the bpf program. Recognize this insn in uapi and in bpf disassembler. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com |
||
|
|
317460317a |
bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf program and user space. Use cases: 1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed anonymous region, like memcached or any key/value storage. The bpf program implements an in-kernel accelerator. XDP prog can search for a key in bpf_arena and return a value without going to user space. 2. The bpf program builds arbitrary data structures in bpf_arena (hash tables, rb-trees, sparse arrays), while user space consumes it. 3. bpf_arena is a "heap" of memory from the bpf program's point of view. The user space may mmap it, but bpf program will not convert pointers to user base at run-time to improve bpf program speed. Initially, the kernel vm_area and user vma are not populated. User space can fault in pages within the range. While servicing a page fault, bpf_arena logic will insert a new page into the kernel and user vmas. The bpf program can allocate pages from that region via bpf_arena_alloc_pages(). This kernel function will insert pages into the kernel vm_area. The subsequent fault-in from user space will populate that page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time can be used to prevent fault-in from user space. In such a case, if a page is not allocated by the bpf program and not present in the kernel vm_area, the user process will segfault. This is useful for use cases 2 and 3 above. bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages either at a specific address within the arena or allocates a range with the maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees pages and removes the range from the kernel vm_area and from user process vmas. bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of bpf program is more important than ease of sharing with user space. This is use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended. It will tell the verifier to treat the rX = bpf_arena_cast_user(rY) instruction as a 32-bit move wX = wY, which will improve bpf prog performance. Otherwise, bpf_arena_cast_user is translated by JIT to conditionally add the upper 32 bits of user vm_start (if the pointer is not NULL) to arena pointers before they are stored into memory. This way, user space sees them as valid 64-bit pointers. Diff https://github.com/llvm/llvm-project/pull/84410 enables LLVM BPF backend generate the bpf_addr_space_cast() instruction to cast pointers between address_space(1) which is reserved for bpf_arena pointers and default address space zero. All arena pointers in a bpf program written in C language are tagged as __attribute__((address_space(1))). Hence, clang provides helpful diagnostics when pointers cross address space. Libbpf and the kernel support only address_space == 1. All other address space identifiers are reserved. rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the verifier that rX->type = PTR_TO_ARENA. Any further operations on PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar to copy_from_kernel_nofault() except that no address checks are necessary. The address is guaranteed to be in the 4GB range. If the page is not present, the destination register is zeroed on read, and the operation is ignored on write. rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type = unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the verifier converts such cast instructions to mov32. Otherwise, JIT will emit native code equivalent to: rX = (u32)rY; if (rY) rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */ After such conversion, the pointer becomes a valid user pointer within bpf_arena range. The user process can access data structures created in bpf_arena without any additional computations. For example, a linked list built by a bpf program can be walked natively by user space. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Barret Rhoden <brho@google.com> Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com |
||
|
|
b2d23158e6 |
objtool/LoongArch: Implement instruction decoder
Only copy the minimal definitions of instruction opcodes and formats in inst.h from arch/loongarch to tools/arch/loongarch, and also copy the definition of sign_extend64() to tools/include/linux/bitops.h to decode the following kinds of instructions: (1) stack pointer related instructions addi.d, ld.d, st.d, ldptr.d and stptr.d (2) branch and jump related instructions beq, bne, blt, bge, bltu, bgeu, beqz, bnez, bceqz, bcnez, b, bl and jirl (3) other instructions break, nop and ertn See more info about instructions in LoongArch Reference Manual: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
|
|
92f8b1f5ca |
netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
ab63a2387c |
netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
011832b97b |
bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C the later patch introduces 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like
for (i = zero; i < cnt; cond_break, i++) {
The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg
may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.
may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have induction variable that verifiers sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.
But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier see 'i' as precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!
may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.
Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
|
||
|
|
fae1b01293 |
slab: remove PARTIAL_NODE slab_state
The PARTIAL_NODE slab_state has gone with SLAB removed, so just remove it. Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
||
|
|
4b2765ae41 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg= =mTAj -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
896880ff30 |
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:
../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
207 | *(__be16 *)&key->data[i]);
| ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
| ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
97 | #define be16_to_cpu __be16_to_cpu
| ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
| ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
82 | __u8 data[0]; /* Arbitrary size */
| ^~~~
And found at run-time under CONFIG_FORTIFY_SOURCE:
UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
index 0 is out of range for type '__u8 [*]'
Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:
struct egress_gw_policy_key {
struct bpf_lpm_trie_key lpm_key;
__u32 saddr;
__u32 daddr;
};
While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:
struct egress_gw_policy_key in_key = {
.lpm_key = { 32 + 24, {} },
.saddr = CLIENT_IP,
.daddr = EXTERNAL_SVC_IP & 0Xffffff,
};
To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.
Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.
Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
|
||
|
|
58fd62e0aa |
bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters, in_batch and out_batch, to facilitate iterative lookup/lookup_and_deletion operations for supported maps. Except NULL for in_batch at the start of these two batch operations, both parameters need to point to memory equal or larger than the respective map key size, except for various hashmaps (hash, percpu_hash, lru_hash, lru_percpu_hash) where the in_batch/out_batch memory size should be at least 4 bytes. Document these semantics to clarify the API. Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
|
|
73be9a3aab |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c |
||
|
|
4356e9f841 |
work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits |
||
|
|
3be042cf46 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/stmicro/stmmac/common.h |
||
|
|
9707ac4fe2 |
tools/resolve_btfids: Refactor set sorting with types from btf_ids.h
Instead of using magic offsets to access BTF ID set data, leverage types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes. This requires to sync the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, b/c that would require to also sync multiple dependent headers and we don't need any other defs from btf_ids.h. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com |
||
|
|
240fd40552 |
bonding: Add independent control state machine
Add support for the independent control state machine per IEEE 802.1AX-2008 5.4.15 in addition to the existing implementation of the coupled control state machine. Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in the LACP MUX state machine for separated handling of an initial Collecting state before the Collecting and Distributing state. This enables a port to be in a state where it can receive incoming packets while not still distributing. This is useful for reducing packet loss when a port begins distributing before its partner is able to collect. Added new functions such as bond_set_slave_tx_disabled_flags and bond_set_slave_rx_enabled_flags to precisely manage the port's collecting and distributing states. Previously, there was no dedicated method to disable TX while keeping RX enabled, which this patch addresses. Note that the regular flow process in the kernel's bonding driver remains unaffected by this patch. The extension requires explicit opt-in by the user (in order to ensure no disruptions for existing setups) via netlink support using the new bonding parameter coupled_control. The default value for coupled_control is set to 1 so as to preserve existing behaviour. Signed-off-by: Aahil Awatramani <aahila@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
fdd0ae72b3 |
perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
To pick up the changes in:
|
||
|
|
1f8c43b09e |
tools include UAPI: Sync linux/mount.h copy with the kernel sources
To pick the changes from: |
||
|
|
21fdd8dd37 |
tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
To pick the changes in these csets: |
||
|
|
e30dca91e5 |
tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in: |
||
|
|
92046e83c0 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZbQV+gAKCRDbK58LschI g2OeAP0VvhZS9SPiS+/AMAFuw2W1BkMrFNbfBTc3nzRnyJSmNAD+NG4CLLJvsKI9 olu7VC20B8pLTGLUGIUSwqnjOC+Kkgc= =wVMl -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-01-26 We've added 107 non-merge commits during the last 4 day(s) which contain a total of 101 files changed, 6009 insertions(+), 1260 deletions(-). The main changes are: 1) Add BPF token support to delegate a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. With addressed changes from Christian and Linus' reviews, from Andrii Nakryiko. 2) Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type, from Kui-Feng Lee. 3) Add support for retrieval of cookies for perf/kprobe multi links, from Jiri Olsa. 4) Bigger batch of prep-work for the BPF verifier to eventually support preserving boundaries and tracking scalars on narrowing fills, from Maxim Mikityanskiy. 5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help with the scenario of SYN floods, from Kuniyuki Iwashima. 6) Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects, from Hou Tao. 7) Extend BPF verifier to track aligned ST stores as imprecise spilled registers, from Yonghong Song. 8) Several fixes to BPF selftests around inline asm constraints and unsupported VLA code generation, from Jose E. Marchesi. 9) Various updates to the BPF IETF instruction set draft document such as the introduction of conformance groups for instructions, from Dave Thaler. 10) Fix BPF verifier to make infinite loop detection in is_state_visited() exact to catch some too lax spill/fill corner cases, from Eduard Zingerman. 11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly instead of implicitly for various register types, from Hao Sun. 12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness in neighbor advertisement at setup time, from Martin KaFai Lau. 13) Change BPF selftests to skip callback tests for the case when the JIT is disabled, from Tiezhu Yang. 14) Add a small extension to libbpf which allows to auto create a map-in-map's inner map, from Andrey Grafin. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits) selftests/bpf: Add missing line break in test_verifier bpf, docs: Clarify definitions of various instructions bpf: Fix error checks against bpf_get_btf_vmlinux(). bpf: One more maintainer for libbpf and BPF selftests selftests/bpf: Incorporate LSM policy to token-based tests selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar selftests/bpf: Add tests for BPF object load with implicit token selftests/bpf: Add BPF object loading tests with explicit token passing libbpf: Wire up BPF token support at BPF object level libbpf: Wire up token_fd into feature probing logic libbpf: Move feature detection code into its own file libbpf: Further decouple feature checking logic from bpf_object libbpf: Split feature detectors definitions from cached results selftests/bpf: Utilize string values for delegate_xxx mount options bpf: Support symbolic BPF FS delegation mount options bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS bpf,selinux: Allocate bpf_security_struct per BPF token selftests/bpf: Add BPF token-enabled tests libbpf: Add BPF token support to bpf_prog_load() API ... ==================== Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
1233d1d54b |
tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
Picking the changes from: |
||
|
|
b0dc992155 |
tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:
|
||
|
|
690811f012 |
tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_MNT_ID_UNIQUE
To pick the changes from:
|
||
|
|
caf8f28e03 |
bpf: Add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag should be set in prog_flags field when providing prog_token_fd. Wire through a set of allowed BPF program types and attach types, derived from BPF FS at BPF token creation time. Then make sure we perform bpf_token_capable() checks everywhere where it's relevant. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org |
||
|
|
9ea7c4bf17 |
bpf: Add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags field before, we also add btf_flags field. BTF loading is a pretty straightforward operation, so as long as BPF token is created with allow_cmds granting BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating BTF object. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org |
||
|
|
a177fc2bf6 |
bpf: Add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD for BPF_MAP_CREATE command. Wire through a set of allowed BPF map types to BPF token, derived from BPF FS at BPF token creation time. This, in combination with allowed_cmds allows to create a narrowly-focused BPF token (controlled by privileged agent) with a restrictive set of BPF maps that application can attempt to create. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org |
||
|
|
35f96de041 |
bpf: Introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.
This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).
BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.
When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.
Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).
Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token. Also creating BPF token in init user namespace is
currently not supported, given BPF token doesn't have any effect in init
user namespace anyways.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
|
||
|
|
fcc2c1fb06 |
bpf: pass attached BTF to the bpf_struct_ops subsystem
Pass the fd of a btf from the userspace to the bpf() syscall, and then convert the fd into a btf. The btf is generated from the module that defines the target BPF struct_ops type. In order to inform the kernel about the module that defines the target struct_ops type, the userspace program needs to provide a btf fd for the respective module's btf. This btf contains essential information on the types defined within the module, including the target struct_ops type. A btf fd must be provided to the kernel for struct_ops maps and for the bpf programs attached to those maps. In the case of the bpf programs, the attach_btf_obj_fd parameter is passed as part of the bpf_attr and is converted into a btf. This btf is then stored in the prog->aux->attach_btf field. Here, it just let the verifier access attach_btf directly. In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added for map_flags to indicate that the value of value_type_btf_obj_fd is set. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
|
|
1338b93346 |
bpf: pass btf object id in bpf_map_info.
Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex: bpftools struct_ops dump) know the correct btf from the kernel to look up type information of struct_ops types. Since struct_ops types can be defined and registered in a module. The type information of a struct_ops type are defined in the btf of the module defining it. The userspace tools need to know which btf is for the module defining a struct_ops type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
|
|
9fd112b1f8 |
bpf: Store cookies in kprobe_multi bpf_link_info data
Storing cookies in kprobe_multi bpf_link_info data. The cookies field is optional and if provided it needs to be an array of __u64 with kprobe_multi.count length. Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
d5c16492c6 |
bpf: Add cookie to perf_event bpf_link_info records
At the moment we don't store cookie for perf_event probes, while we do that for the rest of the probes. Adding cookie fields to struct bpf_link_info perf event probe records: perf_event.uprobe perf_event.kprobe perf_event.tracepoint perf_event.perf_event And the code to store that in bpf_link_info struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
091f2bf60d |
bpf: Sync uapi bpf.h header for the tooling infra
Both commit
|
||
|
|
9d64bf433c |
perf tools improvements and fixes for v6.8:
- Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns processing
patches, switching roles from perf-tools to perf-tools-next at each Linux
release.
Data profiling:
- Associate samples that identify loads and stores with data structures. This
uses events available on Intel, AMD and others and DWARF info:
# To get memory access samples in kernel for 1 second (on Intel)
$ perf mem record -a -K --ldlat=4 -- sleep 1
# Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
$ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1
Then, amongst several modes of post processing, one can do things like:
$ perf report -s type,typeoff --hierarchy --group --stdio
...
#
# Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
# Event count (approx.): 602758064
#
# Overhead Data Type / Data Type Offset
# ........................... ............................
#
26.09% 3.28% 0.00% long unsigned int
26.09% 3.28% 0.00% long unsigned int +0 (no field)
18.48% 0.73% 0.00% struct page
10.83% 0.02% 0.00% struct page +8 (lru.next)
3.90% 0.28% 0.00% struct page +0 (flags)
3.45% 0.06% 0.00% struct page +24 (mapping)
0.25% 0.28% 0.00% struct page +48 (_mapcount.counter)
0.02% 0.06% 0.00% struct page +32 (index)
0.02% 0.00% 0.00% struct page +52 (_refcount.counter)
0.02% 0.01% 0.00% struct page +56 (memcg_data)
0.00% 0.01% 0.00% struct page +16 (lru.prev)
15.37% 17.54% 0.00% (stack operation)
15.37% 17.54% 0.00% (stack operation) +0 (no field)
11.71% 50.27% 0.00% (unknown)
11.71% 50.27% 0.00% (unknown) +0 (no field)
$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...
$ perf annotate --data-type=page --group
Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
event[0] = cpu/mem-loads,ldlat=4/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
===================================================================================
samples offset size field
447 33 0 0 64 struct page {
108 8 0 0 8 long unsigned int flags;
319 13 0 8 40 union {
319 13 0 8 40 struct {
236 2 0 8 16 union {
236 2 0 8 16 struct list_head lru {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct {
236 1 0 8 8 void* __filler;
0 1 0 16 4 unsigned int mlock_count;
};
236 2 0 8 16 struct list_head buddy_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct list_head pcp_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
};
82 4 0 24 8 struct address_space* mapping;
1 7 0 32 8 union {
1 7 0 32 8 long unsigned int index;
1 7 0 32 8 long unsigned int share;
};
0 0 0 40 8 long unsigned int private;
};
This uses the existing annotate code, calling objdump to do the disassembly,
with improvements to avoid having this take too long, but longer term a
switch to a disassembler library, possibly reusing code in the kernel will
be pursued.
This is the initial implementation, please use it and report impressions and
bugs. Make sure the kernel-debuginfo packages match the running kernel. The
'perf report' phase for non short perf.data files may take a while.
There is a great article about it on LWN:
https://lwn.net/Articles/955709/ - "Data-type profiling for perf"
One last test I did while writing this text, on a AMD Ryzen 5950X, using a distro
kernel, while doing a simple 'find /' on an otherwise idle system resulted in:
# uname -r
6.6.9-100.fc38.x86_64
# perf -vv | grep BPF_
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
kernel-debuginfo-6.6.9-100.fc38.x86_64
#
# perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
#
# ls -la perf.data
-rw-------. 1 root root 2346486 Jan 9 18:36 perf.data
# perf evlist
ibs_op//
dummy:u
# perf evlist -v
ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
# perf report -s type,typeoff --hierarchy --group --stdio
# Total Lost Samples: 0
#
# Samples: 2K of events 'ibs_op//, dummy:u'
# Event count (approx.): 1904553038
#
# Overhead Data Type / Data Type Offset
# ................... ............................
#
73.70% 0.00% (unknown)
73.70% 0.00% (unknown) +0 (no field)
3.01% 0.00% long unsigned int
3.00% 0.00% long unsigned int +0 (no field)
0.01% 0.00% long unsigned int +2 (no field)
2.73% 0.00% struct task_struct
1.71% 0.00% struct task_struct +52 (on_cpu)
0.38% 0.00% struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
0.23% 0.00% struct task_struct +2100 (rcu_read_lock_nesting)
0.14% 0.00% struct task_struct +2384 ()
0.06% 0.00% struct task_struct +3096 (signal)
0.05% 0.00% struct task_struct +3616 (cgroups)
0.05% 0.00% struct task_struct +2344 (active_mm)
0.02% 0.00% struct task_struct +46 (flags)
0.02% 0.00% struct task_struct +2096 (migration_disabled)
0.01% 0.00% struct task_struct +24 (__state)
0.01% 0.00% struct task_struct +3956 (mm_cid_active)
0.01% 0.00% struct task_struct +1048 (cpus_ptr)
0.01% 0.00% struct task_struct +184 (se.group_node.next)
0.01% 0.00% struct task_struct +20 (thread_info.cpu)
0.00% 0.00% struct task_struct +104 (on_rq)
0.00% 0.00% struct task_struct +2456 (pid)
1.36% 0.00% struct module
0.59% 0.00% struct module +952 (kallsyms)
0.42% 0.00% struct module +0 (state)
0.23% 0.00% struct module +8 (list.next)
0.12% 0.00% struct module +216 (syms)
0.95% 0.00% struct inode
0.41% 0.00% struct inode +40 (i_sb)
0.22% 0.00% struct inode +0 (i_mode)
0.06% 0.00% struct inode +76 (i_rdev)
0.06% 0.00% struct inode +56 (i_security)
<SNIP>
perf top/report:
- Don't ignore job control, allowing control+Z + bg to work.
- Add s390 raw data interpretation for PAI (Processor Activity Instrumentation)
counters.
perf archive:
- Add new option '--all' to pack perf.data with DSOs.
- Add new option '--unpack' to expand tarballs.
Initialization speedups:
- Lazily initialize zstd streams to save memory when not using it.
- Lazily allocate/size mmap event copy.
- Lazy load kernel symbols in 'perf record'.
- Be lazier in allocating lost samples buffer in 'perf record'.
- Don't synthesize BPF events when disabled via the command line (perf record --no-bpf-event).
Assorted improvements:
- Show note on AMD systems that the :p, :pp, :ppp and :P are all the same, as
IBS (Instruction Based Sampling) is used and it is inherentely precise, not
having levels of precision like in Intel systems.
- When 'cycles' isn't available, fall back to the "task-clock" event when not
system wide, not to 'cpu-clock'.
- Add --debug-file option to redirect debug output, e.g.:
$ perf --debug-file /tmp/perf.log record -v true
- Shrink 'struct map' to under one cacheline by avoiding function pointers for
selecting if addresses are identity or DSO relative, and using just a byte for
some boolean struct members.
- Resolve the arch specific strerrno just once to use in perf_env__arch_strerrno().
- Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.
Assorted fixes:
- Fix the default 'perf top' usage on Intel hybrid systems, now it starts with
a browser showing the number of samples for Efficiency (cpu_atom/cycles/P) and
Performance (cpu_core/cycles/P). This behaviour is similar on ARM64, with its
respective set of big.LITTLE processors.
- Fix segfault on build_mem_topology() error path.
- Fix 'perf mem' error on hybrid related to availability of mem event in a PMU.
- Fix missing reference count gets (map, maps) in the db-export code.
- Avoid recursively taking env->bpf_progs.lock in the 'perf_env' code.
- Use the newly introduced maps__for_each_map() to add missing locking around
iteration of 'struct map' entries.
- Parse NOTE segments until the build id is found, don't stop on the first one,
ELF files may have several such NOTE segments.
- Remove 'egrep' usage, its deprecated, use 'grep -E' instead.
- Warn first about missing libelf, not libbpf, that depends on libelf.
- Use alternative to 'find ... -printf' as this isn't supported in busybox.
- Address python 3.6 DeprecationWarning for string scapes.
- Fix memory leak in uniq() in libsubcmd.
- Fix man page formatting for 'perf lock'
- Fix some spelling mistakes.
perf tests:
- Fail shell tests that needs some symbol in perf itself if it is stripped.
These tests check if a symbol is resolved, if some hot function is indeed
detected by profiling, etc.
- The 'perf test sigtrap' test is currently failing on PREEMPT_RT, skip it if
sleeping spinlocks are detected (using BTF) and point to the mailing list
discussion about it. This test is also being skipped on several architectures
(powerpc, s390x, arm and aarch64) due to other pending issues with intruction
breakpoints.
- Adjust test case perf record offcpu profiling tests for s390.
- Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest, addressing
issues caused by the fallback from cycles to task-clock done in this release.
- Fix mask for VG register in the user-regs test.
- Use shellcheck on 'perf test' shell scripts automatically to make sure changes
don't introduce things it flags as problematic.
- Add option to change objdump binary and allow it to be set via 'perf config'.
- Add basic 'perf script', 'perf list --json" and 'perf diff' tests.
- Basic branch counter support.
- Make DSO tests a suite rather than individual.
- Remove atomics from test_loop to avoid test failures.
- Fix call chain match on powerpc for the record+probe_libc_inet_pton test.
- Improve Intel hybrid tests.
Vendor event files (JSON):
powerpc:
- Update datasource event name to fix duplicate events on IBM's Power10.
- Add PVN for HX-C2000 CPU with Power8 Architecture.
Intel:
- Alderlake/rocketlake metric fixes.
- Update emeraldrapids events to v1.02.
- Update icelakex events to v1.23.
- Update sapphirerapids events to v1.17.
- Add skx, clx, icx and spr upi bandwidth metric.
AMD:
- Add Zen 4 memory controller events.
RISC-V:
- Add StarFive Dubhe-80 and Dubhe-90 JSON files.
https://www.starfivetech.com/en/site/cpu-u
- Add T-HEAD C9xx JSON file.
https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md
ARM64:
- Remove UTF-8 characters from cmn.json, that were causing build failure in some
distros.
- Add core PMU events and metrics for Ampere One X.
- Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT
libperf:
- Rename several perf_cpu_map constructor names to clarify what they really do.
- Ditto for some other methods, coping with some issues in their semantics,
like perf_cpu_map__empty() -> perf_cpu_map__has_any_cpu_or_is_empty().
- Document perf_cpu_map__nr()'s behavior
perf stat:
- Exit if parse groups fails.
- Combine the -A/--no-aggr and --no-merge options.
- Fix help message for --metric-no-threshold option.
Hardware tracing:
ARM64 CoreSight:
- Bump minimum OpenCSD version to ensure a bugfix is present.
- Add 'T' itrace option for timestamp trace
- Set start vm addr of exectable file to 0 and don't ignore first sample on the
arm-cs-trace-disasm.py 'perf script'.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZZ3FpgAKCRCyPKLppCJ+
Jz21AQDB93J4X05bwHJlRloN3KuA3LuwzvAQkwFoJSfFFMDnzgEAgbAMF1sANirP
5UcGxVgqoXWdrp9pkMcGlcFc7jsz5gA=
=SM26
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns
processing patches, switching roles from perf-tools to perf-tools-next
at each Linux release.
Data profiling:
- Associate samples that identify loads and stores with data
structures. This uses events available on Intel, AMD and others and
DWARF info:
# To get memory access samples in kernel for 1 second (on Intel)
$ perf mem record -a -K --ldlat=4 -- sleep 1
# Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
$ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1
Then, amongst several modes of post processing, one can do things like:
$ perf report -s type,typeoff --hierarchy --group --stdio
...
#
# Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
# Event count (approx.): 602758064
#
# Overhead Data Type / Data Type Offset
# ........................... ............................
#
26.09% 3.28% 0.00% long unsigned int
26.09% 3.28% 0.00% long unsigned int +0 (no field)
18.48% 0.73% 0.00% struct page
10.83% 0.02% 0.00% struct page +8 (lru.next)
3.90% 0.28% 0.00% struct page +0 (flags)
3.45% 0.06% 0.00% struct page +24 (mapping)
0.25% 0.28% 0.00% struct page +48 (_mapcount.counter)
0.02% 0.06% 0.00% struct page +32 (index)
0.02% 0.00% 0.00% struct page +52 (_refcount.counter)
0.02% 0.01% 0.00% struct page +56 (memcg_data)
0.00% 0.01% 0.00% struct page +16 (lru.prev)
15.37% 17.54% 0.00% (stack operation)
15.37% 17.54% 0.00% (stack operation) +0 (no field)
11.71% 50.27% 0.00% (unknown)
11.71% 50.27% 0.00% (unknown) +0 (no field)
$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...
$ perf annotate --data-type=page --group
Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
event[0] = cpu/mem-loads,ldlat=4/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
===================================================================================
samples offset size field
447 33 0 0 64 struct page {
108 8 0 0 8 long unsigned int flags;
319 13 0 8 40 union {
319 13 0 8 40 struct {
236 2 0 8 16 union {
236 2 0 8 16 struct list_head lru {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct {
236 1 0 8 8 void* __filler;
0 1 0 16 4 unsigned int mlock_count;
};
236 2 0 8 16 struct list_head buddy_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct list_head pcp_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
};
82 4 0 24 8 struct address_space* mapping;
1 7 0 32 8 union {
1 7 0 32 8 long unsigned int index;
1 7 0 32 8 long unsigned int share;
};
0 0 0 40 8 long unsigned int private;
};
This uses the existing annotate code, calling objdump to do the
disassembly, with improvements to avoid having this take too long,
but longer term a switch to a disassembler library, possibly
reusing code in the kernel will be pursued.
This is the initial implementation, please use it and report
impressions and bugs. Make sure the kernel-debuginfo packages match
the running kernel. The 'perf report' phase for non short perf.data
files may take a while.
There is a great article about it on LWN:
https://lwn.net/Articles/955709/ - "Data-type profiling for perf"
One last test I did while writing this text, on a AMD Ryzen 5950X,
using a distro kernel, while doing a simple 'find /' on an
otherwise idle system resulted in:
# uname -r
6.6.9-100.fc38.x86_64
# perf -vv | grep BPF_
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
kernel-debuginfo-6.6.9-100.fc38.x86_64
#
# perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
#
# ls -la perf.data
-rw-------. 1 root root 2346486 Jan 9 18:36 perf.data
# perf evlist
ibs_op//
dummy:u
# perf evlist -v
ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
# perf report -s type,typeoff --hierarchy --group --stdio
# Total Lost Samples: 0
#
# Samples: 2K of events 'ibs_op//, dummy:u'
# Event count (approx.): 1904553038
#
# Overhead Data Type / Data Type Offset
# ................... ............................
#
73.70% 0.00% (unknown)
73.70% 0.00% (unknown) +0 (no field)
3.01% 0.00% long unsigned int
3.00% 0.00% long unsigned int +0 (no field)
0.01% 0.00% long unsigned int +2 (no field)
2.73% 0.00% struct task_struct
1.71% 0.00% struct task_struct +52 (on_cpu)
0.38% 0.00% struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
0.23% 0.00% struct task_struct +2100 (rcu_read_lock_nesting)
0.14% 0.00% struct task_struct +2384 ()
0.06% 0.00% struct task_struct +3096 (signal)
0.05% 0.00% struct task_struct +3616 (cgroups)
0.05% 0.00% struct task_struct +2344 (active_mm)
0.02% 0.00% struct task_struct +46 (flags)
0.02% 0.00% struct task_struct +2096 (migration_disabled)
0.01% 0.00% struct task_struct +24 (__state)
0.01% 0.00% struct task_struct +3956 (mm_cid_active)
0.01% 0.00% struct task_struct +1048 (cpus_ptr)
0.01% 0.00% struct task_struct +184 (se.group_node.next)
0.01% 0.00% struct task_struct +20 (thread_info.cpu)
0.00% 0.00% struct task_struct +104 (on_rq)
0.00% 0.00% struct task_struct +2456 (pid)
1.36% 0.00% struct module
0.59% 0.00% struct module +952 (kallsyms)
0.42% 0.00% struct module +0 (state)
0.23% 0.00% struct module +8 (list.next)
0.12% 0.00% struct module +216 (syms)
0.95% 0.00% struct inode
0.41% 0.00% struct inode +40 (i_sb)
0.22% 0.00% struct inode +0 (i_mode)
0.06% 0.00% struct inode +76 (i_rdev)
0.06% 0.00% struct inode +56 (i_security)
<SNIP>
perf top/report:
- Don't ignore job control, allowing control+Z + bg to work.
- Add s390 raw data interpretation for PAI (Processor Activity
Instrumentation) counters.
perf archive:
- Add new option '--all' to pack perf.data with DSOs.
- Add new option '--unpack' to expand tarballs.
Initialization speedups:
- Lazily initialize zstd streams to save memory when not using it.
- Lazily allocate/size mmap event copy.
- Lazy load kernel symbols in 'perf record'.
- Be lazier in allocating lost samples buffer in 'perf record'.
- Don't synthesize BPF events when disabled via the command line
(perf record --no-bpf-event).
Assorted improvements:
- Show note on AMD systems that the :p, :pp, :ppp and :P are all the
same, as IBS (Instruction Based Sampling) is used and it is
inherentely precise, not having levels of precision like in Intel
systems.
- When 'cycles' isn't available, fall back to the "task-clock" event
when not system wide, not to 'cpu-clock'.
- Add --debug-file option to redirect debug output, e.g.:
$ perf --debug-file /tmp/perf.log record -v true
- Shrink 'struct map' to under one cacheline by avoiding function
pointers for selecting if addresses are identity or DSO relative,
and using just a byte for some boolean struct members.
- Resolve the arch specific strerrno just once to use in
perf_env__arch_strerrno().
- Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.
Assorted fixes:
- Fix the default 'perf top' usage on Intel hybrid systems, now it
starts with a browser showing the number of samples for Efficiency
(cpu_atom/cycles/P) and Performance (cpu_core/cycles/P). This
behaviour is similar on ARM64, with its respective set of
big.LITTLE processors.
- Fix segfault on build_mem_topology() error path.
- Fix 'perf mem' error on hybrid related to availability of mem event
in a PMU.
- Fix missing reference count gets (map, maps) in the db-export code.
- Avoid recursively taking env->bpf_progs.lock in the 'perf_env'
code.
- Use the newly introduced maps__for_each_map() to add missing
locking around iteration of 'struct map' entries.
- Parse NOTE segments until the build id is found, don't stop on the
first one, ELF files may have several such NOTE segments.
- Remove 'egrep' usage, its deprecated, use 'grep -E' instead.
- Warn first about missing libelf, not libbpf, that depends on
libelf.
- Use alternative to 'find ... -printf' as this isn't supported in
busybox.
- Address python 3.6 DeprecationWarning for string scapes.
- Fix memory leak in uniq() in libsubcmd.
- Fix man page formatting for 'perf lock'
- Fix some spelling mistakes.
perf tests:
- Fail shell tests that needs some symbol in perf itself if it is
stripped. These tests check if a symbol is resolved, if some hot
function is indeed detected by profiling, etc.
- The 'perf test sigtrap' test is currently failing on PREEMPT_RT,
skip it if sleeping spinlocks are detected (using BTF) and point to
the mailing list discussion about it. This test is also being
skipped on several architectures (powerpc, s390x, arm and aarch64)
due to other pending issues with intruction breakpoints.
- Adjust test case perf record offcpu profiling tests for s390.
- Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest,
addressing issues caused by the fallback from cycles to task-clock
done in this release.
- Fix mask for VG register in the user-regs test.
- Use shellcheck on 'perf test' shell scripts automatically to make
sure changes don't introduce things it flags as problematic.
- Add option to change objdump binary and allow it to be set via
'perf config'.
- Add basic 'perf script', 'perf list --json" and 'perf diff' tests.
- Basic branch counter support.
- Make DSO tests a suite rather than individual.
- Remove atomics from test_loop to avoid test failures.
- Fix call chain match on powerpc for the record+probe_libc_inet_pton
test.
- Improve Intel hybrid tests.
Vendor event files (JSON):
powerpc:
- Update datasource event name to fix duplicate events on IBM's
Power10.
- Add PVN for HX-C2000 CPU with Power8 Architecture.
Intel:
- Alderlake/rocketlake metric fixes.
- Update emeraldrapids events to v1.02.
- Update icelakex events to v1.23.
- Update sapphirerapids events to v1.17.
- Add skx, clx, icx and spr upi bandwidth metric.
AMD:
- Add Zen 4 memory controller events.
RISC-V:
- Add StarFive Dubhe-80 and Dubhe-90 JSON files.
https://www.starfivetech.com/en/site/cpu-u
- Add T-HEAD C9xx JSON file.
https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md
ARM64:
- Remove UTF-8 characters from cmn.json, that were causing build
failure in some distros.
- Add core PMU events and metrics for Ampere One X.
- Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT
libperf:
- Rename several perf_cpu_map constructor names to clarify what they
really do.
- Ditto for some other methods, coping with some issues in their
semantics, like perf_cpu_map__empty() ->
perf_cpu_map__has_any_cpu_or_is_empty().
- Document perf_cpu_map__nr()'s behavior
perf stat:
- Exit if parse groups fails.
- Combine the -A/--no-aggr and --no-merge options.
- Fix help message for --metric-no-threshold option.
Hardware tracing:
ARM64 CoreSight:
- Bump minimum OpenCSD version to ensure a bugfix is present.
- Add 'T' itrace option for timestamp trace
- Set start vm addr of exectable file to 0 and don't ignore first
sample on the arm-cs-trace-disasm.py 'perf script'"
* tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits)
MAINTAINERS: Add Namhyung as tools/perf/ co-maintainer
perf test: test case 'Setup struct perf_event_attr' fails on s390 on z/vm
perf db-export: Fix missing reference count get in call_path_from_sample()
perf tests: Add perf script test
libsubcmd: Fix memory leak in uniq()
perf TUI: Don't ignore job control
perf vendor events intel: Update sapphirerapids events to v1.17
perf vendor events intel: Update icelakex events to v1.23
perf vendor events intel: Update emeraldrapids events to v1.02
perf vendor events intel: Alderlake/rocketlake metric fixes
perf x86 test: Add hybrid test for conflicting legacy/sysfs event
perf x86 test: Update hybrid expectations
perf vendor events amd: Add Zen 4 memory controller events
perf stat: Fix hard coded LL miss units
perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event
perf env: Avoid recursively taking env->bpf_progs.lock
perf annotate: Add --insn-stat option for debugging
perf annotate: Add --type-stat option for debugging
perf annotate: Support event group display
perf annotate: Add --data-type option
...
|
||
|
|
7912a6391f |
sound updates for 6.8-rc1
It was a clam development cycle. There were an ALSA core extension
for subformat PCM bits and a few ASoC core changes to support N:M
mappings, while the most of remaining changes are driver-specific.
Core:
- API extensions for properly limiting PCM format bits via subformat
- Enhanced support for N:M CPU:CODEC mappings in the core and in
audio-graph-card2
ASoC:
- Lots of SOF updates: fallback support to older IPC versions,
notification on control changes with IPC4.
Also supports for ACPI parse for the ES83xx driver that reduces
quirks.
- Device tree support for describing parts of the card which can be
active over suspend (for very low power playback or wake word use
cases)
- Support for more AMD and Intel systems, NXP i.MX8m MICFIL, Qualcomm
SM8250, SM8550, SM8650 and X1E80100
- Drop of Freescale MPC8610 code that is no longer supported
HD-audio:
- More CS35L41 codec extensions for Dell, HP and Lenovo models
- TAS2781 codec extensions for Lenovo and co
- New PCM subformat supports
Others:
- More enhancement for Scarlett2 USB mixer support
- Various kselftest fixes
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmWfzJgOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/NHg//VT0JRZDKqAZMUIaFZswUIKgoUy9fY9WaePO+
db38uFzjuTC1ZN1PIDFnU0bQ2uL7IYF/xnceqitk+G76+Wt/pcDP/1hX+F9UhsVd
wgRrihBeNRtSmPku8cwKu+duD/46439JyWo34PdHm8FxNoyilBx6fIApk6JqB57V
HYU3ZFtQE84s8TmPHefasLeewY5thFtWYXLAaEZR5oohyXUTN0Np7h7vG8nchh1F
zZOwoQ+nBo607PwUoXd0BFYkcuXwHlK4vLVmAM05KPaH1Q/kesEecYMhIVbnDHOj
a4caMz+/tMbNbw3/GRsg8HgIeiyp1NoyC4LAqufa+Pj6BgPREVra5j8XoxAnCxXr
8X8EDtBds6frMzqtQyNvTkyRCf3Iki8fhz60Re5nEkoXLcv34E3kleQDLG/FnHqC
qeH0J3FEed84Gf6KrnpjkPHLFRx5ZKyahOHZ7Xc76fUYMCwvczkc5CKeG2EoivE4
koEkhlQU1gnNyjNTTi4JchWis+EZG/oNA91eud1lMDm320lFJmxdZ5z31xZubVvs
WTcMStgHCDPIKOeSBuwBCYFwugMtV/o/ejE567E4bxVC5ZA/zbxyvpxU9iDAjUNK
T0JdPf/KKy1YJiNe9xuJn9/1ZpV6BXFCl7b7wILV+ZbGduOczoMCEH5T7dwAmZqq
/lQtT/8=
=yVxc
-----END PGP SIGNATURE-----
Merge tag 'sound-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"It was a calm development cycle. There were an ALSA core extension for
subformat PCM bits and a few ASoC core changes to support N:M
mappings, while the most of remaining changes are driver-specific.
Core:
- API extensions for properly limiting PCM format bits via subformat
- Enhanced support for N:M CPU:CODEC mappings in the core and in
audio-graph-card2
ASoC:
- Lots of SOF updates: fallback support to older IPC versions,
notification on control changes with IPC4. Also supports for ACPI
parse for the ES83xx driver that reduces quirks.
- Device tree support for describing parts of the card which can be
active over suspend (for very low power playback or wake word use
cases)
- Support for more AMD and Intel systems, NXP i.MX8m MICFIL, Qualcomm
SM8250, SM8550, SM8650 and X1E80100
- Drop of Freescale MPC8610 code that is no longer supported
HD-audio:
- More CS35L41 codec extensions for Dell, HP and Lenovo models
- TAS2781 codec extensions for Lenovo and co
- New PCM subformat supports
Others:
- More enhancement for Scarlett2 USB mixer support
- Various kselftest fixes"
* tag 'sound-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (337 commits)
kselftest/alsa - conf: Stringify the printed errno in sysfs_get()
kselftest/alsa - mixer-test: Fix the print format specifier warning
kselftest/alsa - mixer-test: Fix the print format specifier warning
kselftest/alsa - mixer-test: fix the number of parameters to ksft_exit_fail_msg()
ALSA: hda/tas2781: annotate calibration data endianness
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 13-ay0xxx
ALSA: hda/conexant: Fix headset auto detect fail in cx8070 and SN6140
ALSA: ac97: fix build regression
ALSA: hda: cs35l41: Support more HP models without _DSD
ALSA: hda/tas2781: add fixup for Lenovo 14ARB7
ALSA: hda/tas2781: add TAS2563 support for 14ARB7
ALSA: hda/tas2781: add configurable global i2c address
ALSA: hda/tas2781: add ptrs to calibration functions
ALSA: hda: Add driver properties for cs35l41 for Lenovo Legion Slim 7 Gen 8 serie
ALSA: hda/realtek: enable SND_PCI_QUIRK for Lenovo Legion Slim 7 Gen 8 (2023) serie
ALSA: hda/tas2781: configure the amp after firmware load
ALSA: mark all struct bus_type as const
ASoC: pxa: sspa: Don't select SND_ARM
ASoC: rt5663: cancel the work when system suspends
ALSA: scarlett2: Add PCM Input Switch for Solo Gen 4
...
|
||
|
|
3e7aeb78ab |
Networking changes for 6.8.
Core & protocols
----------------
- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up
build time warnings to safeguard against future header changes.
This improves TCP performances with many concurrent connections
up to 40%.
- Add page-pool netlink-based introspection, exposing the
memory usage and recycling stats. This helps indentify
bad PP users and possible leaks.
- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set.
This lowers the time taken by connect() for hosts having
many active connections to the same destination.
- Refactor the TCP bind conflict code, shrinking related socket
structs.
- Refactor TCP SYN-Cookie handling, as a preparation step to
allow arbitrary SYN-Cookie processing via eBPF.
- Tune optmem_max for 0-copy usage, increasing the default value
to 128KB and namespecifying it.
- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations.
- Reduce extension header parsing overhead at GRO time.
- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries.
- Reorder nftables struct members, to keep data accessed by the
datapath first.
- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer.
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex).
- More data-race annotations.
- Extend the diag interface to dump TCP bound-only sockets.
- Conditional notification of events for TC qdisc class and actions.
- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID.
- Implement SMCv2.1 virtual ISM device support.
- Add support for Batman-avd mulicast packet type.
BPF
---
- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers. It
improves the verifier "instructions processed" metric from single
digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes
- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload.
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y.
- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques.
- Support uid/gid options when mounting bpffs.
- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is identified
by its id.
- Extend verifier to allow bpf_refcount_acquire() of a map value field
obtained via direct load which is a use-case needed in sched_ext.
- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter.
- Support for VLAN tag in XDP hints.
- Remove deprecated bpfilter kernel leftovers given the project
is developed in user-space (https://github.com/facebook/bpfilter).
Misc
----
- Support for parellel TC self-tests execution.
- Increase MPTCP self-tests coverage.
- Updated the bridge documentation, including several so-far
undocumented features.
- Convert all the net self-tests to run in unique netns, to
avoid random failures due to conflict and allow concurrent
runs.
- Add TCP-AO self-tests.
- Add kunit tests for both cfg80211 and mac80211.
- Autogenerate Netlink families documentation from YAML spec.
- Add yml-gen support for fixed headers and recursive nests, the
tool can now generate user-space code for all genetlink families
for which we have specs.
- A bunch of additional module descriptions fixes.
- Catch incorrect freeing of pages belonging to a page pool.
Driver API
----------
- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust.
- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship.
- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances.
- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host.
- Add support for ethtool symmetric-xor RSS hash.
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform.
- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute.
- Convert older drivers to platform remove callback returning void.
- Add support for PHY package MMD read/write.
New hardware / drivers
----------------------
- Ethernet:
- Octeon CN10K devices
- Broadcom 5760X P7
- Qualcomm SM8550 SoC
- Texas Instrument DP83TG720S PHY
- Bluetooth:
- IMC Networks Bluetooth radio
Removed
-------
- WiFi:
- libertas 16-bit PCMCIA support
- Atmel at76c50x drivers
- HostAP ISA/PCMCIA style 802.11b driver
- zd1201 802.11b USB dongles
- Orinoco ISA/PCMCIA 802.11b driver
- Aviator/Raytheon driver
- Planet WL3501 driver
- RNDIS USB 802.11b driver
Drivers
-------
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will allow
in future releases combining multiple PFs devices attached to
different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring param,
coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode
- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support
- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels
- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk
PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35
98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ
HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU
Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf
k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK
yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy
C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw
q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo
PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX
pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF
ucqPEcZB5cRE
=1bW6
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most interesting thing is probably the networking structs
reorganization and a significant amount of changes is around
self-tests.
Core & protocols:
- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up build
time warnings to safeguard against future header changes
This improves TCP performances with many concurrent connections up
to 40%
- Add page-pool netlink-based introspection, exposing the memory
usage and recycling stats. This helps indentify bad PP users and
possible leaks
- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
lowers the time taken by connect() for hosts having many active
connections to the same destination
- Refactor the TCP bind conflict code, shrinking related socket
structs
- Refactor TCP SYN-Cookie handling, as a preparation step to allow
arbitrary SYN-Cookie processing via eBPF
- Tune optmem_max for 0-copy usage, increasing the default value to
128KB and namespecifying it
- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations
- Reduce extension header parsing overhead at GRO time
- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries
- Reorder nftables struct members, to keep data accessed by the
datapath first
- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex)
- More data-race annotations
- Extend the diag interface to dump TCP bound-only sockets
- Conditional notification of events for TC qdisc class and actions
- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID
- Implement SMCv2.1 virtual ISM device support
- Add support for Batman-avd mulicast packet type
BPF:
- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers.
This improves the verifier "instructions processed" metric from
single digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer
experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes
- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y
- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques
- Support uid/gid options when mounting bpffs
- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is
identified by its id
- Extend verifier to allow bpf_refcount_acquire() of a map value
field obtained via direct load which is a use-case needed in
sched_ext
- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter
- Support for VLAN tag in XDP hints
- Remove deprecated bpfilter kernel leftovers given the project is
developed in user-space (https://github.com/facebook/bpfilter)
Misc:
- Support for parellel TC self-tests execution
- Increase MPTCP self-tests coverage
- Updated the bridge documentation, including several so-far
undocumented features
- Convert all the net self-tests to run in unique netns, to avoid
random failures due to conflict and allow concurrent runs
- Add TCP-AO self-tests
- Add kunit tests for both cfg80211 and mac80211
- Autogenerate Netlink families documentation from YAML spec
- Add yml-gen support for fixed headers and recursive nests, the tool
can now generate user-space code for all genetlink families for
which we have specs
- A bunch of additional module descriptions fixes
- Catch incorrect freeing of pages belonging to a page pool
Driver API:
- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust
- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship
- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances
- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host
- Add support for ethtool symmetric-xor RSS hash
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform
- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute
- Convert older drivers to platform remove callback returning void
- Add support for PHY package MMD read/write
New hardware / drivers:
- Ethernet:
- Octeon CN10K devices
- Broadcom 5760X P7
- Qualcomm SM8550 SoC
- Texas Instrument DP83TG720S PHY
- Bluetooth:
- IMC Networks Bluetooth radio
Removed:
- WiFi:
- libertas 16-bit PCMCIA support
- Atmel at76c50x drivers
- HostAP ISA/PCMCIA style 802.11b driver
- zd1201 802.11b USB dongles
- Orinoco ISA/PCMCIA 802.11b driver
- Aviator/Raytheon driver
- Planet WL3501 driver
- RNDIS USB 802.11b driver
Driver updates:
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running
timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will
allow in future releases combining multiple PFs devices
attached to different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion
for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications
for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring
param, coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine
driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode
- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support
- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels
- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync"
* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
lan78xx: remove redundant statement in lan78xx_get_eee
lan743x: remove redundant statement in lan743x_ethtool_get_eee
bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
tcp: Revert no longer abort SYN_SENT when receiving some ICMP
Revert "mlx5 updates 2023-12-20"
Revert "net: stmmac: Enable Per DMA Channel interrupt"
ipvlan: Remove usage of the deprecated ida_simple_xx() API
ipvlan: Fix a typo in a comment
net/sched: Remove ipt action tests
net: stmmac: Use interrupt mode INTM=1 for per channel irq
net: stmmac: Add support for TX/RX channel interrupt
net: stmmac: Make MSI interrupt routine generic
dt-bindings: net: snps,dwmac: per channel irq
net: phy: at803x: make read_status more generic
net: phy: at803x: add support for cdt cross short test for qca808x
net: phy: at803x: refactor qca808x cable test get status function
net: phy: at803x: generalize cdt fault length function
net: ethernet: cortina: Drop TSO support
...
|
||
|
|
5d09f61e50 |
linux_kselftest-nolibc-6.8-rc1
This nolibc update for Linux 6.8-rc1 consists of: * Support for PIC mode on MIPS. * Support for getrlimit()/setrlimit(). * Replace some custom declarations with UAPI includes. * A new script "run-tests.sh" to run the testsuite over different architectures and configurations. * A few non-functional code cleanups. * Minor improvements to nolibc-test, primarily to support the test script. There are no urgent fixes available at this time. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmWckk8ACgkQCwJExA0N QxyHRQ//eXSdYmn3VkNepc3iFn75ntzH8KAFN29ZKCtuTu7+kVjSx+swpRjY4NYE jE3n9V8YXw7+R4VNj/AmJlSsnZXsx/PrRa9DNtjHeAza7jFYhWowM9LSSJUUyl78 bOh6EvdRhoKuz3zz9A68OTDYSUwA3LaZ0vin8f+WtLH05NfSdafmX1pHLRB9LHzj J235WktJHoSXOwSAkPZ6NHdtkyeqxy7QomHkuxmmxeVxHnI5SIEDexfa+1FNffGa 9n5TXGZtcgKPE/m1EqBvW02GbIpflpu6H2fAzssaDb9QxhOXEw2wySn06i5q3hGD 6gwTsNqBUfPxZCj2tF6FH/7TxxPmNqLqrJVag/e4pO1rDZzrcTL+Dd6HP5TagJtV O6/L6UJvqzogIjZD9lk/rWyKfXW0TKk5zBGczduZj/W/McjQ9BDfjR9EjRD/F57Y fTB3kHd4TFL4DJyN+AHEdzpm1gwc+0NeGE9CJcrMkKzvqjafo2MNMrlYD9GSxKLy aPlWExE7KIBbLIyrwDNxQbt42RYVfkNFGNVX274TighQ9nGBRjXybflPioTUpMw1 Qyi7qrIDA7QGIrsEgCr6pPeA+LVkWoSyvpXGi/RULQUeg94V9TPXpO4jmf4VNPDQ NtYwI/D3UcroYDYY2K2M7KUpv0J4vM9kgQ1wqQ4n/6aKth6kek4= =gQNC -----END PGP SIGNATURE----- Merge tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull nolibc updates from Shuah Khan: - Support for PIC mode on MIPS - Support for getrlimit()/setrlimit() - Replace some custom declarations with UAPI includes - A new script "run-tests.sh" to run the testsuite over different architectures and configurations - A few non-functional code cleanups - Minor improvements to nolibc-test, primarily to support the test script * tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (22 commits) selftests/nolibc: disable coredump via setrlimit tools/nolibc: add support for getrlimit/setrlimit tools/nolibc: drop custom definition of struct rusage tools/nolibc: drop duplicated testcase ioctl_tiocinq tools/nolibc: annotate va_list printf formats selftests/nolibc: make result alignment more robust tools/nolibc: mips: add support for PIC selftests/nolibc: run-tests.sh: enable testing via qemu-user selftests/nolibc: introduce QEMU_ARCH_USER selftests/nolibc: fix testcase status alignment selftests/nolibc: add configuration for mipso32be selftests/nolibc: extraconfig support selftests/nolibc: explicitly specify ABI for MIPS selftests/nolibc: use XARCH for MIPS tools/nolibc: move MIPS ABI validation into arch-mips.h tools/nolibc: error out on unsupported architecture selftests/nolibc: add script to run testsuite selftests/nolibc: support out-of-tree builds selftests/nolibc: anchor paths in $(srcdir) if possible selftests/nolibc: use EFI -bios for LoongArch qemu ... |
||
|
|
fb46e22a9e |
Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
|
||
|
|
ab5f3fcb7c |
arm64 updates for 6.8
* for-next/cpufeature
- Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde
Thunder-X machines.
- Avoid mapping KPTI trampoline when it is not required.
- Make CPU capability API more robust during early initialisation.
* for-next/early-idreg-overrides
- Remove dependencies on core kernel helpers from the early
command-line parsing logic in preparation for moving this code
before the kernel is mapped.
* for-next/fpsimd
- Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd
code sequences in the kernel with pre-emption enabled.
* for-next/kbuild
- Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y.
- Makefile cleanups.
* for-next/lpa2-prep
- Preparatory work for enabling the 'LPA2' extension, which will
introduce 52-bit virtual and physical addressing even with 4KiB
pages (including for KVM guests).
* for-next/misc
- Remove dead code and fix a typo.
* for-next/mm
- Pass NUMA node information for IRQ stack allocations.
* for-next/perf
- Add perf support for the Synopsys DesignWare PCIe PMU.
- Add support for event counting thresholds (FEAT_PMUv3_TH) introduced
in Armv8.8.
- Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
- Minor PMU driver fixes and optimisations.
* for-next/rip-vpipt
- Remove what support we had for the obsolete VPIPT I-cache policy.
* for-next/selftests
- Improvements to the SVE and SME selftests.
* for-next/stacktrace
- Refactor kernel unwind logic so that it can used by BPF unwinding
and, eventually, reliable backtracing.
* for-next/sysregs
- Update a bunch of register definitions based on the latest XML drop
from Arm.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmWWvKYQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNIiTB/9agZBkEhZjP2sNDGyE4UFwawweWHkt2r8h
WyvdwP91Z/AIsYSsGYu36J0l4pOnMKp/i6t+rt031SK4j+Q8hJYhSfDt3RvVbc0/
Pz9D18V6cLrfq+Yxycqq9ufVdjs+m+CQ5WeLaRGmNIyEzJ/Jv/qrAN+2r603EeLP
nq08qMZhDIQd2ZzbigCnGaNrTsVSafFfBFv1GsgDvnMZAjs1G6457A6zu+NatNUc
+TMSG+3EawutHZZ2noXl0Ra7VOfIbVZFiUssxRPenKQByHHHR+QB2c/O1blri+dm
XLMutvqO2/WvYGIfXO5koqZqvpVeR3zXxPwmGi5hQBsmOjtXzKd+
=U4mo
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"CPU features:
- Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
olde Thunder-X machines
- Avoid mapping KPTI trampoline when it is not required
- Make CPU capability API more robust during early initialisation
Early idreg overrides:
- Remove dependencies on core kernel helpers from the early
command-line parsing logic in preparation for moving this code
before the kernel is mapped
FPsimd:
- Restore kernel-mode fpsimd context lazily, allowing us to run
fpsimd code sequences in the kernel with pre-emption enabled
KBuild:
- Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y
- Makefile cleanups
LPA2 prep:
- Preparatory work for enabling the 'LPA2' extension, which will
introduce 52-bit virtual and physical addressing even with 4KiB
pages (including for KVM guests).
Misc:
- Remove dead code and fix a typo
MM:
- Pass NUMA node information for IRQ stack allocations
Perf:
- Add perf support for the Synopsys DesignWare PCIe PMU
- Add support for event counting thresholds (FEAT_PMUv3_TH)
introduced in Armv8.8
- Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
- Minor PMU driver fixes and optimisations
RIP VPIPT:
- Remove what support we had for the obsolete VPIPT I-cache policy
Selftests:
- Improvements to the SVE and SME selftests
Stacktrace:
- Refactor kernel unwind logic so that it can used by BPF unwinding
and, eventually, reliable backtracing
Sysregs:
- Update a bunch of register definitions based on the latest XML drop
from Arm"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
kselftest/arm64: Don't probe the current VL for unsupported vector types
efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
arm64: properly install vmlinuz.efi
arm64/sysreg: Add missing system instruction definitions for FGT
arm64/sysreg: Add missing system register definitions for FGT
arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
arm64: memory: remove duplicated include
arm: perf: Fix ARCH=arm build with GCC
arm64: Align boot cpucap handling with system cpucap handling
arm64: Cleanup system cpucap handling
MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
drivers/perf: add DesignWare PCIe PMU driver
PCI: Move pci_clear_and_set_dword() helper to PCI header
PCI: Add Alibaba Vendor ID to linux/pci_ids.h
docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
arm64: irq: set the correct node for shadow call stack
Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
...
|
||
|
|
33241dca48 |
net/sched: Remove uapi support for CBQ qdisc
Commit
|
||
|
|
26cc8714fc |
net/sched: Remove uapi support for ATM qdisc
Commit
|
||
|
|
fe3b739a54 |
net/sched: Remove uapi support for dsmark qdisc
Commit
|
||
|
|
82b2545ed9 |
net/sched: Remove uapi support for tcindex classifier
commit
|
||
|
|
41bc3e8fc1 |
net/sched: Remove uapi support for rsvp classifier
commit
|
||
|
|
d17aff807f |
Revert BPF token-related functionality
This patch includes the following revert (one conflicting BPF FS patch and three token patch sets, represented by merge commits): - revert |
||
|
|
c49b292d03 |
netdev
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7 uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1 TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861 xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw= =stU2 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-12-18 This PR is larger than usual and contains changes in various parts of the kernel. The main changes are: 1) Fix kCFI bugs in BPF, from Peter Zijlstra. End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. 2) Introduce BPF token object, from Andrii Nakryiko. It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore. Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp 3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei. - Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic 4) Support for VLAN tag in XDP hints, from Larysa Zaremba. 5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu. End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline. 6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao. 7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu. It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics. 8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao. 9) BPF file verification via fsverity, from Song Liu. It allows BPF progs get fsverity digest. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ==================== Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
ab1c247094 |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes that went thru perf-tools for v6.7 and to get in sync with upstream to check for drift in the copies of headers, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
e6795330f8 |
xdp: Add VLAN tag hint
Implement functionality that enables drivers to expose VLAN tag to XDP code. VLAN tag is represented by 2 variables: - protocol ID, which is passed to bpf code in BE - VLAN TCI, in host byte order Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
c7b98bf0fc |
KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h
Now that ARMV8_PMU_PMCR_N is made with GENMASK, update usages to treat it as a pre-shifted mask. Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20231211161331.1277825-9-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
a0bb5f88fc |
tools/nolibc: add support for getrlimit/setrlimit
The implementation uses the prlimit64 systemcall as that is available on all architectures. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/20231123-nolibc-rlimit-v1-2-a428b131de2a@weissschuh.net/ Acked-by: Willy Tarreau <w@1wt.eu> |
||
|
|
7b20478b77 |
tools/nolibc: drop custom definition of struct rusage
A future commit will include linux/resource.h, which will conflict with the private definition of struct rusage in nolibc. Avoid the conflict by dropping the private definition and use the one from the UAPI headers. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/20231123-nolibc-rlimit-v1-1-a428b131de2a@weissschuh.net/ Acked-by: Willy Tarreau <w@1wt.eu> |
||
|
|
dece8476d6 |
tools/nolibc: annotate va_list printf formats
__attribute__(format(printf)) can also be used for functions that take a
va_list argument.
As per the GCC docs:
For functions where the arguments are not available to be checked
(such as vprintf), specify the third parameter as zero.
Link: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
||
|
|
544102458a |
tools/nolibc: mips: add support for PIC
MIPS requires some extra instructions to set up the $gp register for the with a pointer to the global data area. This isn't needed for non-PIC builds, but this patch enables the code unconditionally to prevent bitrot. Also enable PIC in one of the test configurations for ongoing validation. Link: https://lore.kernel.org/r/20231108-nolibc-pic-v2-1-4fb0d6284757@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
aa68a5a83a |
tools/nolibc: move MIPS ABI validation into arch-mips.h
When installing nolibc to a sysroot arch.h is not used so its ABI check is bypassed. This makes is possible to compile nolibc with a non O32 ABI which may build but can not run. Move the check into arch-mips.h so it will always be evaluated. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> |
||
|
|
48946c5aa7 |
tools/nolibc: error out on unsupported architecture
When an architecture is unsupported arch.h would silently continue. This leads to a lot of followup errors because my_syscallX() is not defined and the startup code is missing. Avoid these confusing errors and fail the build early with a clear error message and location. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> |
||
|
|
bb6ec2e9fd |
tools/nolibc: Use linux/wait.h rather than duplicating it
Linux defines a few custom flags for waitpid() which aren't currently provided by nolibc, make them available to nolibc based programs by just including linux/wait.h where they are defined instead of defining our own copy of the flags. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> |
||
|
|
e6a9a2cbc1 |
fs/proc/task_mmu: report SOFT_DIRTY bits through the PAGEMAP_SCAN ioctl
The PAGEMAP_SCAN ioctl returns information regarding page table entries. It is more efficient compared to reading pagemap files. CRIU can start to utilize this ioctl, but it needs info about soft-dirty bits to track memory changes. We are aware of a new method for tracking memory changes implemented in the PAGEMAP_SCAN ioctl. For CRIU, the primary advantage of this method is its usability by unprivileged users. However, it is not feasible to transparently replace the soft-dirty tracker with the new one. The main problem here is userfault descriptors that have to be preserved between pre-dump iterations. It means criu continues supporting the soft-dirty method to avoid breakage for current users. The new method will be implemented as a separate feature. [avagin@google.com: update tools/include/uapi/linux/fs.h] Link: https://lkml.kernel.org/r/20231107164139.576046-1-avagin@google.com Link: https://lkml.kernel.org/r/20231106220959.296568-1-avagin@google.com Signed-off-by: Andrei Vagin <avagin@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
446e1867e6 |
maple_tree: update check_forking() and bench_forking()
Updated check_forking() and bench_forking() to use __mt_dup() to duplicate maple tree. Link: https://lkml.kernel.org/r/20231027033845.90608-9-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
b2472efe43 |
maple_tree: introduce {mtree,mas}_lock_nested()
In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is
introduced. For example, when duplicating maple tree, we need to hold the
locks of two trees, in which case nested locks are needed.
At the same time, add the definition of spin_lock_nested() in tools for
testing.
Link: https://lkml.kernel.org/r/20231027033845.90608-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
2483e7f04c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h |
||
|
|
7065eefb38 |
bpf: rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE for consistency
To stay consistent with the naming pattern used for similar cases in BPF UAPI (__MAX_BPF_ATTACH_TYPE, etc), rename MAX_BPF_LINK_TYPE into __MAX_BPF_LINK_TYPE. Also similar to MAX_BPF_ATTACH_TYPE and MAX_BPF_REG, add: #define MAX_BPF_LINK_TYPE __MAX_BPF_LINK_TYPE Not all __MAX_xxx enums have such #define, so I'm not sure if we should add it or not, but I figured I'll start with a completely backwards compatible way, and we can drop that, if necessary. Also adjust a selftest that used MAX_BPF_LINK_TYPE enum. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231206190920.1651226-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
e1cef620f5 |
bpf: add BPF token support to BPF_PROG_LOAD command
Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of allowed BPF program types and attach types, derived from BPF FS at BPF token creation time. Then make sure we perform bpf_token_capable() checks everywhere where it's relevant. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
ee54b1a910 |
bpf: add BPF token support to BPF_BTF_LOAD command
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading through delegated BPF token. BTF loading is a pretty straightforward operation, so as long as BPF token is created with allow_cmds granting BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating BTF object. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
688b7270b3 |
bpf: add BPF token support to BPF_MAP_CREATE command
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. Wire through a set of allowed BPF map types to BPF token, derived from BPF FS at BPF token creation time. This, in combination with allowed_cmds allows to create a narrowly-focused BPF token (controlled by privileged agent) with a restrictive set of BPF maps that application can attempt to create. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
4527358b76 |
bpf: introduce BPF token object
Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while having a good amount of control over which
privileged operations could be performed using provided BPF token.
This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).
BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
FS FD, which can be attained through open() API by opening BPF FS mount
point. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the
creation time or after the fact, allowing the process to guard itself
further from unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.
When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.
Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).
Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
within the BPF FS owning user namespace, rounding up the ns_capable()
story of BPF token.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|
|
8481a249a0 |
netdev-genl: spec: Add PID in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for PID of the NAPI thread. Add code generated from the spec. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147335301.5260.11872351477120434501.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
5a5131d66f |
netdev-genl: spec: Add irq in netdev netlink YAML spec
Add support in netlink spec(netdev.yaml) for interrupt number among the NAPI attributes. Add code generated from the spec. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147334210.5260.18178387869057516983.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
ff9991499f |
netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI
Add support in netlink spec(netdev.yaml) for napi related information. Add code generated from the spec. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147333119.5260.7050639053080529108.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
bc87795627 |
netdev-genl: spec: Extend netdev netlink spec in YAML for queue
Add support in netlink spec(netdev.yaml) for queue information. Add code generated from the spec. Note: The "queue-type" attribute takes values 0 and 1 for rx and tx queue type respectively. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147330963.5260.2576294626647300472.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
753c8608f3 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZWiCPAAKCRDbK58LschI g4djAQC1FdqCRIFkhbiIRNHTgHjnfQShELQbd9ofJqzylLqmmgD+JI1E7D9SXagm pIXQ26EGmq8/VcCT3VLncA8EsC76Gg4= =Xowm -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-11-30 We've added 30 non-merge commits during the last 7 day(s) which contain a total of 58 files changed, 1598 insertions(+), 154 deletions(-). The main changes are: 1) Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload, from Stanislav Fomichev with stmmac implementation from Song Yoong Siang. 2) Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques, from Andrii Nakryiko. 3) Add BPF link_info support for uprobe multi link along with bpftool integration for the latter, from Jiri Olsa. 4) Use pkg-config in BPF selftests to determine ld flags which is in particular needed for linking statically, from Akihiko Odaki. 5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits) bpf/tests: Remove duplicate JSGT tests selftests/bpf: Add TX side to xdp_hw_metadata selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP selftests/bpf: Add TX side to xdp_metadata selftests/bpf: Add csum helpers selftests/xsk: Support tx_metadata_len xsk: Add option to calculate TX checksum in SW xsk: Validate xsk_tx_metadata flags xsk: Document tx_metadata_len layout net: stmmac: Add Tx HWTS support to XDP ZC net/mlx5e: Implement AF_XDP TX timestamp and checksum offload tools: ynl: Print xsk-features from the sample xsk: Add TX timestamp and TX checksum offload support xsk: Support tx_metadata_len selftests/bpf: Use pkg-config for libelf selftests/bpf: Override PKG_CONFIG for static builds selftests/bpf: Choose pkg-config for the target bpftool: Add support to display uprobe_multi links selftests/bpf: Add link_info test for uprobe_multi link selftests/bpf: Use bpf_link__destroy in fill_link_info tests ... ==================== Conflicts: Documentation/netlink/specs/netdev.yaml: |
||
|
|
11614723af |
xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM to call skb_checksum_help in transmit path. Might be useful to debugging issues with real hardware. I also use this mode in the selftests. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
48eb03dd26 |
xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).
The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.
The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
offloads are implemented; if the callback is implemented - the offload
is supported (used by netlink reporting code)
Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
area upon completion (tx_timestamp field)
XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).
The struct is forward-compatible and can be extended in the future
by appending more fields.
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|
|
341ac980ea |
xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb metadata. Introduce new tx_metadata_len umem config option that indicates how many bytes to treat as metadata. Metadata bytes come prior to tx_desc address (same as in RX case). The size of the metadata has mostly the same constraints as XDP: - less than 256 bytes - 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte timestamp in the completion) - non-zero This data is not interpreted in any way right now. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
e56fdbfb06 |
bpf: Add link_info support for uprobe multi link
Adding support to get uprobe_link details through bpf_link_info interface. Adding new struct uprobe_multi to struct bpf_link_info to carry the uprobe_multi link details. The uprobe_multi.count is passed from user space to denote size of array fields (offsets/ref_ctr_offsets/cookies). The actual array size is stored back to uprobe_multi.count (allowing user to find out the actual array size) and array fields are populated up to the user passed size. All the non-array fields (path/count/flags/pid) are always set. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org |
||
|
|
637567e4a3 |
tools: ynl: add sample for getting page-pool information
Regenerate the tools/ code after netdev spec changes.
Add sample to query page-pool info in a concise fashion:
$ ./page-pool
eth0[2] page pools: 10 (zombies: 0)
refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
||
|
|
2112aa0349 |
ALSA: pcm: Introduce MSBITS subformat interface
Improve granularity of format selection for S32/U32 formats by adding constants representing 20, 24 and MAX most significant bits. The MAX means the maximum number of significant bits which can the physical format hold. For 32-bit formats, MAX is related to 32 bits. For 8-bit formats, MAX is related to 8 bits etc. As there is only one user currently (format S32_LE), subformat is represented by a simple u32 and stores flags only for that one user alone. The approach of subformat being part of struct snd_pcm_hardware is a compromise between ALSA and ASoC allowing for hw_params-intersection code to be alloc/free-less while not adding any new responsibilities to ASoC runtime structures. Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jaroslav Kysela <perex@perex.cz> Co-developed-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20231117120610.1755254-2-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
|
|
57686a72da |
tools: Disable __packed attribute compiler warning due to -Werror=attributes
Noticed on several perf tools cross build test containers:
[perfbuilder@five ~]$ grep FAIL ~/dm.log/summary
19 10.18 debian:experimental-x-mips : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
20 11.21 debian:experimental-x-mips64 : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
21 11.30 debian:experimental-x-mipsel : FAIL gcc version 12.3.0 (Debian 12.3.0-6)
37 12.07 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
42 11.91 ubuntu:18.04-x-riscv64 : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
44 13.17 ubuntu:18.04-x-sh4 : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
45 12.09 ubuntu:18.04-x-sparc64 : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
[perfbuilder@five ~]$
In file included from util/intel-pt-decoder/intel-pt-pkt-decoder.c:10:
/tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h: In function 'get_unaligned_le16':
/tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h:13:29: error: packed attribute causes inefficient alignment for 'x' [-Werror=attributes]
13 | const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
| ^
/tmp/perf-6.6.0-rc1/tools/include/asm-generic/unaligned.h:27:28: note: in expansion of macro '__get_unaligned_t'
27 | return le16_to_cpu(__get_unaligned_t(__le16, p));
| ^~~~~~~~~~~~~~~~~
This comes from the kernel, where the -Wattributes and -Wpacked isn't
used, -Wpacked is already disabled, do it for the attributes as well.
Fixes:
|
||
|
|
91c97b36bd |
tools headers UAPI: Update tools's copy of unistd.h header
tldr; Just FYI, I'm carrying this on the perf tools tree.
Full explanation:
There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.
The way these headers are used in perf are not restricted to just
including them to compile something.
There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.
E.g.:
$ ls -1 tools/perf/trace/beauty/*.sh | head -5
tools/perf/trace/beauty/arch_errno_names.sh
tools/perf/trace/beauty/drm_ioctl.sh
tools/perf/trace/beauty/fadvise.sh
tools/perf/trace/beauty/fsconfig.sh
tools/perf/trace/beauty/fsmount.sh
$
$ tools/perf/trace/beauty/fadvise.sh
static const char *fadvise_advices[] = {
[0] = "NORMAL",
[1] = "RANDOM",
[2] = "SEQUENTIAL",
[3] = "WILLNEED",
[4] = "DONTNEED",
[5] = "NOREUSE",
};
$
The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.
So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231121225650.390246-6-namhyung@kernel.org
|