linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-22 23:46:49 +08:00

Author	SHA1	Message	Date
Andrei Otcheretianski	3cbadd84f5	wifi: nl80211: Add more NAN capabilities Add better break down for NAN capabilities, as NAN has multiple optional features. This allows to better indicate which features are supported or or offloaded to the device. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.bb02cd8c1596.I01fb2e8dc3662b847f3c27117bc4e199fc96d0a3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-19 11:26:22 +02:00
Andrei Otcheretianski	1ccfd8db34	wifi: cfg80211: Add cluster joined notification APIs The drivers should notify upper layers and user space when a NAN device joins a cluster. This is needed, for example, to set the correct addr3 in SDF frames. Add API to report cluster join event. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.ad27b7b6e4d9.I70b213a2a49f18d1ba2ad325e67e8eff51cc7a1f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-19 11:26:21 +02:00
Andrei Otcheretianski	ba9b2ceaa2	wifi: nl80211: Add NAN Discovery Window (DW) notification This notification will be used by the device to inform user space about upcoming DW. When received, user space will be able to prepare multicast Service Discovery Frames (SDFs) to be transmitted during the next DW using %NL80211_CMD_FRAME command on the NAN management interface. The device/driver will take care to transmit the frames in the correct timing. This allows to implement a synchronized Discovery Engine (DE) in user space, if the device doesn't support DE offload. Note that this notification can be sent before the actual DW starts as long as the driver/device handles the actual timing of the SDF transmission. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.0e1d15031bab.I5b1721e61b63910452b3c5cdcdc1e94cb094d4c9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-19 11:26:21 +02:00
Andrei Otcheretianski	01b4a3061b	wifi: nl80211: Add more configuration options for NAN commands Current NAN APIs have only basic configuration for master preference and operating bands. Add and parse additional parameters which provide more control over NAN synchronization. The newly added attributes allow to publish additional NAN attributes and vendor elements in NAN beacons, control scan and discovery beacons periodicity, enable/disable DW notifications etc. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> tested: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.a4779492bf8e.I375feb919bd72358173766b9fe10010c40796b33@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-19 11:26:21 +02:00
Jakub Kicinski	f2cdc4c22b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc7). No conflicts. Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en/fs.h `9536fbe10c` ("net/mlx5e: Add PSP steering in local NIC RX") `7601a0a462` ("net/mlx5e: Add a miss level for ipsec crypto offload") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-18 11:26:06 -07:00
Jakub Kicinski	6b46ca260e	net: psp: add socket security association code Add the ability to install PSP Rx and Tx crypto keys on TCP connections. Netlink ops are provided for both operations. Rx side combines allocating a new Rx key and installing it on the socket. Theoretically these are separate actions, but in practice they will always be used one after the other. We can add distinct "alloc" and "install" ops later. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-9-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-18 12:32:06 +02:00
Jakub Kicinski	117f02a49b	psp: add op for rotation of device key Rotating the device key is a key part of the PSP protocol design. Some external daemon needs to do it once a day, or so. Add a netlink op to perform this operation. Add a notification group for informing users that key has been rotated and they should rekey (next rotation will cut them off). Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-6-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-18 12:32:06 +02:00
Jakub Kicinski	00c94ca2b9	psp: base PSP device support Add a netlink family for PSP and allow drivers to register support. The "PSP device" is its own object. This allows us to perform more flexible reference counting / lifetime control than if PSP information was part of net_device. In the future we should also be able to "delegate" PSP access to software devices, such as *vlan, veth or netkit more easily. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-3-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-18 12:32:06 +02:00
Chia-Yu Chang	b40671b5ee	tcp: accecn: AccECN option failure handling AccECN option may fail in various way, handle these: - Attempt to negotiate the use of AccECN on the 1st retransmitted SYN - From the 2nd retransmitted SYN, stop AccECN negotiation - Remove option from SYN/ACK rexmits to handle blackholes - If no option arrives in SYN/ACK, assume Option is not usable - If an option arrives later, re-enabled - If option is zeroed, disable AccECN option processing This patch use existing padding bits in tcp_request_sock and holes in tcp_sock without increasing the size. Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916082434.100722-9-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-18 08:47:52 +02:00
Ilpo Järvinen	b5e74132df	tcp: accecn: AccECN option The Accurate ECN allows echoing back the sum of bytes for each IP ECN field value in the received packets using AccECN option. This change implements AccECN option tx & rx side processing without option send control related features that are added by a later change. Based on specification: https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt (Some features of the spec will be added in the later changes rather than in this one). A full-length AccECN option is always attempted but if it does not fit, the minimum length is selected based on the counters that have changed since the last update. The AccECN option (with 24-bit fields) often ends in odd sizes so the option write code tries to take advantage of some nop used to pad the other TCP options. The delivered_ecn_bytes pairs with received_ecn_bytes similar to how delivered_ce pairs with received_ce. In contrast to ACE field, however, the option is not always available to update delivered_ecn_bytes. For ACK w/o AccECN option, the delivered bytes calculated based on the cumulative ACK+SACK information are assigned to one of the counters using an estimation heuristic to select the most likely ECN byte counter. Any estimation error is corrected when the next AccECN option arrives. It may occur that the heuristic gets too confused when there are enough different byte counter deltas between ACKs with the AccECN option in which case the heuristic just gives up on updating the counters for a while. tcp_ecn_option sysctl can be used to select option sending mode for AccECN: TCP_ECN_OPTION_DISABLED, TCP_ECN_OPTION_MINIMUM, and TCP_ECN_OPTION_FULL. This patch increases the size of tcp_info struct, as there is no existing holes for new u32 variables. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_info { [...] __u32 tcpi_total_rto_time; /* 244 4 / / size: 248, cachelines: 4, members: 61 / } [AFTER THIS PATCH] struct tcp_info { [...] __u32 tcpi_total_rto_time; / 244 4 / __u32 tcpi_received_ce; / 248 4 / __u32 tcpi_delivered_e1_bytes; / 252 4 / __u32 tcpi_delivered_e0_bytes; / 256 4 / __u32 tcpi_delivered_ce_bytes; / 260 4 / __u32 tcpi_received_e1_bytes; / 264 4 / __u32 tcpi_received_e0_bytes; / 268 4 / __u32 tcpi_received_ce_bytes; / 272 4 / / size: 280, cachelines: 5, members: 68 / } This patch uses the existing 1-byte holes in the tcp_sock_write_txrx group for new u8 members, but adds a 4-byte hole in tcp_sock_write_rx group after the new u32 delivered_ecn_bytes[3] member. Therefore, the group size of tcp_sock_write_rx is increased from 96 to 112. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_sock { [...] u8 received_ce_pending:4; / 2522: 0 1 / u8 unused2:4; / 2522: 4 1 / / XXX 1 byte hole, try to pack / [...] u32 rcv_rtt_last_tsecr; / 2668 4 / [...] __cacheline_group_end__tcp_sock_write_rx[0]; / 2728 0 / [...] / size: 3200, cachelines: 50, members: 167 / } [AFTER THIS PATCH] struct tcp_sock { [...] u8 received_ce_pending:4;/ 2522: 0 1 / u8 unused2:4; / 2522: 4 1 / u8 accecn_minlen:2; / 2523: 0 1 / u8 est_ecnfield:2; / 2523: 2 1 / u8 unused3:4; / 2523: 4 1 / [...] u32 rcv_rtt_last_tsecr; / 2668 4 / u32 delivered_ecn_bytes[3];/ 2672 12 / / XXX 4 bytes hole, try to pack / [...] __cacheline_group_end__tcp_sock_write_rx[0]; / 2744 0 / [...] / size: 3200, cachelines: 50, members: 171 */ } Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Neal Cardwell <ncardwell@google.com> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916082434.100722-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-18 08:47:52 +02:00
Matthieu Baerts (NGI0)	2293c57484	mptcp: pm: nl: announce deny-join-id0 flag During the connection establishment, a peer can tell the other one that it cannot establish new subflows to the initial IP address and port by setting the 'C' flag [1]. Doing so makes sense when the sender is behind a strict NAT, operating behind a legacy Layer 4 load balancer, or using anycast IP address for example. When this 'C' flag is set, the path-managers must then not try to establish new subflows to the other peer's initial IP address and port. The in-kernel PM has access to this info, but the userspace PM didn't. The RFC8684 [1] is strict about that: (...) therefore the receiver MUST NOT try to open any additional subflows toward this address and port. So it is important to tell the userspace about that as it is responsible for the respect of this flag. When a new connection is created and established, the Netlink events now contain the existing but not currently used 'flags' attribute. When MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows to the initial IP address and port -- info that are also part of the event -- can be established. Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1] Fixes: `702c2f646d` ("mptcp: netlink: allow userspace-driven subflow establishment") Reported-by: Marek Majkowski <marek@cloudflare.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 18:12:05 -07:00
Jakub Kicinski	bd569dd935	Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next 1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH error. 2) Support fetching the current bridge ethernet address. This allows a more flexible approach to packet redirection on bridges without need to use hardcoded addresses. From Fernando Fernandez Mancera. 3) Zap a few no-longer needed conditionals from ipvs packet path and convert to READ/WRITE_ONCE to avoid KCSAN warnings. From Zhang Tengfei. 4) Remove a no-longer-used macro argument in ipset, from Zhen Ni. * tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_reject: don't reply to icmp error messages ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support netfilter: ipset: Remove unused htable_bits in macro ahash_region selftest:net: fixed spelling mistakes ==================== Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-12 17:06:25 -07:00
Petr Machata	21446c06b4	net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0 The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0. When enabled, it has local FDB entries installed only on VLAN 0, instead of duplicating them across all VLANs. In this patch, add the corresponding UAPI toggle, and the code for turning the feature on and off. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/ea99bfb10f687fa58091e6e1c2f8acc33f47ca45.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Jakub Kicinski	d103f26a5c	Merge tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Plenty of things going on, notably: - iwlwifi: major cleanups/rework - brcmfmac: gets AP isolation support - mac80211: gets more S1G support * tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (94 commits) wifi: mwifiex: fix endianness handling in mwifiex_send_rgpower_table wifi: cfg80211: Remove the redundant wiphy_dev wifi: mac80211: fix incorrect comment wifi: cfg80211: update the time stamps in hidden ssid wifi: mac80211: Fix HE capabilities element check wifi: mac80211: add tx_handlers_drop statistics to ethtool wifi: mac80211: fix reporting of all valid links in sta_set_sinfo() wifi: iwlwifi: mld: CHANNEL_SURVEY_NOTIF is always supported wifi: iwlwifi: mld: remove support of iwl_esr_mode_notif version 1 wifi: iwlwifi: mld: remove support from of sta cmd version 1 wifi: iwlwifi: mld: remove support of roc cmd version 5 wifi: iwlwifi: mld: remove support of mac cmd ver 2 wifi: iwlwifi: mld: don't consider phy cmd version 5 wifi: iwlwifi: implement wowlan status notification API update wifi: iwlwifi: fw: Add ASUS to PPAG and TAS list wifi: iwlwifi: add kunit tests for nvm parse wifi: iwlwifi: api: add a flag to iwl_link_ctx_modify_flags wifi: iwlwifi: pcie: move ltr_enabled to the specific transport wifi: iwlwifi: pcie: move pm_support to the specific transport wifi: iwlwifi: rename iwl_finish_nic_init ... ==================== Link: https://patch.msgid.link/20250911100854.20445-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 17:50:46 -07:00
Fernando Fernandez Mancera	cbd2257dc9	netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support Expose the input bridge interface ethernet address so it can be used to redirect the packet to the receiving physical device for processing. Tested with nft command line tool. table bridge nat { chain PREROUTING { type filter hook prerouting priority 0; policy accept; ether daddr de:ad:00:00:be:ef meta pkttype set host ether daddr set meta ibrhwdr accept } } Joint work with Pablo Neira. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-11 15:40:55 +02:00
Hangbin Liu	6b6dc81ee7	bonding: add support for per-port LACP actor priority Introduce a new netlink attribute 'actor_port_prio' to allow setting the LACP actor port priority on a per-slave basis. This extends the existing bonding infrastructure to support more granular control over LACP negotiations. The priority value is embedded in LACPDU packets and will be used by subsequent patches to influence aggregator selection policies. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-09 10:56:02 +02:00
Carolina Jubran	faf23f54d3	ptp: Add ioctl commands to expose raw cycle counter values Introduce two new ioctl commands, PTP_SYS_OFFSET_PRECISE_CYCLES and PTP_SYS_OFFSET_EXTENDED_CYCLES, to allow user space to access the raw free-running cycle counter from PTP devices. These ioctls are variants of the existing PRECISE and EXTENDED offset queries, but instead of returning device time in realtime, they return the raw cycle counter value. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://patch.msgid.link/1755008228-88881-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-09 09:33:24 +02:00
Jakub Kicinski	5ef04a7b06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc5). No conflicts. Adjacent changes: include/net/sock.h `c51613fa27` ("net: add sk->sk_drop_counters") `5d6b58c932` ("net: lockless sock_i_ino()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 13:33:00 -07:00
Arend van Spriel	4f652a390d	wifi: nl80211: strict checking attributes for NL80211_CMD_SET_BSS Assure user-space only modifies attributes for NL80211_CMD_SET_BSS that are supported by the driver. This stricter checking is only done when user-space commits to it by including NL80211_ATTR_BSS_PARAM. Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://patch.msgid.link/20250817190435.1495094-4-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-04 11:19:02 +02:00
Arend van Spriel	2418553491	wifi: nl80211: allow drivers to support subset of NL80211_CMD_SET_BSS The so-called fullmac devices rely on firmware functionality and/or API to change BSS parameters. Today there are limited drivers supporting the nl80211 primitive, but they only handle a subset of the bss parameters passed if any. The mac80211 driver does handle all parameters and stores their configured values. Some of the BSS parameters were already conditional by wiphy->features. For these the wiphy->bss_param_support and wiphy->features fields are silently aligned in wiphy_register(). Maybe better to issue a warning instead when they are misaligned. Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://patch.msgid.link/20250817190435.1495094-2-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-04 11:19:02 +02:00
Muna Sinada	d0bf06158c	wifi: nl80211: Add EHT fixed Tx rate support Add new attributes to support EHT MCS/NSS Tx rates and EHT GI/LTF. Parse EHT fixed MCS/NSS Tx rates and EHT GI/LTF values passed by the userspace, validate and add as part of cfg80211_bitrate_mask. MCS mask is constructed by new function, eht_build_mcs_mask(). Max NSS supported for MCS rates of 7, 9, 11 and 13 is utilized to set MCS bitmask for each NSS. MCS rates 14, and 15 if supported, are set only for NSS = 0. Co-developed-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com> Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com> Signed-off-by: Muna Sinada <muna.sinada@oss.qualcomm.com> Link: https://patch.msgid.link/20250815213011.2704803-1-muna.sinada@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-04 11:19:02 +02:00
Lachlan Hodges	ee63609454	wifi: mac80211: support block bitmap S1G TIM encoding An S1G TIM PVB is encoded differently compared to a non-s1g TIM PVB. As the AP dictates which encoding mode it uses, here we only implement block bitmap encoding. This is the default encoding mode used by all current vendor implementations. Additionally, S1G has a maximum AID count of 8192, however we are limiting the current implementation to 1600. This has no resemblence to the standard and is purely an implementation detail. The reason for this is due to the TIM elements maximum length of 255. This allows for, at most, 25 encoded blocks for a PVB encoded with block bitmap. Support for the maximum of 8192 AIDs will require an implementation of page slicing to be added to mac80211. As a result, we perform extra validation on both the STA and AP side when receiving an AID as an S1G interface. Add support for block bitmap encoding for an S1G AP and limit the maximum AID count to 1600 for the current mac80211 implementations. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250725132221.258217-2-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-04 11:19:01 +02:00
Phil Sutter	4039ce7ef4	netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX This new attribute is supposed to be used instead of NFTA_DEVICE_NAME for simple wildcard interface specs. It holds a NUL-terminated string representing an interface name prefix to match on. While kernel code to distinguish full names from prefixes in NFTA_DEVICE_NAME is simpler than this solution, reusing the existing attribute with different semantics leads to confusion between different versions of kernel and user space though: * With old kernels, wildcards submitted by user space are accepted yet silently treated as regular names. * With old user space, wildcards submitted by kernel may cause crashes since libnftnl expects NUL-termination when there is none. Using a distinct attribute type sanitizes these situations as the receiving part detects and rejects the unexpected attribute nested in *_HOOK_DEVS attributes. Fixes: `6d07a28950` ("netfilter: nf_tables: Support wildcard netdev hook specs") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-04 09:19:25 +02:00
Jakub Kicinski	d23ad54de7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc4). No conflicts. Adjacent changes: drivers/net/ethernet/intel/idpf/idpf_txrx.c `02614eee26` ("idpf: do not linearize big TSO packets") `6c4e684802` ("idpf: remove obsolete stashing code") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 11:48:01 -07:00
Jakub Kicinski	c2a756891b	uapi: wrap compiler_types.h in an ifdef instead of the implicit strip The uAPI stddef header includes compiler_types.h, a kernel-only header, to make sure that kernel definitions of annotations like __counted_by() take precedence. There is a hack in scripts/headers_install.sh which strips includes of compiler.h and compiler_types.h when installing uAPI headers. While explicit handling makes sense for compiler.h, which is included all over the uAPI, compiler_types.h is only included by stddef.h (within the uAPI, obviously it's included in kernel code a lot). Remove the stripping from scripts/headers_install.sh and wrap the include of compiler_types.h in #ifdef __KERNEL__ instead. This should be equivalent functionally, but is easier to understand to a casual reader of the code. It also makes it easier to work with kernel headers directly from under tools/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250825201828.2370083-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-28 13:06:48 +02:00
Shahar Shitrit	da0e219764	devlink: Make health reporter burst period configurable Enable configuration of the burst period — a time window starting from the first error recovery, during which the reporter allows recovery attempts for each reported error. This feature is helpful when a single underlying issue causes multiple errors, as it delays the start of the grace period to allow sufficient time for recovering all related errors. For example, if multiple TX queues time out simultaneously, a sufficient burst period could allow all affected TX queues to be recovered within that window. Without this period, only the first TX queue that reports a timeout will undergo recovery, while the remaining TX queues will be blocked once the grace period begins. Configuration example: $ devlink health set pci/0000:00:09.0 reporter tx burst_period 500 Configuration example with ynl: ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/devlink.yaml \ --do health-reporter-set --json '{ "bus-name": "auxiliary", "dev-name": "mlx5_core.eth.0", "port-index": 65535, "health-reporter-name": "tx", "health-reporter-burst-period": 500 }' Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250824084354.533182-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-26 17:24:16 -07:00
Namhyung Kim	24fc631539	vhost: Fix ioctl # for VHOST_[GS]ET_FORK_FROM_OWNER The VHOST_[GS]ET_FEATURES_ARRAY ioctl already took 0x83 and it would result in a build error when the vhost uapi header is used for perf tool build like below. In file included from trace/beauty/ioctl.c:93: tools/perf/trace/beauty/generated/ioctl/vhost_virtio_ioctl_array.c: In function ‘ioctl__scnprintf_vhost_virtio_cmd’: tools/perf/trace/beauty/generated/ioctl/vhost_virtio_ioctl_array.c:36:18: error: initialized field overwritten [-Werror=override-init] 36 \| [0x83] = "SET_FORK_FROM_OWNER", \| ^~~~~~~~~~~~~~~~~~~~~ tools/perf/trace/beauty/generated/ioctl/vhost_virtio_ioctl_array.c:36:18: note: (near initialization for ‘vhost_virtio_ioctl_cmds[131]’) Fixes: `7d9896e9f6` ("vhost: Reintroduce kthread API and add mode selection") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Message-Id: <20250819063958.833770-1-namhyung@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2025-08-26 03:38:19 -04:00
Linus Torvalds	a2e94e8079	Merge tag 'block-6.17-20250822' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: "A set of fixes for block that should go into this tree. A bit larger than what I usually have at this point in time, a lot of that is the continued fixing of the lockdep annotation for queue freezing that we recently added, which has highlighted a number of little issues here and there. This contains: - MD pull request via Yu: - Add a legacy_async_del_gendisk mode, to prevent a user tools regression. New user tools releases will not use such a mode, the old release with a new kernel now will have warning about deprecated behavior, and we prepare to remove this legacy mode after about a year later - The rename in kernel causing user tools build failure, revert the rename in mdp_superblock_s - Fix a regression that interrupted resync can be shown as recover from mdstat or sysfs - Improve file size detection for loop, particularly for networked file systems, by using getattr to get the size rather than the cached inode size. - Hotplug CPU lock vs queue freeze fix - Lockdep fix while updating the number of hardware queues - Fix stacking for PI devices - Silence bio_check_eod() for the known case of device removal where the size is truncated to 0 sectors" * tag 'block-6.17-20250822' of git://git.kernel.dk/linux: block: avoid cpu_hotplug_lock depedency on freeze_lock block: decrement block_rq_qos static key in rq_qos_del() block: skip q->rq_qos check in rq_qos_done_bio() blk-mq: fix lockdep warning in __blk_mq_update_nr_hw_queues block: tone down bio_check_eod loop: use vfs_getattr_nosec for accurate file size loop: Consolidate size calculation logic into lo_calculate_size() block: remove newlines from the warnings in blk_validate_integrity_limits block: handle pi_tuple_size in queue_limits_stack_integrity selftests: ublk: Use ARRAY_SIZE() macro to improve code md: fix sync_action incorrect display during resync md: add helper rdev_needs_recovery() md: keep recovery_cp in mdp_superblock_s md: add legacy_async_del_gendisk mode	2025-08-22 09:29:51 -04:00
Chen Yu	8151320c74	ACPI: pfr_update: Fix the driver update version check The security-version-number check should be used rather than the runtime version check for driver updates. Otherwise, the firmware update would fail when the update binary had a lower runtime version number than the current one. Fixes: `0db89fa243` ("ACPI: Introduce Platform Firmware Runtime Update device driver") Cc: 5.17+ <stable@vger.kernel.org> # 5.17+ Reported-by: "Govindarajulu, Hariganesh" <hariganesh.govindarajulu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/20250722143233.3970607-1-yu.c.chen@intel.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-08-20 15:30:36 +02:00
Xiao Ni	c27973211f	md: keep recovery_cp in mdp_superblock_s commit `907a99c314` ("md: rename recovery_cp to resync_offset") replaces recovery_cp with resync_offset in mdp_superblock_s which is in md_p.h. md_p.h is used in userspace too. So mdadm building fails because of this. This patch revert this change. Fixes: `907a99c314` ("md: rename recovery_cp to resync_offset") Signed-off-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/linux-raid/20250815040028.18085-1-xni@redhat.com Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2025-08-16 08:47:38 +08:00
Jakub Kicinski	f24775c325	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc2). No conflicts. Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c `d7a276a576` ("net: stmmac: rk: convert to suspend()/resume() methods") `de1e963ad0` ("net: stmmac: rk: put the PHY clock on remove") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-14 12:13:00 -07:00
Jakub Kicinski	f22cc6f766	net: ethtool: support including Flow Label in the flow hash for RSS Some modern NICs support including the IPv6 Flow Label in the flow hash for RSS queue selection. This is outside the old "Microsoft spec", but was included in the OCP NIC spec: [ ] RSS include flow label in the hash (configurable) https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling RSS Flow Label hashing allows TCP Protective Load Balancing (PLB) to recover from receiver congestion / overload. Rx CPU/queue hotspots are relatively common for data ingest workloads, and so far we had to try to detect the condition at the RPC layer and reopen the connection. PLB lets us change the Flow Label and therefore Rx CPU on RTO, with minimal packet reordering. PLB reaction times are much faster, and can happen at any point in the connection, not just at RPC boundaries. Due to the nature of host processing (relatively long queues, other kernel subsystems masking IRQs for 100s of msecs) the risk of reordering within the host is higher than in the network. But for applications which need it - it is far preferable to potentially persistent overload of subset of queues. It is expected that the hash communicated to the host may change if the Flow Label changes. This may be surprising to some host software, but I don't expect the devices can compute two Toeplitz hashes, one with the Flow Label for queue selection and one without for the rx hash communicated to the host. Besides, changing the hash may potentially help to change the path thru host queues. User can disable NETIF_F_RXHASH if they require a stable flow hash. The name RXH_IP6_FL was chosen based on what we call Flow Label variables in IPv6 processing (fl). I prefer fl_lbl but that appears to be an fbnic-only spelling. We could spell out RXH_IP6_FLOW_LABEL but existing RXH_ defines are a lot more terse. Willem notes [1] that Flow Label is defined as identifying the flow and therefore including both the flow label _and_ the L4 header fields is not generally necessary. But it should not hurt so it's not explicitly prevented if the driver supports hashing on both at the same time. Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch [1] Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250811234212.580748-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-14 11:40:13 +02:00
Linus Torvalds	561c80369d	Merge tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY fix from Greg KH: "Here is a single revert of one of the previous patches that went in the last tty/serial merge that is breaking userspace on some platforms (specifically powerpc, probably a few others.) It accidentially changed the ioctl values of some tty ioctls, which breaks xorg. The revert has been in linux-next all this week with no reported issues" * tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "tty: vt: use _IO() to define ioctl numbers"	2025-08-09 18:12:23 +03:00
Linus Torvalds	2988dfed8a	Merge tag 'block-6.17-20250808' of git://git.kernel.dk/linux Pull more block updates from Jens Axboe: - MD pull request via Yu: - mddev null-ptr-dereference fix, by Erkun - md-cluster fail to remove the faulty disk regression fix, by Heming - minor cleanup, by Li Nan and Jinchao - mdadm lifetime regression fix reported by syzkaller, by Yu Kuai - MD pull request via Christoph - add support for getting the FDP featuee in fabrics passthru path (Nitesh Shetty) - add capability to connect to an administrative controller (Kamaljit Singh) - fix a leak on sgl setup error (Keith Busch) - initialize discovery subsys after debugfs is initialized (Mohamed Khalfella) - fix various comment typos (Bjorn Helgaas) - remove unneeded semicolons (Jiapeng Chong) - nvmet debugfs ordering issue fix - Fix UAF in the tag_set in zloop - Ensure sbitmap shallow depth covers entire set - Reduce lock roundtrips in io context lookup - Move scheduler tags alloc/free out of elevator and freeze lock, to fix some lockdep found issues - Improve robustness of queue limits checking - Fix a regression with IO priorities, if no io context exists * tag 'block-6.17-20250808' of git://git.kernel.dk/linux: (26 commits) lib/sbitmap: make sbitmap_get_shallow() internal lib/sbitmap: convert shallow_depth from one word to the whole sbitmap nvmet: exit debugfs after discovery subsystem exits block, bfq: Reorder struct bfq_iocq_bfqq_data md: make rdev_addable usable for rcu mode md/raid1: remove struct pool_info and related code md/raid1: change r1conf->r1bio_pool to a pointer type block: ensure discard_granularity is zero when discard is not supported zloop: fix KASAN use-after-free of tag set block: Fix default IO priority if there is no IO context nvme: fix various comment typos nvme-auth: remove unneeded semicolon nvme-pci: fix leak on sgl setup error nvmet: initialize discovery subsys after debugfs is initialized nvme: add capability to connect to an administrative controller nvmet: add support for FDP in fabrics passthru path md: rename recovery_cp to resync_offset md/md-cluster: handle REMOVE message earlier md: fix create on open mddev lifetime regression block: fix potential deadlock while running nr_hw_queue update ...	2025-08-09 08:47:28 +03:00
Linus Torvalds	24bbfb8920	Merge tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: - Allow vectorized payloads for send/send-zc - like sendmsg, but without the hassle of a msghdr. - Fix for an integer wrap that should go to stable, spotted by syzbot. Nothing alarming here, as you need to be root to hit this. Nevertheless, it should get fixed. FWIW, kudos to the syzbot crew for having much nicer reproducers now, and with nicely annotated source code as well. This is particularly useful as syzbot uses the raw interface rather than liburing, historically it's been difficult to turn a syzbot reproducer into a meaningful test case. With the recent changes, not true anymore! * tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux: io_uring/memmap: cast nr_pages to size_t before shifting io_uring/net: Allow to do vectorized send	2025-08-09 08:45:08 +03:00
Linus Torvalds	6e64f45803	Merge tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - updates to several drivers consuming GPIO APIs to use setters returning error codes - an infrastructure allowing to define "overlays" for touchscreens carving out regions implementing buttons and other elements from a bigger sensors and a corresponding update to st1232 driver - an update to AT/PS2 keyboard driver to map F13-F24 by default - Samsung keypad driver got a facelift - evdev input handler will now bind to all devices using EV_SYN event instead of abusing id->driver_info - two new sub-drivers implementing 1A (capacitive buttons) and 21 (forcepad button) functions in Synaptics RMI driver - support for polling mode in Goodix touchscreen driver - support for support for FocalTech FT8716 in edt-ft5x06 driver - support for MT6359 in mtk-pmic-keys driver - removal of pcf50633-input driver since platform it was used on is gone - new definitions for game controller "grip" buttons (BTN_GRIP) and corresponding changes to xpad and hid-steam controller drivers - a new definition for "performance" key tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (38 commits) HID: hid-steam: Use new BTN_GRIP* buttons Input: add keycode for performance mode key Input: max77693 - convert to atomic pwm operation Input: st1232 - add touch-overlay handling dt-bindings: input: touchscreen: st1232: add touch-overlay example Input: touch-overlay - add touchscreen overlay handling dt-bindings: touchscreen: add touch-overlay property Input: atkbd - correctly map F13 - F24 Input: xpad - use new BTN_GRIP* buttons Input: Add and document BTN_GRIP* Input: xpad - change buttons the D-Pad gets mapped as to BTN_DPAD_* Documentation: Fix capitalization of XBox -> Xbox Input: synaptics-rmi4 - add support for F1A dt-bindings: input: syna,rmi4: Document F1A function Input: synaptics-rmi4 - add support for Forcepads (F21) Input: mtk-pmic-keys - add support for MT6359 PMIC keys Input: remove special handling of id->driver_info when matching Input: evdev - switch matching to EV_SYN Input: samsung-keypad - use BIT() and GENMASK() where appropriate Input: samsung-keypad - use per-chip parameters ...	2025-08-07 07:40:01 +03:00
Linus Torvalds	e8214ed59b	Merge tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Fix imbalance where the no-iommu/cdev device path skips too much on open, failing to increment a reference, but still decrements the reference on close. Add bounds checking to prevent such underflows (Jacob Pan) - Fill missing detach_ioas op for pds_vfio_pci, fixing probe failure when used with IOMMUFD (Brett Creeley) - Split SR-IOV VFs to separate dev_set, avoiding unnecessary serialization between VFs that appear on the same bus (Alex Williamson) - Fix a theoretical integer overflow is the mlx5-vfio-pci variant driver (Artem Sadovnikov) - Implement missing VF token checking support via vfio cdev/IOMMUFD interface (Jason Gunthorpe) - Update QAT vfio-pci variant driver to claim latest VF devices (Małgorzata Mielnik) - Add a cond_resched() call to avoid holding the CPU too long during DMA mapping operations (Keith Busch) * tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio: vfio/type1: conditional rescheduling while pinning vfio/qat: add support for intel QAT 6xxx virtual functions vfio/qat: Remove myself from VFIO QAT PCI driver maintainers vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD vfio/mlx5: fix possible overflow in tracking max message size vfio/pci: Separate SR-IOV VF dev_set vfio/pds: Fix missing detach_ioas op vfio: Prevent open_count decrement to negative vfio: Fix unbalanced vfio_df_close call in no-iommu mode	2025-08-07 07:32:50 +03:00
Jason Gunthorpe	86624ba3b5	vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD This was missed during the initial implementation. The VFIO PCI encodes the vf_token inside the device name when opening the device from the group FD, something like: "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3" This is used to control access to a VF unless there is co-ordination with the owner of the PF. Since we no longer have a device name in the cdev path, pass the token directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field indicated by VFIO_DEVICE_BIND_FLAG_TOKEN. Fixes: `5fcc26969a` ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2025-08-05 15:41:14 -06:00
Marcos Alano	89c5214639	Input: add keycode for performance mode key Alienware calls this key "Performance Boost". Dell calls it "G-Mode". The goal is to have a specific keycode to detect when this key is pressed, so userspace can act upon it and do what have to do, usually starting the power profile for performance. Signed-off-by: Marcos Alano <marcoshalano@gmail.com> Link: https://lore.kernel.org/r/20250509193708.2190586-1-marcoshalano@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2025-08-05 13:51:10 -07:00
Dmitry Torokhov	a7bee4e7f7	Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next Merge an immutable branch between MFD, GPIO, Input and PWM to resolve conflicts for the merge window pull request.	2025-08-03 23:28:48 -07:00
Linus Torvalds	e991acf1bc	Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer\|\|notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...	2025-08-03 16:23:09 -07:00
Alexander Graf	07d2490297	kexec: enable CMA based contiguous allocation When booting a new kernel with kexec_file, the kernel picks a target location that the kernel should live at, then allocates random pages, checks whether any of those patches magically happens to coincide with a target address range and if so, uses them for that range. For every page allocated this way, it then creates a page list that the relocation code - code that executes while all CPUs are off and we are just about to jump into the new kernel - copies to their final memory location. We can not put them there before, because chances are pretty good that at least some page in the target range is already in use by the currently running Linux environment. Copying is happening from a single CPU at RAM rate, which takes around 4-50 ms per 100 MiB. All of this is inefficient and error prone. To successfully kexec, we need to quiesce all devices of the outgoing kernel so they don't scribble over the new kernel's memory. We have seen cases where that does not happen properly (cough GIC cough) and hence the new kernel was corrupted. This started a month long journey to root cause failing kexecs to eventually see memory corruption, because the new kernel was corrupted severely enough that it could not emit output to tell us about the fact that it was corrupted. By allocating memory for the next kernel from a memory range that is guaranteed scribbling free, we can boot the next kernel up to a point where it is at least able to detect corruption and maybe even stop it before it becomes severe. This increases the chance for successful kexecs. Since kexec got introduced, Linux has gained the CMA framework which can perform physically contiguous memory mappings, while keeping that memory available for movable memory when it is not needed for contiguous allocations. The default CMA allocator is for DMA allocations. This patch adds logic to the kexec file loader to attempt to place the target payload at a location allocated from CMA. If successful, it uses that memory range directly instead of creating copy instructions during the hot phase. To ensure that there is a safety net in case anything goes wrong with the CMA allocation, it also adds a flag for user space to force disable CMA allocations. Using CMA allocations has two advantages: 1) Faster by 4-50 ms per 100 MiB. There is no more need to copy in the hot phase. 2) More robust. Even if by accident some page is still in use for DMA, the new kernel image will be safe from that access because it resides in a memory region that is considered allocated in the old kernel and has a chance to reinitialize that component. Link: https://lkml.kernel.org/r/20250610085327.51817-1-graf@amazon.com Signed-off-by: Alexander Graf <graf@amazon.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Zhongkun He <hezhongkun.hzk@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:01:38 -07:00
Linus Torvalds	821c9e515d	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - vhost can now support legacy threading if enabled in Kconfig - vsock memory allocation strategies for large buffers have been improved, reducing pressure on kmalloc - vhost now supports the in-order feature. guest bits missed the merge window. - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (30 commits) vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers vsock/virtio: Rename virtio_vsock_skb_rx_put() vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers vsock/virtio: Move SKB allocation lower-bound check to callers vsock/virtio: Rename virtio_vsock_alloc_skb() vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put() vsock/virtio: Validate length in packet header before skb_put() vhost/vsock: Avoid allocating arbitrarily-sized SKBs vhost_net: basic in_order support vhost: basic in order support vhost: fail early when __vhost_add_used() fails vhost: Reintroduce kthread API and add mode selection vdpa: Fix IDR memory leak in VDUSE module exit vdpa/mlx5: Fix release of uninitialized resources on error path vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit virtio: virtio_dma_buf: fix missing parameter documentation vhost: Fix typos vhost: vringh: Remove unused functions vhost: vringh: Remove unused iotlb functions ...	2025-08-01 14:17:48 -07:00
Linus Torvalds	0bd0a41a51	Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Allow built-in drivers, not just modular drivers, to use async initial probing (Lukas Wunner) - Support Immediate Readiness even on devices with no PM Capability (Sean Christopherson) - Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the required delay between a reset and sending config requests to a device (Niklas Cassel) - Add pci_is_display() to check for "Display" base class and use it in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello) - Allow 'isolated PCI functions' (multi-function devices without a function 0) for LoongArch, similar to s390 and jailhouse (Huacai Chen) Power control: - Add ability to enable optional slot clock for cases where the PCIe host controller and the slot are supplied by different clocks (Marek Vasut) PCIe native device hotplug: - Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by misinterpreting a config read failure after a device has been removed (Lukas Wunner) - Avoid creating a useless PCIe port service device for pciehp if the slot is handled by the ACPI hotplug driver (Lukas Wunner) - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug ports (Lukas Wunner) Virtualization: - Save VF resizable BAR state and restore it after reset (Michał Winiarski) - Allow IOV resources (VF BARs) to be resized (Michał Winiarski) - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size (Michał Winiarski) Endpoint framework: - Add RC-to-EP doorbell support using platform MSI controller, including a test case (Frank Li) - Allow BAR assignment via configfs so platforms have flexibility in determining BAR usage (Jerome Brunet) Native PCIe controller drivers: - Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie, axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to DT schema format (Rob Herring) - Use dev_fwnode() instead of of_fwnode_handle() to remove OF dependency in altera (fixes an unused variable), designware-host, mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma, xilinx-nwl (Jiri Slaby, Arnd Bergmann) - Convert aardvark, altera, brcmstb, designware-host, iproc, mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx, xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to using msi_create_parent_irq_domain() instead; this makes the interrupt controller per-PCI device, allows dynamic allocation of vectors after initialization, and allows support of IMS (Nam Cao) APM X-Gene PCIe controller driver: - Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug bits, use device-managed memory allocations, and clean things up (Marc Zyngier) - Probe xgene-msi as a standard platform driver rather than a subsys_initcall (Marc Zyngier) Broadcom STB PCIe controller driver: - Add optional DT 'num-lanes' property and if present, use it to override the Maximum Link Width advertised in Link Capabilities (Jim Quinlan) Cadence PCIe controller driver: - Use PCIe Message routing types from the PCI core rather than defining private ones (Hans Zhang) Freescale i.MX6 PCIe controller driver: - Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu) - Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features (Richard Zhu) - Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can trigger doorbel on Endpoint (Frank Li) - Remove apps_reset (LTSSM_EN) from imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug regression on i.MX8MM (Richard Zhu) - Delay Endpoint link start until configfs 'start' written (Richard Zhu) Intel VMD host bridge driver: - Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo) Qualcomm PCIe controller driver: - Add DT binding and driver support for SA8255p, which supports ECAM for Configuration Space access (Mayank Rana) - Update DT binding and driver to describe PHYs and per-Root Port resets in a Root Port stanza and deprecate describing them in the host bridge; this makes it possible to support multiple Root Ports in the future (Krishna Chaitanya Chundru) - Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang) - Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang) - Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT bindings (Konrad Dybcio) - Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue Zhang) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Rockchip PCIe controller driver: - Drop unused PCIe Message routing and code definitions (Hans Zhang) - Remove several unused header includes (Hans Zhang) - Use standard PCIe config register definitions instead of rockchip-specific redefinitions (Geraldo Nascimento) - Set Target Link Speed to 5.0 GT/s before retraining so we have a chance to train at a higher speed (Geraldo Nascimento) Rockchip DesignWare PCIe controller driver: - Prevent race between link training and register update via DBI by inhibiting link training after hot reset and link down (Wilfred Mallawa) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Sophgo PCIe controller driver: - Add DT binding and driver for Sophgo SG2044 PCIe controller driver in Root Complex mode (Inochi Amaoto) Synopsys DesignWare PCIe controller driver: - Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on Ports that support > 5.0 GT/s. Slower Ports still rely on the not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while waiting for the Link (Niklas Cassel)" * tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits) dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset dt-bindings: PCI: Remove 83xx-512x-pci.txt dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema dt-bindings: PCI: Convert apm,xgene-pcie to DT schema dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema dt-bindings: PCI: Convert st,spear1340-pcie to DT schema PCI: Move is_pciehp check out of pciehp_is_native() PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports selftests: pci_endpoint: Add doorbell test case misc: pci_endpoint_test: Add doorbell test case PCI: endpoint: pci-epf-test: Add doorbell test support PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode PCI: vmd: Switch to msi_create_parent_irq_domain() PCI: vmd: Convert to lock guards ...	2025-08-01 13:59:07 -07:00
Cindy Lu	7d9896e9f6	vhost: Reintroduce kthread API and add mode selection Since commit `6e890c5d50` ("vhost: use vhost_tasks for worker threads"), the vhost uses vhost_task and operates as a child of the owner thread. This is required for correct CPU usage accounting, especially when using containers. However, this change has caused confusion for some legacy userspace applications, and we didn't notice until it's too late. Unfortunately, it's too late to revert - we now have userspace depending both on old and new behaviour :( To address the issue, reintroduce kthread mode for vhost workers and provide a configuration to select between kthread and task worker. - Add 'fork_owner' parameter to vhost_dev to let users select kthread or task mode. Default mode is task mode(VHOST_FORK_OWNER_TASK). - Reintroduce kthread mode support: * Bring back the original vhost_worker() implementation, and renamed to vhost_run_work_kthread_list(). * Add cgroup support for the kthread * Introduce struct vhost_worker_ops: - Encapsulates create / stop / wake‑up callbacks. - vhost_worker_create() selects the proper ops according to inherit_owner. - Userspace configuration interface: * New IOCTLs: - VHOST_SET_FORK_FROM_OWNER lets userspace select task mode (VHOST_FORK_OWNER_TASK) or kthread mode (VHOST_FORK_OWNER_KTHREAD) - VHOST_GET_FORK_FROM_OWNER reads the current worker mode * Expose module parameter 'fork_from_owner_default' to allow system administrators to configure the default mode for vhost workers * Kconfig option CONFIG_VHOST_ENABLE_FORK_OWNER_CONTROL controls whether these IOCTLs and the parameter are available - The VHOST_NEW_WORKER functionality requires fork_owner to be set to true, with validation added to ensure proper configuration This partially reverts or improves upon: commit `6e890c5d50` ("vhost: use vhost_tasks for worker threads") commit `1cdaafa1b8` ("vhost: replace single worker pointer with xarray") Fixes: `6e890c5d50` ("vhost: use vhost_tasks for worker threads"), Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20250714071333.59794-2-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2025-08-01 09:11:09 -04:00
Jiri Slaby (SUSE)	55a984928b	Revert "tty: vt: use _IO() to define ioctl numbers" This reverts commit `f1180ca37a`. Since the commit, the vt ioctl numbers are defined differently on platforms where _IOC_NONE is non-zero: alpha, mips, powerpc, sparc. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/all/436489B9-E67B-4630-909F-386C30A2AAC9@xenosoft.de/ Link: https://lore.kernel.org/all/97ec2636-915a-498c-903b-d66957420d21@csgroup.eu/ Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250801082613.2564584-1-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-01 10:42:22 +02:00
Bjorn Helgaas	63e6f0df6a	Merge branch 'pci/endpoint/doorbell' - Add RC-to-EP doorbell support using platform MSI controller (Frank Li) - Check for MSI parent and mutability since we currently don't support mutable MSI controllers (Frank Li) - Add pci_epf_align_inbound_addr() helper (Frank Li) - Add a doorbell test (Frank Li) * pci/endpoint/doorbell: selftests: pci_endpoint: Add doorbell test case misc: pci_endpoint_test: Add doorbell test case PCI: endpoint: pci-epf-test: Add doorbell test support PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller	2025-07-31 16:11:45 -05:00
Linus Torvalds	0cdee263bc	Merge tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - v4l2 core: - sub-device framework routing improvements - NV12M tiled variants added to v4l2_format_info - some fixes at control handler freeing logic - fixed H264 SEPARATE_COLOUR_PLANE check - new staging driver: Intel IPU7 PCI - Rockchip video decoder driver got promoted from staging - iris: added HEVC/VP9 encoder/decoder support - vsp1: driver has gained Renesas VSPX support - uvc: - switched to vb2 ioctl helpers - added MSXU 1.5 metadata support - atomisp: GC0310 sensor driver cleanups in preparation for moving it out of staging - Lots of cleanup, fixes and improvements * tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (310 commits) media: rkvdec: Unstage the driver media: rkvdec: Remove TODO file media: dt-bindings: rockchip: Add RK3576 Video Decoder bindings media: dt-bindings: rockchip: Document RK3588 Video Decoder bindings media: amphion: Support dmabuf and v4l2 buffer without binding media: verisilicon: postproc: 4K support media: v4l2: Add support for NV12M tiled variants to v4l2_format_info() media: uvcvideo: Use a count variable for meta_formats instead of 0 terminating media: uvcvideo: Auto-set UVC_QUIRK_MSXU_META media: uvcvideo: Introduce V4L2_META_FMT_UVC_MSXU_1_5 media: uvcvideo: Introduce dev->meta_formats media: Documentation: Add note about UVCH length field media: uvcvideo: Do not mark valid metadata as invalid media: uvcvideo: uvc_v4l2_unlocked_ioctl: Invert PM logic media: core: export v4l2_translate_cmd media: uvcvideo: Turn on the camera if V4L2_EVENT_SUB_FL_SEND_INITIAL media: uvcvideo: Remove stream->is_streaming field media: uvcvideo: Split uvc_stop_streaming() media: uvcvideo: Handle locks in uvc_queue_return_buffers media: uvcvideo: Use vb2 ioctl and fop helpers ...	2025-07-31 13:16:09 -07:00
Linus Torvalds	c93529ad4f	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This broadly brings the assigned HW command queue support to iommufd. This feature is used to improve SVA performance in VMs by avoiding paravirtualization traps during SVA invalidations. Along the way I think some of the core logic is in a much better state to support future driver backed features. Summary: - IOMMU HW now has features to directly assign HW command queues to a guest VM. In this mode the command queue operates on a limited set of invalidation commands that are suitable for improving guest invalidation performance and easy for the HW to virtualize. This brings the generic infrastructure to allow IOMMU drivers to expose such command queues through the iommufd uAPI, mmap the doorbell pages, and get the guest physical range for the command queue ring itself. - An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built on the new iommufd command queue features. It works with the existing SMMU driver support for cmdqv in guest VMs. - Many precursor cleanups and improvements to support the above cleanly, changes to the general ioctl and object helpers, driver support for VDEVICE, and mmap pgoff cookie infrastructure. - Sequence VDEVICE destruction to always happen before VFIO device destruction. When using the above type features, and also in future confidential compute, the internal virtual device representation becomes linked to HW or CC TSM configuration and objects. If a VFIO device is removed from iommufd those HW objects should also be cleaned up to prevent a sort of UAF. This became important now that we have HW backing the VDEVICE. - Fix one syzkaller found error related to math overflows during iova allocation" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits) iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3 iommufd: Rename some shortterm-related identifiers iommufd/selftest: Add coverage for vdevice tombstone iommufd/selftest: Explicitly skip tests for inapplicable variant iommufd/vdevice: Remove struct device reference from struct vdevice iommufd: Destroy vdevice on idevice destroy iommufd: Add a pre_destroy() op for objects iommufd: Add iommufd_object_tombstone_user() helper iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice iommufd/selftest: Test reserved regions near ULONG_MAX iommufd: Prevent ALIGN() overflow iommu/tegra241-cmdqv: import IOMMUFD module namespace iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Use request_threaded_irq iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops ...	2025-07-31 12:43:08 -07:00
Linus Torvalds	7ce4de1cda	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...	2025-07-31 12:19:55 -07:00

1 2 3 4 5 ...

15290 Commits