linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-24 00:16:55 +08:00

Author	SHA1	Message	Date
Eric Dumazet	3d3f075e80	ipv6: use np->final in inet6_sk_rebuild_header() Instead of using an automatic variable, use np->final to get rid of the stack canary in inet6_sk_rebuild_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Eric Dumazet	03ff0cb1a9	ipv6: add daddr/final storage in struct ipv6_pinfo After commit `b409a7f717` ("ipv6: colocate inet6_cork in inet_cork_full") we have room in ipv6_pinfo to hold daddr/final in case they need to be populated in fl6_update_dst() calls. This will allow stack canary removal in IPv6 tx fast paths. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Jakub Kicinski	792aaea994	Merge tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter updates for net-next: 1) Fix net-next-only use-after-free bug in nf_tables rbtree set: Expired elements cannot be released right away after unlink anymore because there is no guarantee that the binary-search blob is going to be updated. Spotted by syzkaller. 2) Fix esoteric bug in nf_queue with udp fraglist gro, broken since 6.11. Patch 3 adds extends the nfqueue selftest for this. 4) Use dedicated slab for flowtable entries, currently the -512 cache is used, which is wasteful. From Qingfang Deng. 5) Recent net-next update extended existing test for ip6ip6 tunnels, add the required /config entry. Test still passed by accident because the previous tests network setup gets re-used, so also update the test so it will fail in case the ip6ip6 tunnel interface cannot be added. 6) Fix 'nft get element mytable myset { 1.2.3.4 }' on big endian platforms, this was broken since code was added in v5.1. 7) Fix nf_tables counter reset support on 32bit platforms, where counter reset may cause huge values to appear due to wraparound. Broken since reset feature was added in v6.11. From Anders Grahn. 8-11) update nf_tables rbtree set type to detect partial operlaps. This will eventually speed up nftables userspace: at this time userspace does a netlink dump of the set content which slows down incremental updates on interval sets. From Pablo Neira Ayuso. * tag 'nf-next-26-02-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_set_rbtree: validate open interval overlap netfilter: nft_set_rbtree: validate element belonging to interval netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets netfilter: nft_set_rbtree: fix bogus EEXIST with NLM_F_CREATE with null interval netfilter: nft_counter: fix reset of counters on 32bit archs netfilter: nft_set_hash: fix get operation on big endian selftests: netfilter: add IPV6_TUNNEL to config netfilter: flowtable: dedicated slab for flow entry selftests: netfilter: nft_queue.sh: add udp fraglist gro test case netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation netfilter: nft_set_rbtree: don't gc elements on insert ==================== Link: https://patch.msgid.link/20260206153048.17570-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:25:38 -08:00
Russell King (Oracle)	3a46873661	net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup() Add cleanup for failure paths in qcom_ethqos_serdes_powerup(). This was missing calling phy_exit() and phy_power_off() at appropriate failure points. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1voPUH-000000083ji-25FH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:22:05 -08:00
Paolo Abeni	dc010e1b4b	xfrm: reduce struct sec_path size The mentioned struct has an hole and uses unnecessary wide type to store MAC length and indexes of very small arrays. It's also embedded into the skb_extensions, and the latter, due to recent CAN changes, may exceeds the 192 bytes mark (3 cachelines on x86_64 arch) on some reasonable configurations. Reordering and the sec_path fields, shrinking xfrm_offload.orig_mac_len to 16 bits and xfrm_offload.{len,olen,verified_cnt} to u8, we can save 16 bytes and keep skb_extensions size under control. Before: struct sec_path { int len; int olen; int verified_cnt; /* XXX 4 bytes hole, try to pack /$ struct xfrm_state xvec[6]; struct xfrm_offload ovec[1]; /* size: 88, cachelines: 2, members: 5 / / sum members: 84, holes: 1, sum holes: 4 / / last cacheline: 24 bytes / }; After: struct sec_path { struct xfrm_state xvec[6]; struct xfrm_offload ovec[1]; /* typedef u8 -> __u8 / unsigned char len; / typedef u8 -> __u8 / unsigned char olen; / typedef u8 -> __u8 / unsigned char verified_cnt; / size: 72, cachelines: 2, members: 5 / / padding: 1 / / last cacheline: 8 bytes */ }; Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://patch.msgid.link/83846bd2e3fa08899bd0162e41bfadfec95e82ef.1770398071.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:21:48 -08:00
Jakub Kicinski	e72d4c537f	Merge branch 'bnxt_en-add-rss-context-resource-check' Michael Chan says: ==================== bnxt_en: Add RSS context resource check Add missing logic to check that we have enough RSS contexts. This will make the recent change to increase the use of RSS contexts for a larger RSS indirection table more complete. ==================== Link: https://patch.msgid.link/20260207235118.1987301-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:17:59 -08:00
Michael Chan	b9355ad52b	bnxt_en: Check RSS contexts in bnxt_need_reserve_rings() bnxt_need_reserve_rings() checks all resources except HW RSS contexts to determine if a new reservation is required. For completeness, add the check for HW RSS contexts. This makes the code more complete after the recent commit to increase the number of RSS contexts for a larger RSS indirection table: Fixes: `51b9d3f948` ("bnxt_en: Use a larger RSS indirection table on P5_PLUS chips") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260207235118.1987301-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:17:42 -08:00
Michael Chan	5a2f3aa289	bnxt_en: Refactor bnxt_need_reserve_rings() bnxt_need_reserve_rings() checks 6 ring resources against the reserved values to determine if a new reservation is needed. Factor out the code to collect the total resources into a new helper function bnxt_get_total_resources() to make the code cleaner and easier to read. Instead of individual scalar variables, use the struct bnxt_hw_rings to hold all the ring resources. Using the struct, hwr.cp replaces the nq variable and the chip specific hwr.cp_p5 replaces cp on newer chips. There is no change in behavior. This will make it easier to check the RSS context resource in the next patch. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260207235118.1987301-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:17:36 -08:00
Geliang Tang	e5e2e43002	mptcp: allow overridden write_space to be invoked Future extensions with psock will override their own sk->sk_write_space callback. This patch ensures that the overridden sk_write_space can be invoked by MPTCP. INDIRECT_CALL is used to keep the default path optimised. Note that sk->sk_write_space was never called directly with MPTCP sockets, so changing it to sk_stream_write_space in the init, and using it from mptcp_write_space() is not supposed to change the current behaviour. This patch is shared early to ease discussions around future RFC and avoid confusions with this "fix" that is needed for different future extensions. Suggested-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Gang Yan <yangang@kylinos.cn> Signed-off-by: Gang Yan <yangang@kylinos.cn> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260206-net-next-mptcp-write_space-override-v2-1-e0b12be818c6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:54:21 -08:00
Jakub Kicinski	f2c7fdebf0	Merge branch 'net-netconsole-convert-to-nbcon-console-infrastructure' Breno Leitao says: ==================== net: netconsole: convert to NBCON console infrastructure This series adds support for the nbcon (new buffer console) infrastructure to netconsole, enabling lock-free, priority-based console operations that are safer in crash scenarios. The implementation is introduced in three steps: 0) Extend printk to expose CPU and taskname (task->comm) where the printk originated from. (Thanks John and Petr for the support in getting this done) 1) Refactor the message fragmentation logic into a reusable helper function 2) Extend nbcon support to non-extended (basic) consoles using the same infrastructure. The initial discussion about it appeared a while ago in [1], in order to solve Mike's HARDIRQ-safe -> HARDIRQ-unsafe lock order warning, and the root cause is that some hosts were calling IRQ unsafe locks from inside console lock. At that time, we didn't have the CON_NBCON_ATOMIC_UNSAFE yet. John kindly implemented CON_NBCON_ATOMIC_UNSAFE in `187de7c212` ("printk: nbcon: Allow unsafe write_atomic() for panic"), and now we can implement netconsole on top of nbcon. Important to note that netconsole continues to call netpoll and the network TX helpers with interrupt disable, given the TX are called with target_list_lock. ==================== Link: https://patch.msgid.link/20260206-nbcon-v7-0-62bda69b1b41@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:52:00 -08:00
Breno Leitao	79ba362b43	netconsole: Use printk context for CPU and task information Use the CPU and task name captured at printk() time from nbcon_write_context instead of querying the current execution context. This provides accurate information about where the message originated, rather than where netconsole happens to be running. For CPU, use wctxt->cpu instead of raw_smp_processor_id(). For taskname, use wctxt->comm directly which contains the task name captured at printk time. This change ensures netconsole outputs reflect the actual context that generated the log message, which is especially important when the console driver runs asynchronously in a dedicated thread. Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260206-nbcon-v7-4-62bda69b1b41@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:51:57 -08:00
Breno Leitao	7eab73b186	netconsole: convert to NBCON console infrastructure Convert netconsole from the legacy console API to the NBCON framework. NBCON provides threaded printing which unblocks printk()s and flushes in a thread, decoupling network TX from printk() when netconsole is in use. Since netconsole relies on the network stack which cannot safely operate from all atomic contexts, mark both consoles with CON_NBCON_ATOMIC_UNSAFE. (See discussion in [1]) CON_NBCON_ATOMIC_UNSAFE restricts write_atomic() usage to emergency scenarios (panic) where regular messages are sent in threaded mode. Implementation changes: - Unify write_ext_msg() and write_msg() into netconsole_write() - Add device_lock/device_unlock callbacks to manage target_list_lock - Use nbcon_enter_unsafe()/nbcon_exit_unsafe() around network operations. - If nbcon_enter_unsafe() fails, just return given netconsole lost the ownership of the console. - Set write_thread and write_atomic callbacks (both use same function) Link: https://lore.kernel.org/all/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt/ [1] Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260206-nbcon-v7-3-62bda69b1b41@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:51:57 -08:00
Breno Leitao	eaf35bc63b	netconsole: extract message fragmentation into send_msg_udp() Extract the message fragmentation logic from write_msg() into a dedicated send_msg_udp() function. This improves code readability and prepares for future enhancements. The new send_msg_udp() function handles splitting messages that exceed MAX_PRINT_CHUNK into smaller fragments and sending them sequentially. This function is placed before send_ext_msg_udp() to maintain a logical ordering of related functions. No functional changes - this is purely a refactoring commit. Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260206-nbcon-v7-2-62bda69b1b41@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:51:56 -08:00
Breno Leitao	60325c27d3	printk: Add execution context (task name/CPU) to printk_info Extend struct printk_info to include the task name, pid, and CPU number where printk messages originate. This information is captured at vprintk_store() time and propagated through printk_message to nbcon_write_context, making it available to nbcon console drivers. This is useful for consoles like netconsole that want to include execution context in their output, allowing correlation of messages with specific tasks and CPUs regardless of where the console driver actually runs. The feature is controlled by CONFIG_PRINTK_EXECUTION_CTX, which is automatically selected by CONFIG_NETCONSOLE_DYNAMIC. When disabled, the helper functions compile to no-ops with no overhead. Suggested-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20260206-nbcon-v7-1-62bda69b1b41@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 19:51:56 -08:00
Simon Horman	ad1f18e985	net/mlx5e: remove declarations of mlx5e_shampo_{fill_umr,dealloc_hd} These functions were recently removed by commit `24cf78c738` ("net/mlx5e: SHAMPO, Switch to header memcpy"), however, their declarations were left behind. This patch removes those declarations. Flagged by review-prompts while I was exercising Orc mode locally. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260206-shampo-v1-1-75b20c6657e5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 15:49:47 +01:00
Vladimir Oltean	c22ba07c82	net: dsa: eliminate local type for tc policers David Yang is saying that struct flow_action_entry in include/net/flow_offload.h has gained new fields and DSA's struct dsa_mall_policer_tc_entry, derived from that, isn't keeping up. This structure is passed to drivers and they are completely oblivious to the values of fields they don't see. This has happened before, and almost always the solution was to make the DSA layer thinner and use the upstream data structures. Here, the reason why we didn't do that is because struct flow_action_entry :: police is an anonymous structure. That is easily enough fixable, just name those fields "struct flow_action_police" and reference them from DSA. Make the according transformations to the two users (sja1105 and felix): "rate_bytes_per_sec" -> "rate_bytes_ps". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: David Yang <mmyangfl@gmail.com> Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260206075427.44733-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 15:30:11 +01:00
Paolo Abeni	86dbebfb90	Merge branch 'net-ftgmac100-various-probe-cleanups' Jacky Chou says: ==================== net: ftgmac100: Various probe cleanups The probe function of the ftgmac100 is rather complex, due to the way it has evolved over time, dealing with poor DT descriptions, and new variants of the MAC. Make use of DT match data to identify the MAC variant, rather than looking at the compatible string all the time. Make use of devm_ calls to simplify cleanup. This indirectly fixes inconsistent goto label names. Always probe the MDIO bus, when it exists. This simplifies the logic a bit. Move code into helpers to simply probe. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> ==================== Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-0-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:53 +01:00
Jacky Chou	7ac5dddef5	net: ftgmac100: Use devm_mdiobus_alloc/devm_of_mdiobus_register Make use of devm_ methods to allocate and register mdiobus to simplify cleanup. Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-15-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:51 +01:00
Andrew Lunn	201dddf688	net: ftgmac100: Fix wrong netif_napi_del in release netif_napi_add() is called in open. There is a symmetric call to netif_napi_del() in stop. Remove to wrong call to netif_napi_del() in release. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-14-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:51 +01:00
Andrew Lunn	96b4887a71	net: ftgmac100: Simplify condition on HW arbitration The MAC ID is sufficient to indicate this is a ast2600. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-13-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	0855b43d82	net: ftgmac100: Remove redundant PHY_POLL When an MDIO bus is allocated, the irqs for each PHY are set to polling. Remove the redundant code in the MAC driver which does the same. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-12-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	20248a719d	net: ftgmac100: Move DT probe into a helper By moving all the DT probe code into a helper, the complex if else if else structure can be simplified. No functional change intended. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-11-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	3e523741aa	net: ftgmac100: Simplify legacy MDIO setup There are old device trees which place the PHY nodes directly in the MAC nodes, rather than within an MDIO container node. The probe logic indicates that the use of NCSI and the legacy placement of PHYs is mutually exclusive. Hence priv->use_ncsi cannot be true, so there is no reason to set it false. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-10-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	efeb214411	net: ftgmac100: Always register the MDIO bus when it exists Both the Aspeed 2400 and 2500 and the original faraday version of the MAC have MDIO bus controllers as part of the MAC. Since it exists, always registering it makes the code simpler, and causes no harm. If there is no mdio node in device tree, of_mdiobus_register() will fall back to mdiobus_register(), making it safe. AST2600 uses an external MDIO controller and does not have an embedded MDIO bus in the MAC. For such configurations, the legacy MII probe path must not be entered without a registered mii_bus. Add an explicit check to fail gracefully when no MDIO bus is present, preventing a NULL pointer dereference while keeping the intended behavior for platforms without embedded MDIO. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-9-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	7535d70ba0	net: ftgmac100: Move NCSI probe code into a helper To help reduce the complexity of the probe function, move the NCSI probe code into a helper. The refactoring results in improved cleanup of the fixed PHY in error paths. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-8-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	d1d8392883	net: ftgmac100: Simplify error handling for ftgmac100_initial_mac ftgmac100_initial_mac() does not allocate any resources. All resources by the probe function up until this call point use devm_ methods. So just return the error code rather than use a goto. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-7-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	4659ccedf2	net: ftgmac100: Use devm_clk_get_enabled Make use of devm_ methods to request and enable clocks to simplify cleanup. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-6-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	67127f88c8	net: ftgmac100: Use devm_request_memory_region/devm_ioremap Make use of devm_ methods to request and remap the device memory to simplify cleanup. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-5-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	f4bef838a4	net: ftgmac100: Use devm_alloc_etherdev() Make use of devm_alloc_etherdev() to simplify cleanup. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-4-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	9b42f74808	net: ftgmac100: Replace all of_device_is_compatible() Now that the priv structure includes the MAC ID, make use of it instead of the more expensive of_device_is_compatible(). Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-3-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	41fbe5aa50	net: ftgmac100: Add match data containing MAC ID The driver supports 4 different versions of the FTGMAC core. Extend the compatible matching to include match data, which indicates the version of the MAC. Default to the initial Faraday device if DT is not being used. Lookup the match data early in probe to keep error handing simple. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-2-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Andrew Lunn	03a2aba50b	net: ftgmac100: List all compatibles As a step towards cleanup the probe function, list each compatible the driver supports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260206-ftgmac-cleanup-v5-1-ad28a9067ea7@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 13:40:50 +01:00
Paolo Abeni	4f65ee7e0c	Merge branch 'hsr-implement-more-robust-duplicate-discard-algorithm' Felix Maurer says: ==================== hsr: Implement more robust duplicate discard algorithm The duplicate discard algorithms for PRP and HSR do not work reliably with certain link faults. Especially with packet loss on one link, the duplicate discard algorithms drop valid packets. For a more thorough description see patches 4 (for PRP) and 6 (for HSR). This patchset replaces the current algorithms (based on a drop window for PRP and highest seen sequence number for HSR) with a single new one that tracks the received sequence numbers individually (descriptions again in patches 4 and 6). The changes will lead to higher memory usage and more work to do for each packet. But I argue that this is an acceptable trade-off to make for a more robust PRP and HSR behavior with faulty links. After all, both protocols are to be used in environments where redundancy is needed and people are willing to setup special network topologies to achieve that. Some more reasoning on the overhead and expected scale of the deployment from the RFC discussion: > As for the expected scale, there are two dimensions: the number of nodes > in the network and the data rate with which they send. > > The number of nodes in the network affect the memory usage because each > node now has the block buffer. For PRP that's 64 blocks * 32 byte = > 2kbyte for each node in the node table. A PRP network doesn't have an > explicit limit for the number of nodes. However, the whole network is a > single layer-2 segment which shouldn't grow too large anyways. Even if > one really tries to put 1000 nodes into the PRP network, the memory > overhead (2Mbyte) is acceptable in my opinion. > > For HSR, the blocks would be larger because we need to track the > sequence numbers per port. I expect 64 blocks * 80 byte = 5kbyte per > node in the node table. There is no explicit limit for the size of an > HSR ring either. But I expect them to be of limited size because the > forwarding delays add up throughout the ring. I've seen vendors limiting > the ring size to 50 nodes with 100Mbit/s links and 300 with 1Gbit/s > links. In both cases I consider the memory overhead acceptable. > > The data rates are harder to reason about. In general, the data rates > for HSR and PRP are limited because too high packet rates would lead to > very fast re-use of the 16bit sequence numbers. The IEC 62439-3:2021 > mentions 100Mbit/s links and 1Gbit/s links. I don't expect HSR or PRP > networks to scale out to, e.g., 10Gbit/s links with the current > specification as this would mean that sequence numbers could repeat as > often as every ~4ms. The default constants in the IEC standard, which we > also use, are oriented at a 100Mbit/s network. > > In my tests with veth pairs, the CPU overhead didn't lead to > significantly lower data rates. The main factor limiting the data rate > at the moment, I assume, is the per-node spinlock that is taken for each > received packet. IMHO, there is a lot more to gain in terms of CPU > overhead from making this lock smaller or getting rid of it, than we > loose with the more accurate duplicate discard algorithm in this patchset. > > The CPU overhead of the algorithm benefits from the fact that in high > packet rate scenarios (where it really matters) many packets will have > sequence numbers in already initialized blocks. These packets just have > additionally: one xarray lookup, one comparison, and one bit setting. If > a block needs to be initialized (once every 128 packets plus their 128 > duplicates if all sequence numbers are seen), we will have: one > xa_erase, a bunch of memory writes, and one xa_store. > > In theory, all packets could end up in the slow path if a node sends > every 128th packet to us. If this is sent from a well behaving node, the > packet rate wouldn't be an issue anymore, though. Signed-off-by: Felix Maurer <fmaurer@redhat.com> ==================== Link: https://patch.msgid.link/cover.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:32 +01:00
Felix Maurer	b3dabced35	MAINTAINERS: Assign hsr selftests to HSR Despite the HSR subsystem being orphaned at the moment due to the original maintainer being unreachable for a while, assign the selftests to the subsystem for the future. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/f4a356b96f5e0c99d9db3984ea62596c99a97469.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:29 +01:00
Felix Maurer	bbbd531faa	selftests: hsr: Add more link fault tests for HSR Run the packet loss and reordering tests also for both HSR versions. Now they can be removed from the hsr_ping tests completely. The timeout needs to be increased because there are 15 link fault test cases now, with each of them taking 5-6sec for the test and at most 5sec for the HSR node tables to get merged and we also want some room to make the test runs stable. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/eb6f667d3804ce63d86f0ee3fbc0e0ac9e1a209a.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:29 +01:00
Felix Maurer	aae9d6b616	hsr: Implement more robust duplicate discard for HSR The HSR duplicate discard algorithm had even more basic problems than the described for PRP in the previous patch. It relied only on the last received sequence number to decide if a new frame should be forwarded to any port. This does not work correctly in any case where frames are received out of order. The linked bug report claims that this can even happen with perfectly fine links due to the order in which incoming frames are processed (which can be unexpected on multi-core systems). The issue also occasionally shows up in the HSR selftests. The main reason is that the sequence number that was last forwarded to the master port may have skipped a number which will in turn never be delivered to the host. As the problem (we accidentally skip over a sequence number that has not been received but will be received in the future) is similar to PRP, we can apply a similar solution. The duplicate discard algorithm based on the "sparse bitmap" works well for HSR if it is extended to track one bitmap for each port (A, B, master, interlink). To do this, change the sequence number blocks to contain a flexible array member as the last member that can keep chunks for as many bitmaps as we need. This design makes it easy to reuse the same algorithm in a potential PRP RedBox implementation. The duplicate discard algorithm functions are modified to deal with sequence number blocks of different sizes and to correctly use the array of bitmap chunks. There is a notable speciality for HSR: the port type has a special port type NONE with value 0. This leads to the number of port types being 5 instead of actually 4. To save memory, remove the NONE port from the bitmap (by subtracting 1) when setting up the block buffer and when accessing the bitmap chunks in the array. Removing the old algorithm allows us to get rid of a few fields that are not needed any more: time_out and seq_out for each port. We can also remove some functions that were only necessary for the previous duplicate discard algorithm. The removal of seq_out is possible despite its previous usage in hsr_register_frame_in: it was used to prevent updates to time_in when "invalid" sequence numbers were received. With the new duplicate discard algorithm, time_in has no relevance for the expiry of sequence numbers anymore. They will expire based on the timestamps in the sequence number blocks after at most 400ms. There is no need that a node "re-registers" to "resume communication": after 400ms, all sequence numbers are accepted again. Also, according to the IEC 62439-3:2021, all nodes are supposed to send no traffic for 500ms after boot to lead exactly to this expiry of seen sequence numbers. time_in is still used for pruning nodes from the node table after no traffic has been received for 60sec. Pruning is only needed if the node is really gone and has not been sending any traffic for that period. seq_out was also used to report the last incoming sequence number from a node through netlink. I am not sure how useful this value is to userspace at all, but added getting it from the sequence number blocks. This number can be outdated after node merging until a new block has been added. Update the KUnit test for the PRP duplicate discard so that the node allocation matches and expectations on the removed fields are removed. Reported-by: Yoann Congal <yoann.congal@smile.fr> Closes: https://lore.kernel.org/netdev/7d221a07-8358-4c0b-a09c-3b029c052245@smile.fr/ Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/36dc3bc5bdb7e68b70bb5ef86f53ca95a3f35418.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:29 +01:00
Felix Maurer	8908c3c8ce	selftests: hsr: Add tests for more link faults with PRP Add tests where one link has different rates of packet loss or reorders packets. PRP should still be able to recover from these link faults and show no packet loss. However, it is acceptable to receive some level of duplicate packets. This matches the current specification (IEC 62439-3:2021) of the duplicate discard algorithm that requires it to be "designed such that it never rejects a legitimate frame, while occasional acceptance of a duplicate can be tolerated." The rate of acceptable duplicates in this test is intentionally high (10%) to make the test stable, the values I observed in the worst test cases (20% loss) are around 5% duplicates. The duplicates occur because of the 10ms ping interval in the test. As blocks expire after 400ms based on the timestamp of the first received sequence number in the block, every approx. 40th will lead to a new, clean block being used where the sequence number hasn't been seen before. As this occurs on both nodes in the test (for requests and replies), we observe around 20 duplicate frames. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/7b36506d3a80e53786fe56526cf6046c74dfeee1.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:29 +01:00
Felix Maurer	415e636751	hsr: Implement more robust duplicate discard for PRP The PRP duplicate discard algorithm does not work reliably with certain link faults. Especially with packet loss on one link, the duplicate discard algorithm drops valid packets which leads to packet loss on the PRP interface where the link fault should in theory be perfectly recoverable by PRP. This happens because the algorithm opens the drop window on the lossy link, covering received and lost sequence numbers. If the other, non-lossy link receives the duplicate for a lost frame, it is within the drop window of the lossy link and therefore dropped. Since IEC 62439-3:2012, a node has one sequence number counter for frames it sends, instead of one sequence number counter for each destination. Therefore, a node can not expect to receive contiguous sequence numbers from a sender. A missing sequence number can be totally normal (if the sender intermittently communicates with another node) or mean a frame was lost. The algorithm, as previously implemented in commit `05fd00e5e7` ("net: hsr: Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0) but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since that, no algorithm is specified but up to implementers. It should be "designed such that it never rejects a legitimate frame, while occasional acceptance of a duplicate can be tolerated" (IEC 62439-3:2021). For the duplicate discard algorithm, this means that 1) we need to track the sequence numbers individually to account for non-contiguous sequence numbers, and 2) we should always err on the side of accepting a duplicate than dropping a valid frame. The idea of the new algorithm is to store the seen sequence numbers in a bitmap. To keep the size of the bitmap in control, we store it as a "sparse bitmap" where the bitmap is split into blocks and not all blocks exist at the same time. The sparse bitmap is implemented using an xarray that keeps the references to the individual blocks and a backing ring buffer that stores the actual blocks. New blocks are initialized in the buffer and added to the xarray as needed when new frames arrive. Existing blocks are removed in two conditions: 1. The block found for an arriving sequence number is old and therefore not relevant to the duplicate discard algorithm anymore, i.e., it has been added more than the entry forget time ago. In this case, the block is removed from the xarray and marked as forgotten (by setting its timestamp to 0). 2. Space is needed in the ring buffer for a new block. In this case, the block is removed from the xarray, if it hasn't already been forgotten (by 1.). Afterwards, the new block is initialized in its place. This has the nice property that we can reliably track sequence numbers on low traffic situations (where they expire based on their timestamp) and more quickly forget sequence numbers in high traffic situations before they potentially wrap over and repeat before they are expired. When nodes are merged, the blocks are merged as well. The timestamp of a merged block is set to the minimum of the two timestamps to never keep around a seen sequence number for too long. The bitmaps are or'd to mark all seen sequence numbers as seen. All of this still happens under seq_out_lock, to prevent concurrent access to the blocks. The KUnit test for the algorithm is updated as well. The updates are done in a way to match the original intends pretty closely. Currently, there is much knowledge about the actual algorithm baked into the tests (especially the expectations) which may need some redesign in the future. Reported-by: Steffen Lindner <steffen.lindner@de.abb.com> Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Steffen Lindner <steffen.lindner@de.abb.com> Link: https://patch.msgid.link/8ce15a996099df2df5b700969a39e7df400e8dbb.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:28 +01:00
Felix Maurer	ca4a09a950	selftests: hsr: Add tests for faulty links Add a test case that can support different types of faulty links for all protocol versions (HSRv0, HSRv1, PRPv1). It starts with a baseline with fully functional links. The first faulty case is one link being cut during the ping. This test uses a different function for ping that sends more packets in shorter intervals to stress the duplicate detection algorithms a bit more and allow for future tests with other link faults (packet loss, reordering, etc.). As the link fault tests now cover the cut link for HSR and PRP, it can be removed from the hsr_ping test. Note that the removed cut link test did not really test the fault because do_ping_long takes about 1sec while the link is only cut after a 3sec sleep. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/dad52276e2c349ecb96168bef7e3001bf7becc81.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:28 +01:00
Felix Maurer	776b64ba12	selftests: hsr: Check duplicates on HSR with VLAN Previously the hsr_ping test only checked that all nodes in a VLAN are reachable (using do_ping). Update the test to also check that there is no packet loss and no duplicate packets by running the same tests for VLANs as without VLANs (including using do_ping_long). This also adds tests for IPv6 over VLAN. To unify the test code, the topology without VLANs now uses IP addresses from dead:beef:0::/64 to align with the 100.64.0.0/24 range for IPv4. Error messages are updated across the board to make it easier to find what actually failed. Also update the VLAN test to only run in VLAN 2, as there is no need to check if ping really works with VLAN IDs 2, 3, 4, and 5. This lowers the number of long ping tests on VLANs to keep the overall test runtime in bounds. It's still necessary to bump the test timeout a bit, though: a ping long tests takes 1sec, do_ping_tests performs 12 of them, do_link_problem_tests 6, and the VLAN tests again 12. With some buffer for setup and waiting and for two protocol versions, 90sec timeout seems reasonable. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/e3ded0e2547b5f720524b62fabeb96debc579697.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:28 +01:00
Felix Maurer	c01a6c700f	selftests: hsr: Add ping test for PRP Add a selftest for PRP that performs a basic ping test on IPv4 and IPv6, over the plain PRP interface and a VLAN interface, similar to the existing ping test for HSR. The test first checks reachability of the other node, then checks for no loss and no duplicates. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/4a342189e842d7308d037da72af566729ee75834.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 12:02:28 +01:00
Paolo Abeni	310d80b0f8	Merge branch 'net-fec-improve-xdp-copy-mode-and-add-af_xdp-zero-copy-support' Wei Fang says: ==================== net: fec: improve XDP copy mode and add AF_XDP zero-copy support This patch set optimizes the XDP copy mode logic as follows. 1. Separate the processing of RX XDP frames from fec_enet_rx_queue(), and adds a separate function fec_enet_rx_queue_xdp() for handling XDP frames. 2. For TX XDP packets, using the batch sending method to avoid frequent MMIO writes. 3. Use the switch statement to check the tx_buf type instead of the if...else... statement, making the cleanup logic of TX BD ring cleared and more efficient. We compared the performance of XDP copy mode before and after applying this patch set, and the results show that the performance has improved. Before applying this patch set. root@imx93evk:~# ./xdp-bench tx eth0 Summary 396,868 rx/s 0 err,drop/s Summary 396,024 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench drop eth0 Summary 684,781 rx/s 0 err/s Summary 675,746 rx/s 0 err/s root@imx93evk:~# ./xdp-bench pass eth0 Summary 208,552 rx/s 0 err,drop/s Summary 208,654 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench redirect eth0 eth0 eth0->eth0 311,210 rx/s 0 err,drop/s 311,208 xmit/s eth0->eth0 310,808 rx/s 0 err,drop/s 310,809 xmit/s After applying this patch set. root@imx93evk:~# ./xdp-bench tx eth0 Summary 425,778 rx/s 0 err,drop/s Summary 426,042 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench drop eth0 Summary 698,351 rx/s 0 err/s Summary 701,882 rx/s 0 err/s root@imx93evk:~# ./xdp-bench pass eth0 Summary 210,348 rx/s 0 err,drop/s Summary 210,016 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench redirect eth0 eth0 eth0->eth0 354,407 rx/s 0 err,drop/s 354,401 xmit/s eth0->eth0 350,381 rx/s 0 err,drop/s 350,389 xmit/s This patch set also addes the AF_XDP zero-copy support, and we tested the performance on i.MX93 platform with xdpsock tool. The following is the performance comparison of copy mode and zero-copy mode. It can be seen that the performance of zero-copy mode is better than that of copy mode. 1. MAC swap L2 forwarding 1.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -z sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 414715 415455 tx 414715 415455 1.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -c sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 356396 356609 tx 356396 356609 2. TX only 2.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -z sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 1119573 1126720 2.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -c sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 406864 407616 ==================== Link: https://patch.msgid.link/20260205085742.2685134-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:23 +01:00
Wei Fang	25eb3058eb	net: fec: add AF_XDP zero-copy support This patch adds AF_XDP zero-copy support for both TX and RX on the FEC driver. It introduces new functions for XSK buffer allocation, RX/TX queue processing in zero-copy mode, and XSK pool setup/teardown. For RX, fec_alloc_rxq_buffers_zc() is added to allocate RX buffers from XSK pool. And fec_enet_rx_queue_xsk() is used to process the frames from the RX queue which is bound to the AF_XDP socket. Similar to the copy mode, the zero-copy mode also supports XDP_TX, XDP_PASS, XDP_DROP and XDP_REDIRECT actions. In addition, fec_enet_xsk_tx_xmit() is similar to fec_enet_xdp_tx_xmit() and is used to handle XDP_TX action in zero-copy mode. For TX, there are two cases, one is the frames from the AF_XDP socket, so fec_enet_xsk_xmit() is added to directly transmit the frames from the socket and the buffer type is marked as FEC_TXBUF_T_XSK_XMIT. The other one is the frames from the RX queue (XDP_TX action), the buffer type is marked as FEC_TXBUF_T_XSK_TX. Therefore, fec_enet_tx_queue() could correctly clean the TX queue base on the buffer type. Also, some tests have been done on the i.MX93-EVK board with the xdpsock tool, the following are the results. Env: i.MX93 connects to a packet generator, the link speed is 1Gbps, and flow-control is off. The RX packet size is 64 bytes including FCS. Only one RX queue (CPU) is used to receive frames. 1. MAC swap L2 forwarding 1.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -z sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 414715 415455 tx 414715 415455 1.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -c sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 356396 356609 tx 356396 356609 2. TX only 2.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -z sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 1119573 1126720 2.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -c sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 406864 407616 Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-16-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	fee723a48c	net: fec: improve fec_enet_tx_queue() To support AF_XDP zero-copy mode in the subsequent patch, the following adjustments have been made to fec_tx_queue(). 1. Change the parameters of fec_tx_queue(). 2. Some variables are initialized at the time of declaration, and the order of local variables is updated to follow the reverse xmas tree style. 3. Remove the variable xdpf and add the variable tx_buf. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-15-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	a2ae70c0ef	net: fec: add fec_alloc_rxq_buffers_pp() to allocate buffers from page pool Currently, the buffers of RX queue are allocated from the page pool. In the subsequent patches to support XDP zero copy, the RX buffers will be allocated from the UMEM. Therefore, extract fec_alloc_rxq_buffers_pp() from fec_enet_alloc_rxq_buffers() and we will add another helper to allocate RX buffers from UMEM for the XDP zero copy mode. In addition, fec_alloc_rxq_buffers_pp() only initializes bdp->bufaddr and does not initialize other fields of bdp, because these will be initialized in fec_enet_bd_init(). Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-14-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	edd393153e	net: fec: move xdp_rxq_info* APIs out of fec_enet_create_page_pool() Extract fec_xdp_rxq_info_reg() from fec_enet_create_page_pool() and move it out of fec_enet_create_page_pool(), so that it can be reused in the subsequent patches to support XDP zero copy mode. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-13-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	6729b24c1c	net: fec: remove the size parameter from fec_enet_create_page_pool() Remove the size parameter from fec_enet_create_page_pool(), since rxq->bd.ring_size already contains this information. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-12-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	8492e4f195	net: fec: use switch statement to check the type of tx_buf The tx_buf has three types: FEC_TXBUF_T_SKB, FEC_TXBUF_T_XDP_NDO and FEC_TXBUF_T_XDP_TX. Currently, the driver uses 'if...else...' statements to check the type and perform the corresponding processing. This is very detrimental to future expansion. To support AF_XDP zero-copy mode, two new types will be added in the future, continuing to use 'if...else...' would be a very bad coding style. So the 'if...else...' statements in the current driver are replaced with switch statements. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-11-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	2dcc934755	net: fec: remove unnecessary NULL pointer check when clearing TX BD ring The tx_buf pointer will not NULL when its type is FEC_TXBUF_T_XDP_NDO or FEC_TXBUF_T_XDP_TX. If the type is FEC_TXBUF_T_SKB, dev_kfree_skb_any() will do NULL pointer check. So it is unnecessary to do NULL pointer check in fec_enet_bd_init() and fec_enet_tx_queue(). Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-10-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00
Wei Fang	2ff7a7d392	net: fec: transmit XDP frames in bulk Currently, the driver writes the ENET_TDAR register for every XDP frame to trigger transmit start. Frequent MMIO writes consume more CPU cycles and may reduce XDP TX performance, so transmit XDP frames in bulk. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-9-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-10 10:58:20 +01:00

1 2 3 4 5 ...

1415659 Commits