linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-01 20:27:40 +08:00

Author	SHA1	Message	Date
Eric Dumazet	ecfea98b7d	tcp: add net.ipv4.tcp_rcvbuf_low_rtt This is a follow up of commit `aa251c8463` ("tcp: fix too slow tcp_rcvbuf_grow() action") which brought again the issue that I tried to fix in commit `65c5287892` ("tcp: fix sk_rcvbuf overshoot"). We also recently increased tcp_rmem[2] to 32 MB in commit `572be9bf9d` ("tcp: increase tcp_rmem[2] to 32 MB"). The idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can force NIC driver to not recycle pages from their page pool, and also can cause cache evictions for DDIO enabled cpus/NIC, as receivers are usually slower than senders. Add net.ipv4.tcp_rcvbuf_low_rtt sysctl, set by default to 1000 usec (1 ms). If RTT is smaller than the sysctl value, use the RTT/tcp_rcvbuf_low_rtt ratio to control sk_rcvbuf inflation. Tested: Pair of hosts with a 200Gbit IDPF NIC. Using netperf/netserver. Client initiates 8 TCP bulk flows, asking netserver to use CPU #10 only. super_netperf 8 -H server -T,10 -l 30 On server, use perf -e tcp:tcp_rcvbuf_grow while test is running.
Before:
sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1
perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230
1153.051201: tcp:tcp_rcvbuf_grow: time=398 rtt_us=382 copied=6905856 inq=180224 space=6115328 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
1153.138752: tcp:tcp_rcvbuf_grow: time=446 rtt_us=413 copied=5529600 inq=180224 space=4505600 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil
1153.361484: tcp:tcp_rcvbuf_grow: time=415 rtt_us=380 copied=7061504 inq=204800 space=6725632 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
1153.457642: tcp:tcp_rcvbuf_grow: time=483 rtt_us=421 copied=5885952 inq=720896 space=4407296 ooo=0 scaling_ratio=240 rcvbuf=23763511 rcv_ssthresh=22223271 window_clamp=22278291 rcv_wnd=21430272 famil
1153.466002: tcp:tcp_rcvbuf_grow: time=308 rtt_us=281 copied=3244032 inq=180224 space=2883584 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41713664 famil
1153.747792: tcp:tcp_rcvbuf_grow: time=394 rtt_us=332 copied=4460544 inq=585728 space=3063808 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41373696 famil
1154.260747: tcp:tcp_rcvbuf_grow: time=652 rtt_us=226 copied=10977280 inq=737280 space=9486336 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29197743 window_clamp=29217691 rcv_wnd=28368896 fami
1154.375019: tcp:tcp_rcvbuf_grow: time=461 rtt_us=443 copied=7573504 inq=507904 space=6856704 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25288704 famil
1154.463072: tcp:tcp_rcvbuf_grow: time=494 rtt_us=408 copied=7983104 inq=200704 space=7065600 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25579520 famil
1154.474658: tcp:tcp_rcvbuf_grow: time=507 rtt_us=459 copied=5586944 inq=540672 space=4718592 ooo=0 scaling_ratio=240 rcvbuf=17852266 rcv_ssthresh=16692999 window_clamp=16736499 rcv_wnd=16056320 famil
1154.584657: tcp:tcp_rcvbuf_grow: time=494 rtt_us=427 copied=8126464 inq=204800 space=7782400 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
1154.702117: tcp:tcp_rcvbuf_grow: time=480 rtt_us=406 copied=5734400 inq=180224 space=5349376 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil
1155.941595: tcp:tcp_rcvbuf_grow: time=717 rtt_us=670 copied=11042816 inq=3784704 space=7159808 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=14614528 fam
1156.384735: tcp:tcp_rcvbuf_grow: time=529 rtt_us=473 copied=9011200 inq=180224 space=7258112 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=18018304 famil
1157.821676: tcp:tcp_rcvbuf_grow: time=529 rtt_us=272 copied=8224768 inq=602112 space=6545408 ooo=0 scaling_ratio=240 rcvbuf=67000000 rcv_ssthresh=62793576 window_clamp=62812500 rcv_wnd=62115840 famil
1158.906379: tcp:tcp_rcvbuf_grow: time=710 rtt_us=445 copied=11845632 inq=540672 space=10240000 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29205935 window_clamp=29217691 rcv_wnd=28536832 fam
1164.600160: tcp:tcp_rcvbuf_grow: time=841 rtt_us=430 copied=12976128 inq=1290240 space=11304960 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29212591 window_clamp=29217691 rcv_wnd=27856896 fa
1165.163572: tcp:tcp_rcvbuf_grow: time=845 rtt_us=800 copied=12632064 inq=540672 space=7921664 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25912795 window_clamp=25937095 rcv_wnd=25260032 fami
1165.653464: tcp:tcp_rcvbuf_grow: time=388 rtt_us=309 copied=4493312 inq=180224 space=3874816 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41995899 window_clamp=42050919 rcv_wnd=41713664 famil
1166.651211: tcp:tcp_rcvbuf_grow: time=556 rtt_us=553 copied=6328320 inq=540672 space=5554176 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=20946944 famil
After:
sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1000
perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230
1457.053149: tcp:tcp_rcvbuf_grow: time=128 rtt_us=24 copied=1441792 inq=40960 space=1269760 ooo=0 scaling_ratio=240 rcvbuf=2960741 rcv_ssthresh=2605474 window_clamp=2775694 rcv_wnd=2568192 family=AF_I
1458.000778: tcp:tcp_rcvbuf_grow: time=128 rtt_us=31 copied=1441792 inq=24576 space=1400832 ooo=0 scaling_ratio=240 rcvbuf=3060163 rcv_ssthresh=2810042 window_clamp=2868902 rcv_wnd=2674688 family=AF_I
1458.088059: tcp:tcp_rcvbuf_grow: time=190 rtt_us=110 copied=3227648 inq=385024 space=2781184 ooo=0 scaling_ratio=240 rcvbuf=6728240 rcv_ssthresh=6252705 window_clamp=6307725 rcv_wnd=5799936 family=AF
1458.148549: tcp:tcp_rcvbuf_grow: time=232 rtt_us=129 copied=3956736 inq=237568 space=2842624 ooo=0 scaling_ratio=240 rcvbuf=6731333 rcv_ssthresh=6252705 window_clamp=6310624 rcv_wnd=5918720 family=AF
1458.466861: tcp:tcp_rcvbuf_grow: time=193 rtt_us=83 copied=2949120 inq=180224 space=2457600 ooo=0 scaling_ratio=240 rcvbuf=5751438 rcv_ssthresh=5357689 window_clamp=5391973 rcv_wnd=5054464 family=AF_
1458.775476: tcp:tcp_rcvbuf_grow: time=257 rtt_us=127 copied=4304896 inq=352256 space=3346432 ooo=0 scaling_ratio=240 rcvbuf=8067131 rcv_ssthresh=7523275 window_clamp=7562935 rcv_wnd=7061504 family=AF
1458.776631: tcp:tcp_rcvbuf_grow: time=200 rtt_us=96 copied=3260416 inq=143360 space=2768896 ooo=0 scaling_ratio=240 rcvbuf=6397256 rcv_ssthresh=5938567 window_clamp=5997427 rcv_wnd=5828608 family=AF_
1459.707973: tcp:tcp_rcvbuf_grow: time=215 rtt_us=96 copied=2506752 inq=163840 space=1388544 ooo=0 scaling_ratio=240 rcvbuf=3068867 rcv_ssthresh=2768282 window_clamp=2877062 rcv_wnd=2555904 family=AF_
1460.246494: tcp:tcp_rcvbuf_grow: time=231 rtt_us=80 copied=3756032 inq=204800 space=3117056 ooo=0 scaling_ratio=240 rcvbuf=7288091 rcv_ssthresh=6773725 window_clamp=6832585 rcv_wnd=6471680 family=AF_
1460.714596: tcp:tcp_rcvbuf_grow: time=270 rtt_us=110 copied=4714496 inq=311296 space=3719168 ooo=0 scaling_ratio=240 rcvbuf=8957739 rcv_ssthresh=8339020 window_clamp=8397880 rcv_wnd=7933952 family=AF
1462.029977: tcp:tcp_rcvbuf_grow: time=101 rtt_us=19 copied=1105920 inq=40960 space=1036288 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=1986560 family=AF_I
1462.802385: tcp:tcp_rcvbuf_grow: time=89 rtt_us=45 copied=1069056 inq=0 space=1064960 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=2035712 family=AF_INET6
1462.918648: tcp:tcp_rcvbuf_grow: time=105 rtt_us=33 copied=1441792 inq=180224 space=1069056 ooo=0 scaling_ratio=240 rcvbuf=2383282 rcv_ssthresh=2091684 window_clamp=2234326 rcv_wnd=1896448 family=AF_
1463.222533: tcp:tcp_rcvbuf_grow: time=273 rtt_us=144 copied=4603904 inq=385024 space=3469312 ooo=0 scaling_ratio=240 rcvbuf=8422564 rcv_ssthresh=7891053 window_clamp=7896153 rcv_wnd=7409664 family=AF
1466.519312: tcp:tcp_rcvbuf_grow: time=130 rtt_us=23 copied=1343488 inq=0 space=1261568 ooo=0 scaling_ratio=240 rcvbuf=2780158 rcv_ssthresh=2493778 window_clamp=2606398 rcv_wnd=2494464 family=AF_INET6
1466.681003: tcp:tcp_rcvbuf_grow: time=128 rtt_us=21 copied=1441792 inq=12288 space=1343488 ooo=0 scaling_ratio=240 rcvbuf=2932027 rcv_ssthresh=2578555 window_clamp=2748775 rcv_wnd=2568192 family=AF_I
1470.689959: tcp:tcp_rcvbuf_grow: time=255 rtt_us=122 copied=3932160 inq=204800 space=3551232 ooo=0 scaling_ratio=240 rcvbuf=8182038 rcv_ssthresh=7647384 window_clamp=7670660 rcv_wnd=7442432 family=AF
1471.754154: tcp:tcp_rcvbuf_grow: time=188 rtt_us=95 copied=2138112 inq=577536 space=1429504 ooo=0 scaling_ratio=240 rcvbuf=3113650 rcv_ssthresh=2806426 window_clamp=2919046 rcv_wnd=2248704 family=AF_
1476.813542: tcp:tcp_rcvbuf_grow: time=269 rtt_us=99 copied=3088384 inq=180224 space=2564096 ooo=0 scaling_ratio=240 rcvbuf=6219470 rcv_ssthresh=5771893 window_clamp=5830753 rcv_wnd=5509120 family=AF_
1477.738309: tcp:tcp_rcvbuf_grow: time=166 rtt_us=54 copied=1777664 inq=180224 space=1417216 ooo=0 scaling_ratio=240 rcvbuf=3117118 rcv_ssthresh=2874958 window_clamp=2922298 rcv_wnd=2613248 family=AF_
We can see sk_rcvbuf values are much smaller, and that rtt_us (estimation of rtt from a receiver point of view) is kept small, instead of being bloated. No difference in throughput. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20251119084813.3684576-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:44:23 -08:00
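The tcp_rcvbuf_low_rtt message above describes using the RTT/tcp_rcvbuf_low_rtt ratio to control sk_rcvbuf inflation when the RTT is below the sysctl value. A minimal sketch of that ratio idea follows. It is not the kernel's code: the helper name `scale_rcvbuf_growth()`, its parameters, and the treatment of a zero threshold are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the kernel): damp a proposed
 * receive-buffer increase by rtt_us / low_rtt_us when the flow's RTT
 * is below the low-RTT threshold.
 */
static inline uint64_t scale_rcvbuf_growth(uint64_t grow_bytes,
					   uint32_t rtt_us,
					   uint32_t low_rtt_us)
{
	/* Assumption: a zero threshold, or an RTT at or above it, means no damping. */
	if (low_rtt_us == 0 || rtt_us >= low_rtt_us)
		return grow_bytes;

	/* rtt_us < low_rtt_us here, so this bound keeps the product within 64 bits. */
	assert(grow_bytes <= UINT64_MAX / low_rtt_us);

	return grow_bytes * rtt_us / low_rtt_us;
}
```

In this sketch, a 1,000,000 byte increase with a 100 usec RTT and a 1000 usec threshold is reduced to 100,000 bytes, while a threshold of 1 usec (the "Before" setting in the test) leaves any increase unscaled for a measured RTT of 1 usec or more.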
Eric Dumazet	6d5dea6824	tcp: tcp_moderate_rcvbuf is only used in rx path sysctl_tcp_moderate_rcvbuf is only used from tcp_rcvbuf_grow(). Move it to netns_ipv4_read_rx group. Remove various CACHELINE_ASSERT_GROUP_SIZE() from netns_ipv4_struct_check(), as they have no real benefit but cause pain for all changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251119084813.3684576-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:44:23 -08:00
Jakub Kicinski	738cd803b9	Merge branch 'net-mdio-improve-reset-handling-of-mdio-devices' Buday Csaba says: ==================== net: mdio: improve reset handling of mdio devices This patchset refactors and slightly improves the reset handling of `mdio_device`. The patches were split from a larger series, discussed previously in the links below. The difference between v2 and v3, is that the helper function declarations have been moved to a new header file: drivers/net/phy/mdio-private.h See links for the previous versions, and for the now separate leak fix. ==================== Link: https://patch.msgid.link/cover.1763473655.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:41:41 -08:00
Buday Csaba	e5a440bf02	net: mdio: improve reset handling in mdio_device.c Change fwnode_property_read_u32() in mdio_device_register_reset() to device_property_read_u32(), which is more appropriate here. Make mdio_device_unregister_reset() truly reverse mdio_device_register_reset() by setting the internal fields to their default values. Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Link: https://patch.msgid.link/641df1488517ae71ba10158ec1e38424211d8651.1763473655.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:41:39 -08:00
Buday Csaba	acde7ad968	net: mdio: common handling of phy device reset properties Unify the handling of the per device reset properties for `mdio_device`. Merge mdio_device_register_gpiod() and mdio_device_register_reset() into mdio_device_register_reset(), that handles both reset-controllers and reset-gpios. Move reading of the reset firmware properties (reset-assert-us, reset-deassert-us) from fwnode_mdio.c to mdio_device_register_reset(), so all reset related initialization code is kept in one place. Introduce mdio_device_unregister_reset() to release the associated resources. These changes make tracking the reset properties easier. Added kernel-doc for mdio_device_register/unregister_reset(). Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Link: https://patch.msgid.link/17c216efd7a47be17db104378b6aacfc8741d8b9.1763473655.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:41:39 -08:00
Buday Csaba	02aeff20e8	net: mdio: move device reset functions to mdio_device.c The functions mdiobus_register_gpiod() and mdiobus_register_reset() handle the mdio device reset initialization, which belong to mdio_device.c. Move them from mdio_bus.c to mdio_device.c, and rename them to match the corresponding source file: mdio_device_register_gpio() and mdio_device_register_reset(). Remove 'static' qualifiers and declare them in drivers/net/phy/mdio-private.h (new header file). Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Link: https://patch.msgid.link/5f684838ee897130f21b21beb07695eea4af8988.1763473655.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 17:41:39 -08:00
Jakub Kicinski	9e203721ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile `e1bb28bf13` ("selftest: af_unix: Add test for SO_PEEK_OFF.") `45a1cd8346` ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 09:13:26 -08:00
Linus Torvalds	8e621c9a33	Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from IPsec and wireless.
Previous releases - regressions:
- prevent NULL deref in generic_hwtstamp_ioctl_lower(), newer APIs don't populate all the pointers in the request
- phylink: add missing supported link modes for the fixed-link
- mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
Previous releases - always broken:
- openvswitch: remove never-working support for setting NSH fields
- xfrm: a number of fixes for error paths of xfrm_state creation/modification/deletion
- xfrm: fixes for offload
  - fix the determination of the protocol of the inner packet
  - don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path
- mptcp: fix a couple of corner cases in fallback and fastclose handling
- wifi: rtw89: hw_scan: prevent connections from getting stuck, work around apparent bug in FW by tweaking messages we send
- af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
- veth: more robust handling of race to avoid txq getting stuck
- eth: ps3_gelic_net: handle skb allocation failures"
* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  vsock: Ignore signal/timeout on connect() if already established
  be2net: pass wrb_params in case of OS2BMC
  l2tp: reset skb control buffer on xmit
  net: dsa: microchip: lan937x: Fix RGMII delay tuning
  selftests: mptcp: add a check for 'add_addr_accepted'
  mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
  selftests: mptcp: join: userspace: longer timeout
  selftests: mptcp: join: endpoints: longer timeout
  selftests: mptcp: join: fastclose: remove flaky marks
  mptcp: fix duplicate reset on fastclose
  mptcp: decouple mptcp fastclose from tcp close
  mptcp: do not fallback when OoO is present
  mptcp: fix premature close in case of fallback
  mptcp: avoid unneeded subflow-level drops
  mptcp: fix ack generation for fallback msk
  wifi: rtw89: hw_scan: Don't let the operating channel be last
  net: phylink: add missing supported link modes for the fixed-link
  selftest: af_unix: Add test for SO_PEEK_OFF.
  af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
  net/mlx5: Clean up only new IRQ glue on request_irq() failure
  ...	2025-11-20 08:52:07 -08:00
Michal Luczaj	002541ef65	vsock: Ignore signal/timeout on connect() if already established During connect(), acting on a signal/timeout by disconnecting an already established socket leads to several issues: 1. connect() invoking vsock_transport_cancel_pkt() -> virtio_transport_purge_skbs() may race with sendmsg() invoking virtio_transport_get_credit(). This results in a permanently elevated `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling. 2. connect() resetting a connected socket's state may race with socket being placed in a sockmap. A disconnected socket remaining in a sockmap breaks sockmap's assumptions. And gives rise to WARNs. 3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a transport change/drop after TCP_ESTABLISHED. Which poses a problem for any simultaneous sendmsg() or connect() and may result in a use-after-free/null-ptr-deref. Do not disconnect socket on signal/timeout. Keep the logic for unconnected sockets: they don't linger, can't be placed in a sockmap, are rejected by sendmsg(). [1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/ [2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/ [3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/ Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 07:40:06 -08:00
Andrey Vatoropin	7d277a7a58	be2net: pass wrb_params in case of OS2BMC be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL pointer when processing a workaround for specific packet, as commit `bc0c3405ab` ("be2net: fix a Tx stall bug caused by a specific ipv6 packet") states. The correct way would be to pass the wrb_params from be_xmit(). Fixes: `760c295e0e` ("be2net: Support for OS2BMC.") Cc: stable@vger.kernel.org Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru> Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 07:39:54 -08:00
Paolo Abeni	0888a0d76d	Merge branch 'ynl-cli-list-attrs-argument' Gal Pressman says: ==================== YNL CLI --list-attrs argument While experimenting with the YNL CLI, I found the process of going back and forth to examine the YAML spec files in order to figure out how to use each command quite tiring. The addition of --list-attrs helps by providing all information needed directly in the tool. I figured others would likely find it useful as well. v1: https://lore.kernel.org/all/20251116192845.1693119-1-gal@nvidia.com/ ==================== Link: https://patch.msgid.link/20251118143208.2380814-1-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:43:06 +01:00
Gal Pressman	6c10f1a1c0	tools: ynl: cli: Display enum values in --list-attrs output When listing attributes with --list-attrs, display the actual enum values for attributes that reference an enum type. # ./cli.py --family netdev --list-attrs dev-get [..] - xdp-features: u64 (enum: xdp-act) Flags: basic, redirect, ndo-xmit, xsk-zerocopy, hw-offload, rx-sg, ndo-xmit-sg Bitmask of enabled xdp-features. [..] Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20251118143208.2380814-4-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:43:04 +01:00
Gal Pressman	bc1bc1b357	tools: ynl: cli: Parse nested attributes in --list-attrs output Enhance the --list-attrs option to recursively display nested attributes instead of just showing "nest" as the type. Nested attributes now show their attribute set name and expand to display their contents. # ./cli.py --family ethtool --list-attrs rss-get [..] Do request attributes: - header: nest -> header - dev-index: u32 - dev-name: string - flags: u32 (enum: header-flags) - phy-index: u32 - context: u32 [..] Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20251118143208.2380814-3-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:43:04 +01:00
Gal Pressman	2a2d5a3392	tools: ynl: cli: Add --list-attrs option to show operation attributes Add a --list-attrs option to the YNL CLI that displays information about netlink operations, including request and reply attributes. This eliminates the need to manually inspect YAML spec files to determine the JSON structure required for operations, or understand the structure of the reply. Example usage: # ./cli.py --family netdev --list-attrs dev-get Operation: dev-get Get / dump information about a netdev. Do request attributes: - ifindex: u32 netdev ifindex Do reply attributes: - ifindex: u32 netdev ifindex - xdp-features: u64 (enum: xdp-act) Bitmask of enabled xdp-features. - xdp-zc-max-segs: u32 max fragment count supported by ZC driver - xdp-rx-metadata-features: u64 (enum: xdp-rx-metadata) Bitmask of supported XDP receive metadata features. See Documentation/networking/xdp-rx-metadata.rst for more details. - xsk-features: u64 (enum: xsk-flags) Bitmask of enabled AF_XDP features. Dump reply attributes: - ifindex: u32 netdev ifindex - xdp-features: u64 (enum: xdp-act) Bitmask of enabled xdp-features. - xdp-zc-max-segs: u32 max fragment count supported by ZC driver - xdp-rx-metadata-features: u64 (enum: xdp-rx-metadata) Bitmask of supported XDP receive metadata features. See Documentation/networking/xdp-rx-metadata.rst for more details. - xsk-features: u64 (enum: xsk-flags) Bitmask of enabled AF_XDP features. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20251118143208.2380814-2-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:43:04 +01:00
Paolo Abeni	7828a4d3f6	Merge branch 'add-af_xdp-zero-copy-support' Meghana Malladi says: ==================== Add AF_XDP zero copy support This series adds AF_XDP zero copy support to icssg driver. Tests were performed on AM64x-EVM with xdpsock application [1]. A clear improvement is seen in transmit (txonly) and receive (rxdrop) for 64 byte packets. The 1500 byte test seems to be limited by line rate (1G link), so no improvement is seen there in packet rate. Having some issue with l2fwd as the benchmarking numbers show 0 for 64 byte packets after forwarding first batch packets and I am currently looking into it.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark:  XDP-SKB  XDP-Native  XDP-Native(ZeroCopy)
rxdrop      253      473         656
txonly      350      354         855
l2fwd       178      240         0
AF_XDP performance using 1500 byte packets in Kpps.
Benchmark:  XDP-SKB  XDP-Native  XDP-Native(ZeroCopy)
rxdrop      82       82          82
txonly      81       82          82
l2fwd       81       82          82
[1]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example v5: https://lore.kernel.org/all/20251111101523.3160680-1-m-malladi@ti.com/ ==================== Link: https://patch.msgid.link/20251118135542.380574-1-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:13 +01:00
Meghana Malladi	c6a1ec1870	net: ti: icssg-prueth: Enable zero copy in XDP features Enable the zero copy feature flag in xdp_set_features_flag() for a given ndev to get the AF-XDP zero copy support running for both Tx and Rx. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-7-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Meghana Malladi	7a64bb388d	net: ti: icssg-prueth: Add AF_XDP zero copy for RX Use xsk_pool inside rx_chn to check if a given Rx queue id is registered for xsk zero copy, which gets populated during xsk enable. Update prueth_create_xdp_rxqs to register and support two different memory models (xsk and page) for a given Rx queue, if registered for zero copy. If xsk_pool is registered, allocate buffers from UMEM and map them to the hardware Rx descriptors. In NAPI context, run the XDP program for each packet and process the xsk buffer according to the XDP result codes. Also allocate a new set of buffers from UMEM for the next batch of NAPI Rx processing. Add XDP_WAKEUP_RX support to enable xsk wakeup for Rx. Move prueth_create_page_pool to prueth_init_rx_chns to avoid freeing and re-allocating the system memory every time there is a transition from zero copy to copy, which also prevents any type of memory fragmentation or leak. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-6-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Meghana Malladi	121133163c	net: ti: icssg-prueth: Make emac_run_xdp function independent of page emac_run_xdp function runs xdp program, at a given hook point in the Rx path of the driver in NAPI context and returns XDP return codes. In zero copy mode the driver receives packets using UMEM frames instead of pages (native XDP). Decouple the usage of page in this function. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-5-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Meghana Malladi	8756ef2eb0	net: ti: icssg-prueth: Add AF_XDP zero copy for TX Use xsk_pool inside tx_chn to check if a given Tx queue id is registered for xsk zero copy, which gets populated during xsk enable. If xsk_pool is set, get frames from the pool in NAPI context and submit them to the Tx channel. Tx completion is also handled in the NAPI context. Use PRUETH_SWDATA_XSK to recycle xsk buffers back to the umem pool. Add XDP_WAKEUP_TX support to enable xsk_wakeup for Tx. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-4-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Meghana Malladi	7dfd759791	net: ti: icssg-prueth: Add XSK pool helpers Implement XSK NDOs (setup, wakeup) and create XSK Rx and Tx queues. xsk_qid stores the queue id for a given port which has been registered for zero copy AF_XDP and used to acquire UMEM pointer if registered. Based on the xsk_qid and the xsk_pool (umem) the driver is either in copy or zero copy mode. In case of copy mode the xsk_qid value will be invalid and will be set to valid queue id when enabling zero copy. To enable zero copy, the Rx queues are destroyed, i.e., descriptors pushed to fq and cq are freed to remap them to xdp buffers from the umem. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-3-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Meghana Malladi	41dde7f1d0	net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues Each port for a given ICSSG instance has their own set of Tx and Rx queues. Add functions to create and destroy these queues, which will be further used while performing ndo_bpf operations to set up XSK Tx/Rx queues for a given port. In the destroy Rx queue sequence add teardown wait to ensure that all the descriptors including the TDCM (teardown completion marker) have been serviced and freed to avoid any sort of descriptor leaks. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Link: https://patch.msgid.link/20251118135542.380574-2-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 15:24:11 +01:00
Paolo Abeni	dc9e7e652f	Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== wireless-2025-11-20 A single fix for scanning on some rtw89 devices. * tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: rtw89: hw_scan: Don't let the operating channel be last ==================== Link: https://patch.msgid.link/20251120085433.8601-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 13:03:43 +01:00
Paolo Abeni	070b87f64a	Merge branch 'txgbe-support-more-modules' Jiawen Wu says: ==================== TXGBE support more modules Support CR modules for 25G devices and QSFP modules for 40G devices. And implement .get_module_eeprom_by_page() to get module info. v1: https://lore.kernel.org/all/20251112055841.22984-1-jiawenwu@trustnetic.com/ ==================== Link: https://patch.msgid.link/20251118080259.24676-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:28 +01:00
Jiawen Wu	9b97b6b563	net: txgbe: support getting module EEPROM by page Getting module EEPROM has been supported in TXGBE SP devices, since SFP driver has already implemented it. Now add support to read module EEPROM for AML devices. Towards this, add a new firmware mailbox command to get the page data. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20251118080259.24676-6-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:26 +01:00
Jiawen Wu	c6e97daec5	net: txgbe: delay to identify modules in .ndo_open For QSFP modules, there is a possibility that the module cannot be identified when reading I2C immediately in .ndo_open. So just set the flag WX_FLAG_NEED_MODULE_RESET and do it in the subtask, which always waits 200 ms to identify the module. This change has no impact on the original adaptation. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20251118080259.24676-5-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:26 +01:00
Jiawen Wu	57d39faed4	net: txgbe: improve functions of AML 40G devices Support to identify QSFP modules for AML 40G devices. The definition of GPIO pins follows the design of the QSFP modules, and TXGBE_GPIOBIT_4 is used for module present. Meanwhile, implement phylink in XLGMII mode by default, and get the link state from MAC link. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20251118080259.24676-4-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:26 +01:00
Jiawen Wu	dbba6b7a47	net: txgbe: rename the SFP related QSFP support will be introduced for AML 40G devices, so the code that identifies the various modules should be renamed to more appropriate names. Also, struct txgbe_hic_i2c_read, used to get module information, is renamed to struct txgbe_hic_get_module_info, because another SW-FW command to read I2C will be added later. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20251118080259.24676-3-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:26 +01:00
Jiawen Wu	354d128aa7	net: txgbe: support CR modules for AML devices Support to identify 25G/10G CR modules for AML devices. Autoneg is enabled by default in CR mode. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20251118080259.24676-2-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 12:47:26 +01:00
David Bauer	d70b592551	l2tp: reset skb control buffer on xmit The L2TP stack did not reset the skb control buffer before sending the encapsulated package. In a setup with an ath10k radio and batman-adv over an L2TP tunnel massive fragmentations happen sporadically if the L2TP tunnel is established over IPv4. L2TP might reset some of the fields in the IP control buffer, but L2TP assumes the type of the control buffer to be of an IPv4 packet. In case the L2TP interface is used as a batadv hardif or the packet is an IPv6 packet, this assumption breaks. Clear the entire control buffer to avoid such mishaps altogether. Fixes: `f77ae93904` ("[PPPOL2TP]: Reset meta-data in xmit function") Signed-off-by: David Bauer <mail@david-bauer.net> Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 11:52:24 +01:00
Oleksij Rempel	3ceb6ac211	net: dsa: microchip: lan937x: Fix RGMII delay tuning Correct RGMII delay application logic in lan937x_set_tune_adj(). The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the new delay value. This caused the new value to be bitwise-OR'd with the existing PORT_TUNE_ADJ field instead of replacing it. For example, when setting the RGMII 2 TX delay on port 4, the intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was incorrectly OR'd with the default 0x1B (from register value 0xDA3), leaving the delay at the wrong setting. This patch adds the missing mask to clear the field, ensuring the correct delay value is written. Physical measurements on the RGMII TX lines confirm the fix, showing the delay changing from ~1ns (before change) to ~2ns. While testing on i.MX 8MP showed this was within the platform's timing tolerance, it did not match the intended hardware-characterized value. Fixes: `b19ac41faa` ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-20 11:26:14 +01:00
Johannes Berg	0ff8eeafba	Merge tag 'rtw-2025-11-20' of https://github.com/pkshih/rtw Ping-Ke Shih says: ================== rtw patches for v6.18-rc7 Fix an issue where the firmware goes wrong and leaves the device unusable after scanning. The issue appears under certain regulatory domains, as reported by end users. ================== Link: https://patch.msgid.link/8217bee0-96c4-44c1-9593-2e9ca12eccc5@RTKEXHMBS03.realtek.com.tw Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-20 09:44:08 +01:00
Jakub Kicinski	d877b1013c	Merge branch 'net-mlx5-move-notifiers-outside-the-devlink-lock' Tariq Toukan says: ==================== net/mlx5: Move notifiers outside the devlink lock This series by Cosmin moves blocking notifier registration in the mlx5 driver outside the devlink lock during probe. This is mostly a no-op refactoring that consists of multiple pieces. It is necessary because upcoming code will introduce a potential locking cycle between the devlink lock and the blocking notifier head mutexes, so these notifiers must move out of the devlink-locked critical section. ==================== Link: https://patch.msgid.link/1763325940-1231508-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:30 -08:00
Cosmin Ratiu	64ad6470c8	net/mlx5: Move SF dev table notifier registration outside the PF devlink lock This completes the previous patches by moving notifier registration for SF dev tables outside the devlink locked critical section in mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. After this patch, notifiers can grab the PF devlink lock (soon to be necessary) without creating a locking cycle. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:28 -08:00
Cosmin Ratiu	d4a0acbd94	net/mlx5: Move the SF table notifiers outside the devlink lock Move the SF table notifiers registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF table themselves and thus don't need notifiers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:28 -08:00
Cosmin Ratiu	e63c9c5f0a	net/mlx5: Move the SF HW table notifier outside the devlink lock Move the SF HW table notifier registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:27 -08:00
Cosmin Ratiu	d3a356db85	net/mlx5: Move the vhca event notifier outside of the devlink lock The vhca event notifier consists of an atomic notifier for vhca state changes (used for SF events), multiple workqueues and a blocking notifier chain for delivering the vhca state change events for further processing. This patch moves the vhca notifier head outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This allows called notifiers to grab the PF devlink lock which was previously impossible because it would create a circular lock dependency. mlx5_vhca_event_stop() is now called earlier in the cleanup phase and flushes the workqueues to ensure that after the call, there are no pending events. This simplifies the cleanup flow for vhca event consumers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:27 -08:00
Cosmin Ratiu	3fee828789	net/mlx5: Move the esw mode notifier chain outside the devlink lock The esw mode change notifier chain is initialized/cleaned up in mlx5_init_one() / mlx5_uninit_one() with the devlink lock held. Move the notifier head from the eswitch struct into mlx5_priv directly, and initialize it outside the critical section. This will allow notifier registration to happen earlier in the init procedure in subsequent patches. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:27 -08:00
Cosmin Ratiu	b6b03097f9	net/mlx5: Initialize events outside devlink lock Move event init/cleanup outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. By doing this, we avoid the events being reinitialized on devlink reload and, more importantly, the events->sw_nh notifier chain becomes available earlier in the init procedure, which will be used in subsequent patches. This makes sense because the events struct is pure software, independent of any HW details. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:32:27 -08:00
Jakub Kicinski	beabc06ffb	Merge branch 'net-adjust-conservative-values-around-napi' Jason Xing says: ==================== net: adjust conservative values around napi This series keeps at least 96 skbs per cpu and frees 32 skbs at one time in conclusion. More initial discussions with Eric can be seen at the link [1]. [1]: https://lore.kernel.org/all/CAL+tcoBEEjO=-yvE7ZJ4sB2smVBzUht1gJN85CenJhOKV ==================== Link: https://patch.msgid.link/20251118070646.61344-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:29:29 -08:00
Jason Xing	5d7fc63ab8	net: prefetch the next skb in napi_skb_cache_get() After getting the current skb in napi_skb_cache_get(), the next skb in cache is highly likely to be used soon, so prefetch would be helpful. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-5-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:29:25 -08:00
Jason Xing	2d67b5c5c6	net: use NAPI_SKB_CACHE_FREE to keep 32 as default to do bulk free - Replace NAPI_SKB_CACHE_HALF with NAPI_SKB_CACHE_FREE - Only free 32 skbs in napi_skb_cache_put() Since the first patch adjusting NAPI_SKB_CACHE_SIZE to 128, the number of packets to be freed in the softirq was increased from 32 to 64. Considering a subsequent net_rx_action() calling napi_poll() a few times can easily consume the 64 available slots and we can afford keeping a higher value of sk_buffs in per-cpu storage, decrease NAPI_SKB_CACHE_FREE to 32 like before. So now the logic is 1) keeping 96 skbs, 2) freeing 32 skbs at one time. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-4-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:29:24 -08:00
Jason Xing	01d7385618	net: increase default NAPI_SKB_CACHE_BULK to 32 The previous value 16 is a bit conservative, so adjust it along with NAPI_SKB_CACHE_SIZE, which can minimize triggering memory allocation in napi_skb_cache_get*(). Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:29:24 -08:00
Jason Xing	3505730d90	net: increase default NAPI_SKB_CACHE_SIZE to 128 After commit `b61785852e` ("net: increase skb_defer_max default to 128") changed the value sysctl_skb_defer_max to avoid many calls to kick_defer_list_purge(), the same situation can be applied to NAPI_SKB_CACHE_SIZE that was proposed in 2016. It's a trade-off between using pre-allocated memory in skb_cache and saving more a bit heavy function calls in the softirq context. With this patch applied, we can have more skbs per-cpu to accelerate the sending path that needs to acquire new skbs. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:29:24 -08:00
Jakub Kicinski	7c9dd38602	Merge branch 'disable-clkout-on-rtl8211f-d-i-vd-cg' Vladimir Oltean says: ==================== Disable CLKOUT on RTL8211F(D)(I)-VD-CG The Realtek RTL8211F(D)(I)-VD-CG is similar to other RTL8211F models in that the CLKOUT signal can be turned off - a feature requested to reduce EMI, and implemented via "realtek,clkout-disable" as documented in Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml. It is also dissimilar to said PHY models because it has no PHYCR2 register, and disabling CLKOUT is done through some other register. The strategy adopted in this 6-patch series is to make the PHY driver not think in terms of "priv->has_phycr2" and "priv->phycr2", but of more high-level features ("priv->disable_clk_out") while maintaining behaviour. Then, the logic is extended for the new PHY. Very loosely based on previous work from Clark Wang, who took a different approach, to pretend that the RTL8211FVD_CLKOUT_REG is actually this PHY's PHYCR2. ==================== Link: https://patch.msgid.link/20251117234033.345679-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:25 -08:00
Vladimir Oltean	4465ae435d	net: phy: realtek: create rtl8211f_config_phy_eee() helper To simplify the rtl8211f_config_init() control flow and get rid of "early" returns for PHYs where the PHYCR2 register is absent, move the entire logic sub-block that deals with disabling PHY-mode EEE to a separate function. There, it is much more obvious what the early "return 0" skips, and it becomes more difficult to accidentally skip unintended stuff. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251117234033.345679-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:23 -08:00
Vladimir Oltean	bb78b71faf	net: phy: realtek: eliminate priv->phycr1 variable Previous changes have replaced the machine-level priv->phycr2 with a high-level priv->disable_clk_out. This created a discrepancy with priv->phycr1 which is resolved here, for uniformity. One advantage of this new implementation is that we don't read priv->phycr1 in rtl821x_probe() if we're never going to modify it. We never test the positive return code from phy_modify_mmd_changed(), so we could just as well use phy_modify_mmd(). I took the ALDPS feature description from commit `d90db36a9e` ("net: phy: realtek: add dt property to enable ALDPS mode") and transformed it into a function comment - the feature is sufficiently non-obvious to deserve that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251117234033.345679-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:23 -08:00
Vladimir Oltean	e1a31c41be	net: phy: realtek: allow CLKOUT to be disabled on RTL8211F(D)(I)-VD-CG Add CLKOUT disable support for RTL8211F(D)(I)-VD-CG. Like with other PHY variants, this feature might be requested by customers when the clock output is not used, in order to reduce electromagnetic interference (EMI). In the common driver, the CLKOUT configuration is done through PHYCR2. The RTL_8211FVD_PHYID is singled out as not having that register, and execution in rtl8211f_config_init() returns early after commit `2c67301584` ("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present"). But actually CLKOUT is configured through a different register for this PHY. Instead of pretending this is PHYCR2 (which it is not), just add some code for modifying this register inside the rtl8211f_disable_clk_out() function, and move that outside the code portion that runs only if PHYCR2 exists. In practice this reorders the PHYCR2 writes to disable PHY-mode EEE and to disable the CLKOUT for the normal RTL8211F variants, but this should be perfectly fine. It was not noted that RTL8211F(D)(I)-VD-CG would need a genphy_soft_reset() call after disabling the CLKOUT. Despite that, we do it out of caution and for symmetry with the other RTL8211F models. Co-developed-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251117234033.345679-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:23 -08:00
Vladimir Oltean	910ac7bfb1	net: phy: realtek: eliminate has_phycr2 variable This variable is assigned in rtl821x_probe() and used in rtl8211f_config_init(), which is more complex than it needs to be. Simply testing the same condition from rtl821x_probe() in rtl8211f_config_init() yields the same result (the PHY driver ID is a runtime invariant), but with one temporary variable less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251117234033.345679-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:23 -08:00
Vladimir Oltean	27033d0691	net: phy: realtek: eliminate priv->phycr2 variable The RTL8211F(D)(I)-VD-CG PHY also has support for disabling the CLKOUT, and we'd like to introduce the "realtek,clkout-disable" property for that. But it isn't done through the PHYCR2 register, and it becomes awkward to have the driver pretend that it is. So just replace the machine-level "u16 phycr2" variable with a logical "bool disable_clk_out", which scales better to the other PHY as well. The change is a complete functional equivalent. Before, if the device tree property was absent, priv->phycr2 would contain the RTL8211F_CLKOUT_EN bit as read from hardware. Now, we don't save priv->phycr2, but we just don't call phy_modify_paged() on it. Also, we can simply call phy_modify_paged() with the "set" argument to 0. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251117234033.345679-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:23 -08:00
Vladimir Oltean	8e982441ba	net: phy: realtek: create rtl8211f_config_rgmii_delay() The control flow in rtl8211f_config_init() has some pitfalls which were probably unintended. Specifically it has an early return: switch (phydev->interface) { ... default: /* the rest of the modes imply leaving delay as is. */ return 0; } which exits the entire config_init() function. This means it also skips doing things such as disabling CLKOUT or disabling PHY-mode EEE. For the RTL8211FS, which uses PHY_INTERFACE_MODE_SGMII, this might be a problem. However, I don't know that it is, so there is no Fixes: tag. The issue was observed through code inspection. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251117234033.345679-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:24:22 -08:00
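
The control-flow pitfall described in commit 8e982441ba can be illustrated with a small standalone sketch. This is not the Realtek driver code: `struct phy_state`, `enum iface_mode`, and every function name below are hypothetical stand-ins, and the "later steps" are reduced to setting flags. It only shows why a `return` inside a `switch` default leaves the whole init function, and how moving the `switch` into a helper confines that early return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interface modes; not the kernel's phy_interface_t. */
enum iface_mode { MODE_RGMII, MODE_RGMII_ID, MODE_SGMII };

/* Hypothetical record of which init steps actually ran. */
struct phy_state {
	bool delay_configured;
	bool clkout_disabled;
	bool eee_disabled;
};

/* Pitfall: the default-case return exits the entire init function. */
int config_init_early_return(struct phy_state *st, enum iface_mode mode)
{
	assert(st);

	switch (mode) {
	case MODE_RGMII:
	case MODE_RGMII_ID:
		st->delay_configured = true;
		break;
	default:
		/* Other modes leave the delay as is... */
		return 0;	/* ...but this also skips the steps below. */
	}

	st->clkout_disabled = true;
	st->eee_disabled = true;
	return 0;
}

/* Helper: an early return here only ends the delay step. */
static int config_rgmii_delay(struct phy_state *st, enum iface_mode mode)
{
	switch (mode) {
	case MODE_RGMII:
	case MODE_RGMII_ID:
		break;
	default:
		return 0;
	}

	st->delay_configured = true;
	return 0;
}

/* With the helper, the remaining steps run for every interface mode. */
int config_init_with_helper(struct phy_state *st, enum iface_mode mode)
{
	int ret;

	assert(st);

	ret = config_rgmii_delay(st, mode);
	if (ret)
		return ret;

	st->clkout_disabled = true;
	st->eee_disabled = true;
	return 0;
}
```

In this sketch, calling the first variant with `MODE_SGMII` leaves both later flags unset, while the helper-based variant sets them and still leaves the delay untouched.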

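
The missing-mask bug fixed in commit 3ceb6ac211 is a general read-modify-write pattern, sketched below outside the driver. The macro and function names are hypothetical, and the field position (bits 13:7) is an assumption chosen because it is consistent with the example in the commit message, where register value 0xDA3 carries a default field value of 0x1B.

```c
#include <stdint.h>

/* Assumed field layout: 7 bits at bits 13:7 (hypothetical names). */
#define TUNE_ADJ_SHIFT	7
#define TUNE_ADJ_MASK	(0x7Fu << TUNE_ADJ_SHIFT)

/* Extract the current field value from a register image. */
uint16_t get_tune_adj(uint16_t reg)
{
	return (uint16_t)((reg & TUNE_ADJ_MASK) >> TUNE_ADJ_SHIFT);
}

/* Buggy variant: ORs the new value onto the old field contents. */
uint16_t set_tune_adj_buggy(uint16_t reg, uint16_t val)
{
	reg |= (uint16_t)(((unsigned int)val << TUNE_ADJ_SHIFT) & TUNE_ADJ_MASK);
	return reg;
}

/* Fixed variant: clear the field first, then set the new value. */
uint16_t set_tune_adj_fixed(uint16_t reg, uint16_t val)
{
	reg &= (uint16_t)~TUNE_ADJ_MASK;
	reg |= (uint16_t)(((unsigned int)val << TUNE_ADJ_SHIFT) & TUNE_ADJ_MASK);
	return reg;
}
```

Under these assumptions, writing 0 into a register image of 0xDA3 with the buggy variant leaves the field at 0x1B, because OR-ing with zero changes nothing. The fixed variant clears the field and leaves the bits outside it untouched.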

1399395 Commits