This moves bnxt_hsi.h contents to a common location so it can be
properly referenced by bnxt_en, bnxt_re, and bnge.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).
No conflicts.
Adjacent changes:
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
c701574c54 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
b3a431fe2e ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
62da647a2b ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
dc66a129ad ("wifi: mt76: add a wrapper for wcid access with validation")
drivers/net/wireless/mediatek/mt76/mt7996/main.c
3dd6f67c66 ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
8989d8e90f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")
net/mac80211/cfg.c
58fcb1b428 ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
037dc18ac3 ("wifi: mac80211: add support for storing station S1G capabilities")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_fill_drv_seg_record() calls bnxt_dbg_hwrm_log_buffer_flush()
to flush the FW trace buffer. This needs to be done before we
call bnxt_copy_ctx_mem() to copy the trace data.
Without this fix, the coredump may not contain all the FW
traces.
Fixes: 3c2179e663 ("bnxt_en: Add FW trace coredump segments to the coredump")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnxt_ets_validate(), the code incorrectly loops over all possible
traffic classes to check and add the ETS settings. Fix it to loop
over the configured traffic classes only.
The unconfigured traffic classes will default to TSA_ETS with 0
bandwidth. Looping over these unconfigured traffic classes may
cause the validation to fail and trigger this error message:
"rejecting ETS config starving a TC\n"
The .ieee_setets() will then fail.
Fixes: 7df4ae9fe8 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Reviewed-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a network device with netdev features enabled.
Some features are enabled based on the capabilities
advertised by the firmware. Add the skeleton of minimal
netdev operations. Additionally, initialize the parameters
for rings (TX/RX/Completion).
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-11-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query resources from the firmware and, based on the
availability of resources, initialize the default
settings. The default settings include:
1. Rings and other resource reservations with the
firmware. This ensures that resources are reserved
before network and auxiliary devices are created.
2. Mapping the BAR, which helps access doorbells since
its size is known after querying the firmware.
3. Retrieving the TCs and hardware CoS queue mappings.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-10-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add irq allocation functions. This will help
to allocate IRQs to both netdev and RoCE aux devices.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-9-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Get the resources and capabilities from the firmware.
Add functions to manage the resources with the firmware.
These functions will help netdev reserve the resources
with the firmware before registering the device in future
patches. The resources and their information, such as
the maximum available and reserved, are part of the members
present in the bnge_hw_resc struct.
The bnge_reserve_rings() function also populates
the RSS table entries once the RX rings are reserved with
the firmware.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-8-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Backing store or context memory on the host helps the
device to manage rings, stats and other resources.
Context memory is allocated with the help of ring
alloc/free functions.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-7-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ring allocation/free mechanism which help
to allocate rings (TX/RX/Completion) and backing
stores memory on the host for the device.
Future patches will use these functions.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-6-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query firmware with the help of basic firmware commands and
cache the capabilities. With the help of basic commands
start the initialization process of the driver with the
firmware.
Since basic information is available from the firmware,
register with devlink.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-5-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to communicate with the firmware.
Future patches will use these functions to send the
messages to the firmware.
Functions support allocating request/response buffers
to send a particular command. Each command has certain
timeout value to which the driver waits for response from
the firmware. In error case, commands may be either timed
out waiting on response from the firmware or may return
a specific error code.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-4-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allocate a base device and devlink interface with minimal
devlink ops.
Add dsn and board related information.
Map PCIe BAR (bar0), which helps to communicate with the
firmware.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-3-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I received a kernel-test-bot report[1] that shows the
[-Wunused-but-set-variable] warning. Since the previous commit I made, as
the 'Fixes' tag shows, gives users an option to turn on and off the
CONFIG_RFS_ACCEL, the issue then can be discovered and reproduced with
GCC specifically.
Like Simon and Jakub suggested, use fewer #ifdefs which leads to fewer
bugs.
[1]
All warnings (new ones prefixed by >>):
drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_request_irq':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:10703:9: warning: variable 'j' set but not used [-Wunused-but-set-variable]
10703 | int i, j, rc = 0;
| ^
Fixes: 9b6a30febd ("net: allow rps/rfs related configs to be switched")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506282102.x1tXt0qz-lkp@intel.com/
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Rx rings are filled with Rx buffers. Which are supposed to fit
packet headers (or MTU if HW-GRO is disabled). The aggregation buffers
are filled with "device pages". Adjust the sizes of the page pool
recycling ring appropriately, based on ratio of the size of the
buffer on given ring vs system page size. Otherwise on a system
with 64kB pages we end up with >700MB of memory sitting in every
single page pool cache.
Correct the size calculation for the head_pool. Since the buffers
there are always small I'm pretty sure I meant to cap the size
at 1k, rather than make it the lowest possible size. With 64k pages
1k cache with a 1k ring is 64x larger than we need.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250626165441.4125047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct spelling as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netmem_dma_*() helpers and declare netmem_tx to support netmem TX.
By this change, all bnxt devices will support the netmem TX.
Unreadable skbs are not going to be handled by the TX push logic.
So, it checks whether a skb is readable or not before the TX push logic.
netmem TX can be tested with ncdevmem.c
Acked-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250619144058.147051-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 325eb217e4.
udp_tunnel infra doesn't need RTNL, should be safe to get back
to only netdev instance lock.
Cc: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drivers that are using ops lock and don't depend on RTNL lock
still need to manage it because udp_tunnel's RTNL dependency.
Introduce new udp_tunnel_nic_lock and use it instead of
rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from
udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to
grab udp_tunnel_nic_lock mutex and might sleep).
Cover more places in v4:
- netlink
- udp_tunnel_notify_add_rx_port (ndo_open)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_notify_del_rx_port (ndo_stop)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_get_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- udp_tunnel_drop_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- udp_tunnel_nic_reset_ntf (ndo_open)
- notifiers
- udp_tunnel_nic_netdevice_event, depending on the event:
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- ethnl_tunnel_info_reply_size
- udp_tunnel_nic_set_port_priv (two intel drivers)
Cc: Michael Chan <michael.chan@broadcom.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate to new callbacks added by commit 9bb00786fc ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250617014555.434790-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate to new callbacks added by commit 9bb00786fc ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
The driver has no other RXNFC functionality so the SET callback can
be now removed.
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/20250617014555.434790-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The disable sequence in bcmgenet_phy_power_set() is updated to
match the inverse sequence and timing (and spacing) of the
enable sequence. This ensures that LEDs driven by the GENET IP
are disabled when the GPHY is powered down.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250614025817.3808354-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improved wording and grammar in several comments for clarity.
"the must belongs" -> "it must belong"
"mininum" -> "minimum"
"fileds" -> "fields"
Replaced return -1 with -EINVAL in hwrm_ring_alloc_send_msg()
to return a proper error code.
These changes enhance code readability and consistent error handling.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250615154051.1365631-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit under the Fixes tag below which updates the VNICs' RSS
and MRU during .ndo_queue_start(), needs to be extended to cover any
non-default RSS contexts which have their own VNICs. Without this
step, packets that are destined to a non-default RSS context may be
dropped after .ndo_queue_start().
We further optimize this scheme by updating the VNIC only if the
RX ring being restarted is in the RSS table of the VNIC. Updating
the VNIC (in particular setting the MRU to 0) will momentarily stop
all traffic to all rings in the RSS table. Any VNIC that has the
RX ring excluded from the RSS table can skip this step and avoid the
traffic disruption.
Note that this scheme is just an improvement. A VNIC with multiple
rings in the RSS table will still see traffic disruptions to all rings
in the RSS table when one of the rings is being restarted. We are
working on a FW scheme that will improve upon this further.
Fixes: 5ac066b7b0 ("bnxt_en: Fix queue start to update vnic RSS table")
Reported-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new helper function that will configure MRU and RSS table
of a VNIC. This will be useful when we configure both on a VNIC
when resetting an RX ring. This function will be used again in
the next bug fix patch where we have to reconfigure VNICs for RSS
contexts.
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before the commit under the Fixes tag below, bnxt_ulp_stop() and
bnxt_ulp_start() were always invoked in pairs. After that commit,
the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop()
has been called. This may result in the RoCE driver's aux driver
.suspend() method being invoked twice. The 2nd bnxt_re_suspend()
call will crash when it dereferences a NULL pointer:
(NULL ib_device): Handle device suspend call
BUG: kernel NULL pointer dereference, address: 0000000000000b78
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en]
RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re]
Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00
RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff
RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff
R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070
R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bnxt_ulp_stop+0x69/0x90 [bnxt_en]
bnxt_sp_task+0x678/0x920 [bnxt_en]
? __schedule+0x514/0xf50
process_scheduled_works+0x9d/0x400
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag
is already set. This will preserve the original symmetrical
bnxt_ulp_stop() and bnxt_ulp_start().
Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED
flag after taking the mutex to avoid any race condition. And for
symmetry, only proceed in bnxt_ulp_start() if the
BNXT_EN_FLAG_ULP_STOPPED is set.
Fixes: 3c163f35bd ("bnxt_en: Optimize recovery path ULP locking in the driver")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the return value from napi_complete_done(). This allows
users to use the gro_flush_timeout and napi_defer_hard_irqs sysfs
attributes for configuring software interrupt coalescing.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611212730.252342-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the return value from napi_complete_done(). This allows users to
use the gro_flush_timeout and napi_defer_hard_irqs sysfs attributes for
configuring software interrupt coalescing.
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250610220403.935-2-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmhAa9EUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyA3w//aX8d73z/xVxkYLMN/6XQA5fdmd4d
Dv4n0Pjf0WCMKbsgRCdXEYLvcHV8VhH5iCR/b2UsFm9LjxSIRuqE5XosY3bNhrHn
xVKEh2prq2XZOibWrFkJ+RZ0FF7Ogq1Uy5gUBbBHbE1q1byZzrOALaF3FWGaDIZQ
6QLLAFtd3UtqOOUu8J8P9N15uFR8gunyfuM9U7TLMcy4B8txk6T6m/9xAWtRURuJ
I6WN8lO+g8Nl2mL9m27+wyWiVT3tKqoMwp8rVtym/L5JQOmHycYhn0WQAr2dPCMs
Xbgmoeei0je7mZvk5btpt68NAKQ3ZnCVkxbbINBkUxAjI0dbI6h37EhW18ShYVUk
CCo4fmaFtwP8qNN9tSvDN8vZdGB44fN5tIz4lmGzKk5gt+oV50RC/APrzC+PJBQ0
+2SdDVKj71Gr2H1VnI6uLB7oQ+tp7TOdhg+DGV4bdc6QFnsM+BpKWRq5f1UQcau/
XVDmorM/2t6z0DNktAv3NFwSodUjk1loWESr/pRBH1AqAWZTK98PWIg97XYsal59
zbJ3dLrnCqUNozeVgjtZo1LWD2FZaVTvhq2NY7D+QPpnMGhFUhHxNliZUXiQa1q4
boI2hEFdu3IQP/OC2a1zGJyMRLU43d5rhZ1U5xQSVtM0c3lgCY7rn/t26LymQVPA
SYdg2jBcnhe6gXo=
=eWJw
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and
has been fixed by other commits (Lukas Wunner)
Resource management:
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
and related flags since all uses have been removed (Philipp
Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid',
i.e., their behavior no longer depends on whether
pcim_enable_device or pci_enable_device() was used, and remove
related code (Philipp Stanner)
- Warn (not BUG()) about failure to assign optional resources (Ilpo
Järvinen)
Error handling:
- Log the DPC Error Source ID only when it's actually valid (when
ERR_FATAL or ERR_NONFATAL was received from a downstream device)
and decode into bus/device/function (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages
use the same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
Errors (Karolina Stolarek)
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and
RCU stall warnings (Jon Pan-Doh)
Power management:
- Increment PM usage counter when probing reset methods so we don't
try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
Power control:
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
the filename paths. Retain old deprecated symbols for
compatibility, except for the pwrctrl slot driver
(PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian
Norris)
Bandwidth control:
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may
have changed (Ilpo Järvinen)
PCIe native device hotplug:
- Ignore Presence Detect Changed caused by DPC.
pciehp already ignores Link Down/Up events caused by DPC, but on
slots using in-band presence detect, DPC causes a spurious Presence
Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset.
On hotplug ports using in-band presence detect, the reset causes a
Presence Detect Changed event, which mistakenly caused teardown and
re-enumeration of the device. Drivers may need to annotate code
that resets their device (Lukas Wunner)
Virtualization:
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS
but don't allow peer-to-peer transactions between Root Ports; the
quirk allows each Root Port to be in a separate IOMMU group (Huacai
Chen)
Endpoint framework:
- For fixed-size BARs, retain both the actual size and the possibly
larger size allocated to accommodate iATU alignment requirements
(Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space
than needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence
endpoint controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
Common host controller library:
- Convert pci-host-common to a library so platforms that don't need
native host controller drivers don't need to include these helper
functions (Manivannan Sadhasivam)
Apple PCIe controller driver:
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc
Zyngier)
- Dynamically allocate RID-to_SID bitmap to prepare for SoCs with
varying capabilities (Marc Zyngier)
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Add T602x PCIe support (Hector Martin)
- Directly set/clear INTx mask bits because T602x dropped the
accessors that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate
T602x, which moves them around; retain default offsets for existing
DTs that lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and the
bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe
because we're allowed to sleep there (Hector Martin)
Cadence PCIe controller driver:
- Drop a runtime PM 'put' to resolve a runtime atomic count underflow
(Hans Zhang)
- Make the cadence core buildable as a module (Kishon Vijay Abraham I)
- Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
loadable drivers when they are removed (Siddharth Vadapalli)
Freescale i.MX6 PCIe controller driver:
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
(Richard Zhu)
- Remove redundant dw_pcie_wait_for_link() from
imx_pcie_start_link(); since the DWC core does this, imx6 only
needs it when retraining for a faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
some cases, the controller can't exit 'L23 Ready' through Beacon or
PERST# deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
GT/s, causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
Mobiveil PCIe controller driver:
- Return bool (not int) for link-up check in
mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
Zhang)
NVIDIA Tegra194 PCIe controller driver:
- Create debugfs directory for 'aspm_state_cnt' only when
CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
Zhang)
Qualcomm PCIe controller driver:
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT
lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for
8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
Chundru)
- Add generic dwc support for configuring lane equalization presets
(Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)
Renesas R-Car PCIe controller driver:
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware
(Yoshihiro Shimoda)
Rockchip PCIe controller driver:
- Reorder rockchip_pci_core_rsts because
reset_control_bulk_deassert() deasserts in reverse order, to fix a
link training regression (Jensen Huang)
- Mark RK3399 as being capable of raising INTx interrupts (Niklas
Cassel)
Rockchip DesignWare PCIe controller driver:
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the
link is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
for Root Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
resets non-sticky registers (Shawn Lin)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
Synopsys DesignWare PCIe controller driver:
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
more robust; this will not affect the intended link width if all
lanes are functional (Wenbin Yao)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up()
and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
tegra194, uniphier, visconti (Hans Zhang)
- Add debugfs support for exposing DWC device-specific PTM context
(Manivannan Sadhasivam)
TI J721E PCIe driver:
- Make j721e buildable as a loadable and removable module (Siddharth
Vadapalli)
- Fix j721e host/endpoint dependencies that result in link failures
in some configs (Arnd Bergmann)
Device tree bindings:
- Add qcom DT binding for 'global' interrupt (PCIe controller and
link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan
Sadhasivam)
- Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)
- Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)
- Correct indentation and style of examples in brcm,stb-pcie,
cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
(Krzysztof Kozlowski)
- Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
armada8k from text to schema DT bindings (Rob Herring)
- Remove obsolete .txt DT bindings for content that has been moved to
schemas (Rob Herring)
- Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
and IPQ9574 (Varadarajan Narayanan)
- Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)
- Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
PolarFire may be configured that way (Conor Dooley)
Miscellaneous:
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar
files (Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency
on MMU to simplify build testing and avoid inadvertent build
regressions (Arnd Bergmann)
- Update Krzysztof Wilczyński's email address in MAINTAINERS
(Krzysztof Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS
(Manivannan Sadhasivam)"
* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
MAINTAINERS: Update Manivannan Sadhasivam email address
PCI: j721e: Fix host/endpoint dependencies
PCI: j721e: Add support to build as a loadable module
PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
PCI: cadence: Add support to build pcie-cadence library as a kernel module
MAINTAINERS: Update Krzysztof Wilczyński email address
PCI: Remove unnecessary linesplit in __pci_setup_bridge()
PCI: WARN (not BUG()) when we fail to assign optional resources
PCI: Remove unused pci_printk()
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI: cadence: Remove duplicate message code definitions
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
...
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2 ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a2 ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When xdp is attached or detached, dev->ndo_bpf() is called by
do_setlink(), and it acquires netdev_lock() if needed.
Unlike other drivers, the bnxt driver is protected by netdev_lock while
xdp is attached/detached because it sets dev->request_ops_lock to true.
So, the bnxt_xdp(), that is callback of ->ndo_bpf should not acquire
netdev_lock().
But the xdp_features_{set | clear}_redirect_target() was changed to
acquire netdev_lock() internally.
It causes a deadlock.
To fix this problem, bnxt driver should use
xdp_features_{set | clear}_redirect_target_locked() instead.
Splat looks like:
============================================
WARNING: possible recursive locking detected
6.15.0-rc6+ #1 Not tainted
--------------------------------------------
bpftool/1745 is trying to acquire lock:
ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: xdp_features_set_redirect_target+0x1f/0x80
but task is already holding lock:
ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by bpftool/1745:
#0: ffffffffa56131c8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x1fe/0x570
#1: ffffffffaafa75a0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x236/0x570
#2: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
stack backtrace:
CPU: 1 UID: 0 PID: 1745 Comm: bpftool Not tainted 6.15.0-rc6+ #1 PREEMPT(undef)
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7a/0xd0
print_deadlock_bug+0x294/0x3d0
__lock_acquire+0x153b/0x28f0
lock_acquire+0x184/0x340
? xdp_features_set_redirect_target+0x1f/0x80
__mutex_lock+0x1ac/0x18a0
? xdp_features_set_redirect_target+0x1f/0x80
? xdp_features_set_redirect_target+0x1f/0x80
? __pfx_bnxt_rx_page_skb+0x10/0x10 [bnxt_en
? __pfx___mutex_lock+0x10/0x10
? __pfx_netdev_update_features+0x10/0x10
? bnxt_set_rx_skb_mode+0x284/0x540 [bnxt_en
? __pfx_bnxt_set_rx_skb_mode+0x10/0x10 [bnxt_en
? xdp_features_set_redirect_target+0x1f/0x80
xdp_features_set_redirect_target+0x1f/0x80
bnxt_xdp+0x34e/0x730 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3]
dev_xdp_install+0x3f4/0x830
? __pfx_bnxt_xdp+0x10/0x10 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3]
? __pfx_dev_xdp_install+0x10/0x10
dev_xdp_attach+0x560/0xf70
dev_change_xdp_fd+0x22d/0x280
do_setlink.constprop.0+0x2989/0x35d0
? __pfx_do_setlink.constprop.0+0x10/0x10
? lock_acquire+0x184/0x340
? find_held_lock+0x32/0x90
? rtnl_setlink+0x236/0x570
? rcu_is_watching+0x11/0xb0
? trace_contention_end+0xdc/0x120
? __mutex_lock+0x946/0x18a0
? __pfx___mutex_lock+0x10/0x10
? __lock_acquire+0xa95/0x28f0
? rcu_is_watching+0x11/0xb0
? rcu_is_watching+0x11/0xb0
? cap_capable+0x172/0x350
rtnl_setlink+0x2cd/0x570
Fixes: 03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250520071155.2462843-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netdev_lock is already held when calling bnxt_ulp_irq_stop() and
bnxt_ulp_irq_restart(). When converting rtnl_lock to netdev_lock,
the original code was rtnl_dereference() to indicate that rtnl_lock
was already held. rcu_dereference_protected() is the correct
conversion after replacing rtnl_lock with netdev_lock.
Add a new helper netdev_lock_dereference() similar to
rtnl_dereference().
Fixes: 004b500801 ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250519204130.3097027-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hardware discarded packets are now counted in their own missed stat
instead of being lumped in with general errors.
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250519113257.1031-3-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the driver to use ndo_get_stats64, rtnl_link_stats64 and
u64_stats_t counters for statistics.
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250519113257.1031-2-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All callers pass PHY_POLL, therefore remove irq argument from
fixed_phy_register().
Note: I keep the irq argument in fixed_phy_add_gpiod() for now,
for the case that somebody may want to use a GPIO interrupt in
the future, by e.g. adding a call to fwnode_irq_get() to
fixed_phy_get_gpiod().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/31cdb232-a5e9-4997-a285-cb9a7d208124@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Error recovery, PCIe AER, resume, and TX timeout will invoke bnxt_open()
with netdev_lock only. This will cause RTNL assert failure in
netif_set_real_num_tx_queues(), netif_set_real_num_tx_queues(),
and netif_set_real_num_tx_queues().
Example error recovery assert:
RTNL: assertion failed at net/core/dev.c (3178)
WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210
Call Trace:
<TASK>
? __pfx_bnxt_msix+0x10/0x10 [bnxt_en]
__bnxt_open_nic+0x1ef/0xb20 [bnxt_en]
bnxt_open+0xda/0x130 [bnxt_en]
bnxt_fw_reset_task+0x21f/0x780 [bnxt_en]
process_scheduled_works+0x9d/0x400
For now, bring back rtnl_lock() in all these code paths that can invoke
bnxt_open(). In the bnxt_queue_start() error path, we don't have
rtnl_lock held so we just change it to call netif_close() instead of
bnxt_reset_task() for simplicity. This error path is unlikely so it
should be fine.
Fixes: 004b500801 ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250514062908.2766677-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The calculation done by calc_crc() is equivalent to
~crc32(~0, buf, len), so just use that instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250513041402.541527-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>