Required so that the E610 device can adopt the x540-specific functions.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add low level support for EEPROM dump for the specified network device.
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add low level support for accessing NVM in E610 device. NVM operations are
handled via the Admin Command Interface. Add the following NVM specific
operations:
- acquire, release, read
- validate checksum
- read shadow ram
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add low level link management support for E610 device. Link management
operations are handled via the Admin Command Interface. Add the following
link management operations:
- get link capabilities
- set up link
- get media type
- get link status, link status events
- link power management
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add low level support for E610 device capabilities detection. The
capabilities are discovered via the Admin Command Interface. Discover the
following capabilities:
- function caps: vmdq, dcb, rss, rx/tx qs, msix, nvm, orom, reset
- device caps: vsi, fdir, 1588
- phy caps
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add low level support for Admin Command Interface (ACI). ACI is the
firmware interface used by the driver to communicate with the E610 adapter. Add
the following ACI features:
- data structures, macros, register definitions
- commands handling
- events handling
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, xsk_buff_add_frag() only adds the frag to pool's linked list,
not doing anything with the &xdp_buff. The drivers do that manually and
the logic is the same.
Make it really add an skb frag, just like xdp_buff_add_frag() does, and
free the frags on error if needed. This allows removing the repeated
code from i40e and ice instead of adding the same code again and again.
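For reference, a minimal sketch of how a driver's zero-copy Rx loop might use
the reworked helper, assuming a two-argument, bool-returning form of
xsk_buff_add_frag(); the local variable names are placeholders, not the
verbatim i40e/ice code:

  /* inside the ZC Rx loop; "first" is the head xdp_buff of the frame */
  if (!first) {
          first = xdp;
  } else if (!xsk_buff_add_frag(first, xdp)) {
          /* frag table full; the helper handled the frag, drop the frame */
          xsk_buff_free(first);
          first = NULL;
  }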
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241218174435.1445282-5-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a race condition between exiting wb_on_itr and completion write
backs. For example, we are in wb_on_itr mode and a Tx completion is
generated by HW, ready to be written back, as we are re-enabling
interrupts:
HW                                     SW
|                                      |
|                                      | idpf_tx_splitq_clean_all
|                                      | napi_complete_done
|                                      |
| tx_completion_wb                     | idpf_vport_intr_update_itr_ena_irq
That tx_completion_wb happens before the vector is fully re-enabled.
Continuing with this example, it is a UDP stream and the
tx_completion_wb is the last one in the flow (there are no rx packets).
Because the HW generated the completion before the interrupt is fully
enabled, the HW will not fire the interrupt once the timer expires and
the write back will not happen. NAPI poll won't be called. We have
indicated we're back in interrupt mode but nothing else will trigger the
interrupt. Therefore, the completion goes unprocessed, triggering a Tx
timeout.
To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr.
This interrupt will catch the rogue completion and avoid the timeout.
Add logic to set the appropriate bits in the vector's dyn_ctl register.
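A rough sketch of the idea, with placeholder register/mask names (the real
field names live in the idpf register definitions and are not quoted here):

  /* fire a SW-triggered interrupt while re-enabling the vector so a
   * completion that raced with the wb_on_itr exit is still processed;
   * IDPF_DYN_CTL_* and IDPF_SW_ITR_IDX below are placeholders */
  u32 val = IDPF_DYN_CTL_INTENA_M |
            IDPF_DYN_CTL_SWINT_TRIG_M |
            FIELD_PREP(IDPF_DYN_CTL_SW_ITR_INDX_M, IDPF_SW_ITR_IDX);

  writel(val, q_vector->intr_reg.dyn_ctl);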
Fixes: 9c4a27da0e ("idpf: enable WB_ON_ITR")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
SW triggered interrupts are guaranteed to fire after their timer
expires, unlike Tx and Rx interrupts which will only fire after the
timer expires _and_ a descriptor write back is available to be processed
by the driver.
Add the necessary fields, defines, and initializations to enable a SW
triggered interrupt in the vector's dyn_ctl register.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a devlink health reporter for MDD events. The 'dump' handler will
return the information captured in each call to ice_handle_mdd_event().
A device reset (CORER/PFR) will put the reporter back into a healthy state.
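For illustration, a minimal sketch of a devlink health reporter with a dump
handler, using the generic devlink health API; the ice_mdd_* names and the
mdd_event_count field are placeholders, not the exact ice implementation:

  static int ice_mdd_reporter_dump(struct devlink_health_reporter *reporter,
                                   struct devlink_fmsg *fmsg, void *priv_ctx,
                                   struct netlink_ext_ack *extack)
  {
          struct ice_pf *pf = devlink_health_reporter_priv(reporter);

          /* report whatever was captured by ice_handle_mdd_event();
           * mdd_event_count is a placeholder field */
          devlink_fmsg_u32_pair_put(fmsg, "mdd_events", pf->mdd_event_count);
          return 0;
  }

  static const struct devlink_health_reporter_ops ice_mdd_reporter_ops = {
          .name = "mdd",
          .dump = ice_mdd_reporter_dump,
  };

  /* during devlink setup, with the devlink instance lock held */
  reporter = devl_health_reporter_create(devlink, &ice_mdd_reporter_ops,
                                         0, pf);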
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a Tx hang devlink health reporter; see struct ice_tx_hang_event for
what exactly is reported. For now, dump descriptors with a little metadata
and skb diagnostic information.
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Drop "devlink_" prefix from files that sit in devlink/.
I'm going to add more files there, and repeating "devlink" does not feel
good. This is also the scheme used in most other places, most notably the
devlink core files are named like that.
devlink.[ch] stays as is.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_copy_rxq_ctx_to_hw() and ice_write_rxq_ctx() functions perform some
defensive checks which are typically frowned upon by kernel style
guidelines.
In particular, NULL checks on buffers which point to the stack are
discouraged, especially when the functions are static and only called once.
Checks of this sort only serve to hide potential programming errors, as we
will not produce the normal crash dump on a NULL access.
In addition, ice_copy_rxq_ctx_to_hw() cannot fail in any other way, so it
could be made void.
Future support for VF Live Migration will need to introduce an inverse
function for reading Rx queue context from HW registers to unpack it, as
well as functions to pack and unpack Tx queue context from HW.
Rather than copying these style issues into the new functions, let's first
clean up the existing code.
For the ice_copy_rxq_ctx_to_hw() function:
* Move the Rx queue index check out of this function.
* Convert the function to a void return.
* Use a simple int variable instead of a u8 for the for loop index, and
initialize it inside the for loop.
* Update the function description to better align with kernel doc style.
For the ice_write_rxq_ctx() function:
* Move the Rx queue index check into this function.
* Update the function description with a Returns: to align with kernel doc
style.
These changes align the existing write functions to current kernel
style, and will align with the style of the new functions added when we
implement live migration in a future series.
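Roughly, the cleaned-up copy helper then looks like this (a sketch based on
the description above, using the existing ice register macros):

  static void ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, const u8 *rxq_ctx,
                                     u32 rxq_index)
  {
          /* copy each dword of the packed context into the QRX registers */
          for (int i = 0; i < ICE_RXQ_CTX_SIZE_DWORDS; i++) {
                  u32 ctx = *((const u32 *)(rxq_ctx + i * sizeof(u32)));

                  wr32(hw, QRX_CONTEXT(i, rxq_index), ctx);
          }
  }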
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-10-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ice_write_rxq_ctx() function is responsible for programming the Rx
Queue context into hardware. It receives the configuration in unpacked form
via the ice_rlan_ctx structure.
This function unconditionally modifies the context to set the prefetch
enable bit. This was done by commit c31a5c25bb ("ice: Always set prefena
when configuring an Rx queue"). Setting this bit makes sense, since
prefetching descriptors is almost always the preferred behavior.
However, the ice_write_rxq_ctx() function is not the place that actually
defines the queue context. We initialize the Rx Queue context in
ice_setup_rx_ctx(). It is surprising to have the Rx queue context changed
by a function whose responsibility is to program the given context to
hardware.
Following the principle of least surprise, move the setting of the prefetch
enable bit out of ice_write_rxq_ctx() and into ice_setup_rx_ctx().
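In other words, the queue setup path now owns the bit (a sketch; prefena is
the existing field in struct ice_rlan_ctx):

  /* in ice_setup_rx_ctx(): enable descriptor prefetch as part of defining
   * the queue context, rather than while writing it to HW */
  rlan_ctx.prefena = 1;

  err = ice_write_rxq_ctx(hw, &rlan_ctx, ring->reg_idx);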
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-9-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ice_rlan_ctx and ice_tlan_ctx structures have some fields which are
intentionally sized larger than necessary relative to the packed sizes the
data must fit into. This was done because the original ice_set_ctx()
function and its helpers did not correctly handle packing when the packed
bits straddled a byte. This is no longer the case with the use of the
<linux/packing.h> implementation.
Save some bytes in these structures by sizing the variables to the number
of bytes the actual bitpacked fields fit into.
There are a couple of gaps left in the structure, which is a result of the
fields being in the order they appear in the packed bit layout, but where
alignment forces some extra gaps. We could fix this, saving ~8 bytes from
each structure. However, these structures are not used heavily, and the
resulting savings is minimal:
$ bloat-o-meter ice-before-reorder.ko ice-after-reorder.ko
add/remove: 0/0 grow/shrink: 1/1 up/down: 26/-70 (-44)
Function                      old     new   delta
ice_vsi_cfg_txq              1873    1899     +26
ice_setup_rx_ctx.constprop   1529    1459     -70
Total: Before=1459555, After=1459511, chg -0.00%
Thus, the fields are left in the same order as the packed bit layout,
despite the gaps this causes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-8-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ice driver needs to write the Tx and Rx queue context when programming
Tx and Rx queues. This is currently done using some bespoke custom logic
via the ice_set_ctx() and its helper functions, along with bit position
definitions in the ice_tlan_ctx_info and ice_rlan_ctx_info structures.
This logic does work, but is problematic for several reasons:
1) ice_set_ctx requires a helper function for each byte size being packed,
as it uses a separate function to pack u8, u16, u32, and u64 fields.
This requires 4 functions which contain near-duplicate logic with the
types changed out.
2) The logic in the ice_pack_ctx_word, ice_pack_ctx_dword, and
ice_pack_ctx_qword does not handle values which straddle alignment
boundaries very well. This requires that several fields in the
ice_tlan_ctx_info and ice_rlan_ctx_info be a size larger than their bit
size should require.
3) Future support for live migration will require adding unpacking
functions to take the packed hardware context and unpack it into the
ice_rlan_ctx and ice_tlan_ctx structures. Implementing this would
require implementing ice_get_ctx, and its associated helper functions,
which essentially doubles the amount of code required.
The Linux kernel has had a packing library that can handle this logic since
commit 554aae3500 ("lib: Add support for generic packing operations").
The library was recently extended with support for packing or unpacking an
array of fields, with a similar structure as the ice_ctx_ele structure.
Replace the ice-specific ice_set_ctx() logic with the recently added
pack_fields and packed_field_s infrastructure from <linux/packing.h>.
For API simplicity, the Tx and Rx queue contexts are programmed using the
separate ice_pack_txq_ctx() and ice_pack_rxq_ctx() functions. This avoids
needing to export the packed_field_s arrays. The functions take pointers to
the appropriate ice_txq_ctx_buf_t and ice_rxq_ctx_buf_t types, ensuring that
only buffers of the appropriate size are passed.
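As a sketch of what that looks like, assuming the pack_fields() macro and
the QUIRK_* flags from <linux/packing.h>; the field table entries are
elided:

  #include <linux/packing.h>

  /* bit position/width descriptors for each struct ice_rlan_ctx member */
  static const struct packed_field_u8 ice_rlan_ctx_fields[] = {
          /* Field, Width, LSB entries elided */
  };

  static void ice_pack_rxq_ctx(const struct ice_rlan_ctx *ctx,
                               ice_rxq_ctx_buf_t *buf)
  {
          pack_fields(buf, sizeof(*buf), ctx, ice_rlan_ctx_fields,
                      QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST);
  }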
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-7-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ice Tx and Rx queue context are currently stored as arrays of bytes
with defined size (ICE_RXQ_CTX_SZ and ICE_TXQ_CTX_SZ). The packed queue
context is often passed to other functions as a simple u8 * pointer, which
does not allow tracking the size. This makes the queue context API easy to
misuse, as you can pass an arbitrary u8 array or pointer.
Introduce wrapper typedefs which use a __packed structure that has the
proper fixed size for the Tx and Rx context buffers. This enables the
compiler to track the size of the value and ensures that passing the wrong
buffer size will be detected by the compiler.
The existing APIs do not benefit much from this change, however the
wrapping structures will be used to simplify the arguments of new packing
functions based on the recently introduced pack_fields API.
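The wrappers are essentially the following (a sketch; ICE_RXQ_CTX_SZ and
ICE_TXQ_CTX_SZ are the existing byte-size defines):

  typedef struct __packed { u8 buf[ICE_RXQ_CTX_SZ]; } ice_rxq_ctx_buf_t;
  typedef struct __packed { u8 buf[ICE_TXQ_CTX_SZ]; } ice_txq_ctx_buf_t;

With this, a function taking an ice_rxq_ctx_buf_t * can no longer be handed
an arbitrary u8 array of the wrong length without the compiler noticing.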
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-6-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The int_q_state field of the ice_tlan_ctx structure represents the internal
queue state. However, we never actually need to assign this or read this
during normal operation. In fact, trying to unpack it would not be possible
as it is larger than a u64. Remove this field from the ice_tlan_ctx
structure, and remove its packing field from the ice_tlan_ctx_info array.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241210-packing-pack-fields-and-ice-implementation-v10-5-ee56a47479ac@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can and netfilter.
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth:
- bnxt_en: support header page pool in queue API
- ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth:
- mlxsw: use correct key block on Spectrum-4
- mlx5: fix memory leak in mlx5hws_definer_calc_layout"
* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
net: avoid potential UAF in default_operstate()
vsock/test: verify socket options after setting them
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
net/mlx5e: Remove workaround to avoid syndrome for internal port
net/mlx5e: SD, Use correct mdev to build channel param
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
net/mlx5: HWS: Properly set bwc queue locks lock classes
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
bnxt_en: handle tpa_info in queue API implementation
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
bnxt_en: refactor tpa_info alloc/free into helpers
geneve: do not assume mac header is set in geneve_xmit_skb()
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
ipmr: tune the ipmr_can_free_table() checks.
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
...
pci_register_driver() can fail, and when this happens the dca_notifier
needs to be unregistered; otherwise the dca_notifier can be called after
igb fails to install, resulting in invalid memory access.
Fixes: bbd98fe48a ("igb: Fix DCA errors and do not use context index for 82576")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
SFF-8472 (section 5.4 Transceiver Compliance Codes) defines bit 6 as
BASE-BX10. Bit 6 means a value of 0x40 (decimal 64).
The current value in the source code is 0x64, which appears to be a
mix-up of hex and decimal values. A value of 0x64 (binary 01100100)
incorrectly sets bit 2 (1000BASE-CX) and bit 5 (100BASE-FX) as well.
Fixes: 1b43e0d20f ("ixgbe: Add 1000BASE-BX support")
Signed-off-by: Tore Amundsen <tore@amundsen.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Ernesto Castellotti <ernesto@castellotti.net>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ixgbe PF driver logs an info message when a VF attempts to negotiate an
API version which it does not support:
VF 0 requested invalid api version 6
The ixgbevf driver attempts to load with mailbox API v1.5, which is
required for best compatibility with other hosts such as the ESX VMWare PF.
The Linux PF only supports API v1.4, and does not currently have support
for the v1.5 API.
The logged message can confuse users, as the v1.5 API is valid, but just
happens to not currently be supported by the Linux PF.
Downgrade the info message to a debug message, and fix the language to
use 'unsupported' instead of 'invalid' to improve message clarity.
Long term, we should investigate whether the improvements in the v1.5 API
make sense for the Linux PF, and if so implement them properly. This may
require yet another API version to resolve issues with negotiating IPSEC
offload support.
Fixes: 339f289641 ("ixgbevf: Add support for new mailbox communication between PF and VF")
Reported-by: Yifei Liu <yifei.l.liu@oracle.com>
Link: https://lore.kernel.org/intel-wired-lan/20240301235837.3741422-1-yifei.l.liu@oracle.com/
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 339f289641 ("ixgbevf: Add support for new mailbox communication
between PF and VF") added support for v1.5 of the PF to VF mailbox
communication API. This commit mistakenly enabled IPSEC offload for API
v1.5.
No implementation of the v1.5 API has support for IPSEC offload. This
offload is only supported by the Linux PF as mailbox API v1.4. In fact, the
v1.5 API is not implemented in any Linux PF.
Attempting to enable IPSEC offload on a PF which supports v1.5 API will not
work. Only the Linux upstream ixgbe and ixgbevf support IPSEC offload, and
only as part of the v1.4 API.
Fix the ixgbevf Linux driver to stop attempting IPSEC offload when
the mailbox API does not support it.
The existing API design choice makes it difficult to support future API
versions, as other non-Linux hosts do not implement IPSEC offload. If we
add support for v1.5 to the Linux PF, then we lose support for IPSEC
offload.
A full solution likely requires a new mailbox API with a proper negotiation
to check that IPSEC is actually supported by the host.
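Conceptually the fix is a guard like the following in the IPSEC offload
init path (a sketch of the pattern, not the exact ixgbevf function; only
the mailbox API enum values are taken from the existing driver):

  switch (adapter->hw.api_version) {
  case ixgbe_mbox_api_14:
          /* only the v1.4 mailbox API (Linux PF) supports IPSEC offload */
          break;
  default:
          return;
  }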
Fixes: 339f289641 ("ixgbevf: Add support for new mailbox communication between PF and VF")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit d9028db618 ("idpf: convert to libeth Tx buffer completion")
inadvertently removed code that was necessary for the tx buffer cleaning
routine to iterate over all buffers associated with a packet.
When a frag is too large for a single data descriptor, it will be split
across multiple data descriptors. This means the frag will span multiple
buffers in the buffer ring in order to keep the descriptor and buffer
ring indexes aligned. The buffer entries in the ring are technically
empty and no cleaning actions need to be performed. These empty buffers
can precede other frags associated with the same packet. I.e. a single
packet on the buffer ring can look like:
buf[0]=skb0.frag0
buf[1]=skb0.frag1
buf[2]=empty
buf[3]=skb0.frag2
The cleaning routine iterates through these buffers based on a matching
completion tag. If the completion tag is not set for buf2, the loop will
end prematurely. Frag2 will be left uncleaned and next_to_clean will be
left pointing to the end of packet, which will break the cleaning logic
for subsequent cleans. This consequently leads to tx timeouts.
Assign the empty bufs the same completion tag for the packet to ensure
the cleaning routine iterates over all of the buffers associated with
the packet.
Fixes: d9028db618 ("idpf: convert to libeth Tx buffer completion")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Madhu chittim <madhu.chittim@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In switchdev mode the uplink VSI should receive all unmatched packets,
including VLANs. Therefore, VLAN pruning should be disabled if uplink is
in switchdev mode. This is already done in ice_eswitch_setup_env();
however, the addition of ice_up() in commit 44ba608db5 ("ice: do
switchdev slow-path Rx using PF VSI") caused VLAN pruning to be
re-enabled after it was disabled.
Add a check to ice_set_vlan_filtering_features() to ensure VLAN
filtering will not be enabled if uplink is in switchdev mode. Note that
ice_is_eswitch_mode_switchdev() is being used instead of
ice_is_switchdev_running(), as the latter would only return true after
the whole switchdev setup completes.
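One possible shape of that check (a sketch; ice_is_eswitch_mode_switchdev()
is the existing helper, the surrounding function is
ice_set_vlan_filtering_features()):

  /* never (re-)enable VLAN filtering on the uplink while in switchdev
   * mode, so pruning stays disabled as set up by ice_eswitch_setup_env() */
  if (ice_is_eswitch_mode_switchdev(pf))
          return 0;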
Fixes: 44ba608db5 ("ice: do switchdev slow-path Rx using PF VSI")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Priya Singh <priyax.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
To check whether the PHY Clock Recovery mechanism is available for a
device, the driver needs to verify that the given PHY is present in the
netlist, but the netlist node type used for the search is wrong, and the
search context also needs to be specified.
Modify the search function to allow specifying the context in the
search.
Use the PHY node type instead of the CLOCK CONTROLLER type, and use the
proper search context, which for a PHY search is PORT, as defined in the
E810 Datasheet [1] ('3.3.8.2.4 Node Part Number and Node Options (0x0003)'
and 'Table 3-105. Program Topology Device NVM Admin Command').
[1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true
Fixes: 91e43ca009 ("ice: fix linking when CONFIG_PTP_1588_CLOCK=n")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-11-05 (ice, ixgbe, igc, igb, igbvf, e1000)
For ice:
Mateusz refactors and adds additional SerDes configuration values to be
output.
Przemek refactors processing of DDP and adds support for a flag field in
the DDP's signature segment header.
Joe Damato adds support for persistent NAPI config.
Brett adjusts setting of Tx promiscuous based on unicast/multicast
setting.
Jake moves setting of pf->supported_rxdids to occur directly after DDP
load and changes a small struct to use stack memory.
Frederic Weisbecker adds WQ_UNBOUND flag to the workqueue.
For ixgbe:
Diomidis Spinellis removes a circular dependency.
For igc:
Vitaly removes an unneeded autoneg parameter.
For igb:
Johnny Park fixes a couple of typos.
For igbvf:
Wander Lairson Costa removes an unused spinlock.
For e1000:
Joe Damato adds RTNL lock to some calls where it is expected to be held.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000: Hold RTNL when e1000_down can be called
igbvf: remove unused spinlock
igb: Fix 2 typos in comments in igb_main.c
igc: remove autoneg parameter from igc_mac_info
ixgbe: Break include dependency cycle
ice: Unbind the workqueue
ice: use stack variable for virtchnl_supported_rxdids
ice: initialize pf->supported_rxdids immediately after loading DDP
ice: only allow Tx promiscuous for multicast
ice: Add support for persistent NAPI config
ice: support optional flags in signature segment header
ice: refactor "last" segment of DDP pkg
ice: extend dump serdes equalizer values feature
ice: rework of dump serdes equalizer values feature
====================
Link: https://patch.msgid.link/20241113185431.1289708-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a similar fashion to ndo_fdb_add, which was covered in the previous
patch, add the bool *notified argument to ndo_fdb_del. Callees that send a
notification on their own set the flag to true.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own:
# ip link add name vx type vxlan id 1000 dstport 4789
# bridge monitor fdb &
[1] 1981693
# bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
de:ad:be:ef:13:37 dev vx self permanent
In order to prevent this duplication, add a parameter to ndo_fdb_add,
bool *notified. The flag is primed to false, and if the callee sends a
notification on its own, it sets it to true, thus informing the core that
it should not generate another notification.
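A sketch of how a callee uses the new argument (hypothetical driver
callback; the parameter list follows the updated ndo_fdb_add prototype as
described above, and foo_fdb_add_and_notify() is a placeholder helper):

  static int foo_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
                             struct net_device *dev,
                             const unsigned char *addr, u16 vid,
                             u16 flags, bool *notified,
                             struct netlink_ext_ack *extack)
  {
          int err;

          /* placeholder: add the entry and send the driver's own
           * RTM_NEWNEIGH notification */
          err = foo_fdb_add_and_notify(dev, addr, vid, flags, extack);
          if (!err)
                  *notified = true;  /* tell the core not to notify again */
          return err;
  }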
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After assembling the new private flags on a PF, the operation to determine
the changed flags uses the wrong bitmaps. Instead of xor-ing orig_flags
with new_flags, it uses the still unchanged pf->flags, thus changed_flags
is always 0.
Fix it by using the correct bitmaps.
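The fix is essentially one line (a sketch; I40E_PF_FLAGS_NBITS is the
existing bitmap size):

  DECLARE_BITMAP(changed_flags, I40E_PF_FLAGS_NBITS);

  /* diff the requested flags against the original ones, not against the
   * still-unchanged pf->flags */
  bitmap_xor(changed_flags, orig_flags, new_flags, I40E_PF_FLAGS_NBITS);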
The issue was discovered while debugging why disabling source pruning
stopped working with release 6.7. Although the new flags will be copied to
pf->flags later on in that function, disabling source pruning requires
a reset of the PF, which was skipped due to this bug.
Disabling source pruning:
$ sudo ethtool --set-priv-flags eno1 disable-source-pruning on
$ sudo ethtool --show-priv-flags eno1
Private flags for eno1:
MFP : off
total-port-shutdown : off
LinkPolling : off
flow-director-atr : on
veb-stats : off
hw-atr-eviction : off
link-down-on-close : off
legacy-rx : off
disable-source-pruning: on
disable-fw-lldp : off
rs-fec : off
base-r-fec : off
vf-vlan-pruning : off
Regarding reproducing:
I observed the issue with a rather complicated lab setup, where
* two VLAN interfaces are created on eno1
* each with a different MAC address assigned
* each moved into a separate namespace
* both VLANs are bridged externally, so they form a single layer 2 network
The external bridge is done via a channel emulator adding packet loss and
delay and the application in the namespaces tries to send/receive traffic
and measure the performance. Sender and receiver are separated by
namespaces, yet the network card "sees its own traffic" sent back to it.
To make that work, source pruning has to be disabled.
Cc: stable@vger.kernel.org
Fixes: 70756d0a47 ("i40e: Use DECLARE_BITMAP for flags and hw_features fields in i40e_pf")
Signed-off-by: Peter Große <pegro@friiks.de>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20241113210705.1296408-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
There are a few paths for e1000_down to be called in e1000 where RTNL is
not currently being held:
- e1000_shutdown (pci shutdown)
- e1000_suspend (power management)
- e1000_reinit_locked (via e1000_reset_task delayed work)
- e1000_io_error_detected (via pci error handler)
Hold RTNL in three places to fix this issue:
- e1000_reset_task: igc, igb, and e1000e all hold rtnl in this path.
- e1000_io_error_detected (pci error handler): e1000e and ixgbe hold
rtnl in this path. A patch has been posted for igc to do the same
[1].
- __e1000_shutdown (which is called from both e1000_shutdown and
e1000_suspend): igb, ixgbe, and e1000e all hold rtnl in the same
path.
The other paths which call e1000_down seemingly hold RTNL and are OK:
- e1000_close (ndo_stop)
- e1000_change_mtu (ndo_change_mtu)
Based on the above analysis and mailing list discussion [2], I believe
adding rtnl in the three places mentioned above is correct.
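For example, in the reset task the change boils down to the following
pattern (a sketch, not the verbatim e1000 code):

  /* in e1000_reset_task(), called from the delayed work */
  rtnl_lock();
  e1000_reinit_locked(adapter);
  rtnl_unlock();

The same rtnl_lock()/rtnl_unlock() bracketing is applied in
__e1000_shutdown() and in the PCI error handler path.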
Fixes: 8f7ff18a5e ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Link: https://lore.kernel.org/netdev/20241022215246.307821-3-jdamato@fastly.com/ [1]
Link: https://lore.kernel.org/netdev/ZxgVRX7Ne-lTjwiJ@LQ3V64L9R2/ [2]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
tx_queue_lock and stats_lock are declared and initialized, but never
used. Remove them.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix 2 spelling mistakes in comments in `igb_main.c`.
Signed-off-by: Johnny Park <pjohnny0508@gmail.com>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since the igc driver doesn't support forced speed configuration and
its current related hardware doesn't support it either, there is no
use of the mac.autoneg parameter. Moreover, in one case this usage
might result in a NULL pointer dereference due to an uninitialized
function pointer, phy.ops.force_speed_duplex.
Therefore, remove this parameter from the igc code.
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Header ixgbe_type.h includes ixgbe_mbx.h. In turn, header
ixgbe_mbx.h includes ixgbe_type.h, thus introducing a circular
dependency.
- Remove ixgbe_mbx.h inclusion from ixgbe_type.h.
- ixgbe_mbx.h requires the definition of struct ixgbe_mbx_operations
so move its definition there. While at it, add missing argument
identifier names.
- Add required forward structure declarations.
- Include ixgbe_mbx.h in the .c files that need it, for the
following reasons:
ixgbe_sriov.c uses ixgbe_check_for_msg
ixgbe_main.c uses ixgbe_init_mbx_params_pf
ixgbe_82599.c uses mbx_ops_generic
ixgbe_x540.c uses mbx_ops_generic
ixgbe_x550.c uses mbx_ops_generic
Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice workqueue doesn't seem to rely on any CPU locality and should
therefore be able to run on any CPU. In practice this is already
happening through the unbound ice_service_timer that may fire anywhere
and queue the workqueue accordingly to any CPU.
Make this official so that the ice workqueue is only ever queued to
housekeeping CPUs on nohz_full.
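The change amounts to allocating the workqueue unbound (a sketch; the
arguments mirror the usual alloc_workqueue() usage):

  ice_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, KBUILD_MODNAME);
  if (!ice_wq)
          return -ENOMEM;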
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_vc_query_rxdid() function allocates memory to store the
virtchnl_supported_rxdids structure used to communicate the bitmap of
supported RXDIDs to a VF.
This structure is only 8 bytes in size. The function must hold the
allocated length on the stack as well as the pointer to the structure which
itself is 8 bytes. Allocating this storage on the heap adds unnecessary
overhead including a potential error path that must be handled in case
kzalloc fails. Because this structure is so small, we're not saving stack
space. Additionally, because we must ensure that we free the allocated
memory, the return value from ice_vc_send_msg_to_vf() must also be saved in
the stack ret variable. Depending on compiler optimization, this means
allocating the 8-byte structure requires up to 16 bytes of stack
memory!
Simplify this function to keep the rxdid variable on the stack, saving
memory and removing a potential failure exit path from this function.
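The simplified handler then looks roughly like this (a sketch; the virtchnl
struct and send helper are the existing ones, and the VF validation steps
are elided):

  static int ice_vc_query_rxdid(struct ice_vf *vf)
  {
          struct virtchnl_supported_rxdids rxdid = {};
          struct ice_pf *pf = vf->pf;

          rxdid.supported_rxdids = pf->supported_rxdids;

          /* no kzalloc/kfree, no extra error path to unwind */
          return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
                                       VIRTCHNL_STATUS_SUCCESS,
                                       (u8 *)&rxdid, sizeof(rxdid));
  }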
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The pf->supported_rxdids field is used to populate the list of valid RXDIDs
that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC.
The set of supported RXDIDs is dependent on the DDP, and can be read from
the GLXFLXP_RXDID_FLAGS register. The PF needs to send this list to the
VF upon receiving the VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to
use this list to validate the requested descriptor ID from the VF when
programming the Rx queues.
A future update to support VF live migration will also want to validate
that the target VF can support the same descriptor ID when migrating.
Currently, pf->supported_rxdids is initialized inside the
ice_vc_query_rxdid() function. This means that it is only ever initialized
if at least one VF actually tries to negotiate
VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized
every time the VF loads and requests the descriptor list. This worked
before because the PF only checks pf->supported_rxdids when programming
the Rx queue if the VF actually negotiates the
VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature.
This will be problematic for VF live migration. We need the list of
supported Rx descriptor IDs when migrating. It is possible that no VF on
the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDs.
Refactor the driver to initialize pf->supported_rxdids during driver
initialization after the DDP is loaded. This is simpler, avoids unnecessary
duplicate work, and avoids issues with the live migration process.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently when any VF is trusted and true promiscuous mode is enabled on
the PF, the VF will receive all unicast traffic directed to the device's
internal switch. This includes traffic external to the NIC and also from
other VSI (i.e. VFs). This does not match the expected behavior as
unicast traffic should only be visible from external sources in this
case. Disable the Tx promiscuous mode bits for unicast promiscuous mode.
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
An optional flag field has been added to the signature segment header.
The field contains two flags, a "valid" bit, and a "last segment" bit
that indicates whether the segment is the last segment that will be
sent to firmware.
If the flag field's valid bit is NOT set, then as was done before,
assume that this is the last segment being downloaded.
However, if the flag field's valid bit IS set, then use the last segment
flag to determine if this segment is the last segment to download.
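In pseudo-C, the decision is the following (the flag names here are
placeholders, not the actual defines):

  /* ICE_SIGN_SEG_FLAG_VALID / _LAST are placeholder names */
  u32 flags = le32_to_cpu(seg->flags);
  bool last_seg;

  if (!(flags & ICE_SIGN_SEG_FLAG_VALID))
          last_seg = true;                /* legacy behaviour */
  else
          last_seg = !!(flags & ICE_SIGN_SEG_FLAG_LAST);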
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add ice_ddp_send_hunk() that buffers "sent FW hunk" calls to AQ in order
to mark the "last" one in a more elegant way. The next commit will add an
even more complicated "sent FW" flow, so it's better to untangle things a
bit first.
Note that metadata buffers were not skipped for NOT-@indicate_last
segments; this is fixed now.
Minor:
+ use ice_is_buffer_metadata() instead of open coding it in
ice_dwnld_cfg_bufs();
+ ice_dwnld_cfg_bufs_no_lock() + dependencies were moved up a bit to have
better git-diff, as this function was rewritten (in terms of git-blame)
CC: Paul Greenwalt <paul.greenwalt@intel.com>
CC: Dan Nowlin <dan.nowlin@intel.com>
CC: Ahmed Zaki <ahmed.zaki@intel.com>
CC: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Extend the work done in commit 70838938e8 ("ice: Implement driver
functionality to dump serdes equalizer values") by adding the new set of
Rx registers that can be read using command:
$ ethtool -d interface_name
Rx equalization parameters are E810 PHY registers used by end users to
gather information about configuration and status to debug link and
connection issues in the field.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Refactor the ice_get_tx_rx_equa() function to iterate over a new table of
params instead of making multiple calls to ice_aq_get_phy_equalization().
A subsequent commit will extend that function by adding more serdes
equalizer values to dump.
Shorten the fields of struct ice_serdes_equalization_to_ethtool for
readability purposes.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This reverts commit 338c4d3902.
Sebastian noticed the ISR indirectly acquires spin_locks, which are
sleeping locks under PREEMPT_RT, which leads to kernel splats.
Fixes: 338c4d3902 ("igb: Disable threaded IRQ for igb_msix_other")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a partial revert of commit 76a0a3f9cc ("e1000e: fix force smbus
during suspend flow"). That commit fixed a sporadic PHY access issue but
introduced a regression in runtime suspend flows.
The original issue on Meteor Lake systems was rare in terms of the
reproduction rate and the number of the systems affected.
After the integration of commit 0a6ad4d9e1 ("e1000e: avoid failing the
system during pm_suspend"), PHY access loss can no longer cause a
system-level suspend failure. It only occurs when the LAN cable is
disconnected and is recovered during the system resume flow. Therefore, its
functional impact is low, and priority is given to stabilizing
runtime suspend.
Fixes: 76a0a3f9cc ("e1000e: fix force smbus during suspend flow")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting the MAC address and port VLAN.
1. Thread T0 allocates a filter in i40e_add_filter() within
i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
refers to the already freed filter memory, causing corruption.
Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
please set promiscuous on manually for VF XX".
Intel cannot open-source the exact code for a stable reproduction at this time.
The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.
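Conceptually, the new state slots into the filter state machine like this
(a sketch; the surrounding enum values are reproduced from memory and may
not match the exact order in the i40e headers):

  enum i40e_filter_state {
          I40E_FILTER_INVALID = 0,
          I40E_FILTER_NEW,
          I40E_FILTER_NEW_SYNC,   /* new: on tmp_add_list, must not be
                                   * freed from the hash list directly */
          I40E_FILTER_ACTIVE,
          I40E_FILTER_FAILED,
          I40E_FILTER_REMOVE,
  };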
Fixes: 278e7d0b9d ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the event where the platform running the device control plane
is rebooted, a reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time, if the device control plane is not yet started, the driver
times out on the virtchnl message and retries to establish the
mailbox again.
In the retry flow, the mailbox is deinitialized but the mailbox
workqueue is still alive and polling for mailbox messages.
This results in accessing the released control queue, leading to a
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order in which they were
initialized.
Fixes: 4930fbf419 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org # 6.9+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the device control plane is removed or the platform
running the device control plane is rebooted, a reset is detected
on the driver. On driver reset, it releases the resources and
waits for the reset to complete. If the reset fails, it takes
the error path and releases the vport lock. At this time, if a
monitoring tool tries to access the link settings, it triggers a call
trace from accessing the released vport pointer.
To avoid this, move link_speed_mbps to the netdev_priv structure,
which removes the dependency on the vport pointer and the vport lock
in idpf_get_link_ksettings(). Also use netif_carrier_ok()
to check the link status, and adjust the offsetof() to use link_up
instead of link_speed_mbps.
Fixes: 02cbfba1ad ("idpf: add ethtool callbacks")
Cc: stable@vger.kernel.org # 6.7+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix Flow Director not allowing traffic to be re-mapped to queue 0 when the
action is configured to drop (and vice versa).
The current implementation of the ethtool callback in the ice driver
forbids changing the Flow Director action from 0 to -1 and from -1 to 0,
returning an error, e.g.:
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1
rmgr: Cannot insert RX class rule: Invalid argument
We set the value of `u16 q_index = 0` at the beginning of the function
ice_set_fdir_input_set(). In the case of the "drop traffic" action (which
is equal to -1 in ethtool), we store the value 0. Later, when we want to
change the traffic rule to redirect to the queue with index 0, an error is
returned because a duplicate is found.
Fix this behaviour by changing the type of the `q_index` field from u16 to
s16 in `struct ice_fdir_fltr`. This allows storing -1 in the field for the
"drop traffic" action. What is more, change the variable type in the
function ice_set_fdir_input_set() and assign the new
`#define ICE_FDIR_NO_QUEUE_IDX`, which is -1, at the beginning. Later, if
the action is set to another value (pointing to a specific queue index),
the variable value is overwritten in the function.
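The core of the change (a sketch; unrelated struct members elided):

  #define ICE_FDIR_NO_QUEUE_IDX   (-1)

  struct ice_fdir_fltr {
          /* ... */
          s16 q_index;            /* was u16; -1 now means "drop traffic" */
          /* ... */
  };

  /* in ice_set_fdir_input_set(): */
  s16 q_index = ICE_FDIR_NO_QUEUE_IDX;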
Fixes: cac2a27cd9 ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Unloading the ice driver while switchdev port representors are added to
a bridge can lead to kernel panic. Reproducer:
modprobe ice
devlink dev eswitch set $PF1_PCI mode switchdev
ip link add $BR type bridge
ip link set $BR up
echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
sleep 2
ip link set $PF1 master $BR
ip link set $VF1_PR master $BR
ip link set $VF2_PR master $BR
ip link set $PF1 up
ip link set $VF1_PR up
ip link set $VF2_PR up
ip link set $VF1 up
rmmod irdma ice
When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from the xarray;
then unregister_netdev() is called (via repr->ops.rem()); finally, the
representor is deallocated. The problem comes from the bridge doing its
own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it
finishes up deallocating br_port. At this point, repr is still not freed
and an fdb event can happen, in which ice_eswitch_br_fdb_event_work()
takes repr->br_port and tries to use it, which causes a panic (use after
free).
Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).
Trace:
Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
(...)
Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
RIP: 0010:__rht_bucket_nested+0xb4/0x180
(...)
Call Trace:
(...)
ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
? __schedule+0xf60/0x5210
? mutex_lock+0x91/0xe0
? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
(...)
A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.
Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.
Fixes: fff292b47a ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater than 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either writing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
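After the change, the API and a typical caller look roughly like this (a
sketch; dim_update_sample() is the existing <linux/dim.h> helper, and the
q-> fields are placeholders for the driver's queue counters):

  /* include/linux/dim.h */
  void net_dim(struct dim *dim, const struct dim_sample *end_sample);

  /* driver poll routine: reuse the sample already built on the stack */
  struct dim_sample dim_sample = {};

  dim_update_sample(q->event_ctr, q->packets, q->bytes, &dim_sample);
  net_dim(&q->dim, &dim_sample);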
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The E810 Lan On Motherboard (LOM) design is vendor specific. Intel
provides the reference design, but the final product design is up to
the vendor. For some cases, like Linux DPLL support, the static values
defined in the driver do not reflect the actual LOM design.
The current implementation of dpll pins causes a crash on probe of the
ice driver for such DPLL-enabled E810 LOM designs:
WARNING: (...) at drivers/dpll/dpll_core.c:495 dpll_pin_get+0x2c4/0x330
...
Call Trace:
<TASK>
? __warn+0x83/0x130
? dpll_pin_get+0x2c4/0x330
? report_bug+0x1b7/0x1d0
? handle_bug+0x42/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? dpll_pin_get+0x117/0x330
? dpll_pin_get+0x2c4/0x330
? dpll_pin_get+0x117/0x330
ice_dpll_get_pins.isra.0+0x52/0xe0 [ice]
...
The number of dpll pins enabled by the LOM vendor is greater than the
number expected and defined in the driver for Intel-designed NICs,
which causes the crash.
Prevent the crash and allow generic pin initialization within the Linux
DPLL subsystem for DPLL-enabled E810 LOM designs.
A newly designed solution for the described issue will be based on
"per HW design" pin initialization. It requires pin information
dynamically acquired from the firmware, is already in progress, and is
planned for next-tree only.
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is no support for SF in legacy mode. Reflect it in the code.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Fixes: eda69d654c ("ice: add basic devlink subfunctions support")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.
The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx.
To resolve this, we set IRQF_NO_THREAD for the affected interrupt,
ensuring that the kernel handles it immediately, thereby preventing
the aforementioned error.
Reproducer:
#!/bin/bash
# echo 2 > /sys/class/net/ens14f0/device/sriov_numvfs
ipaddr_vlan=3
nic_test=ens14f0
vf=${nic_test}v0
while true; do
    ip link set ${nic_test} mtu 1500
    ip link set ${vf} mtu 1500
    ip link set $vf up
    ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
    ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
    ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
    if ! ip link show $vf | grep 'state UP'; then
        echo 'Error found'
        break
    fi
    ip link set $vf down
done
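A minimal sketch of the intended change in igb_request_msix() for the
vector serviced by igb_msix_other(); surrounding arguments are
illustrative:
/* IRQF_NO_THREAD keeps the "other" handler in hard-IRQ context even on
 * PREEMPT_RT, so the mailbox ACK is observed before
 * e1000_poll_for_ack() times out
 */
err = request_irq(adapter->msix_entries[vector].vector,
                  igb_msix_other, IRQF_NO_THREAD, netdev->name, adapter);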
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Reported-by: Yuying Ma <yuma@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
During driver initialization the VF determines whether the QOS
capability is allowed by the PF and receives the QOS parameters. After
that, the quanta size for queues is configured; it is not configurable
and is currently set to 1KB.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement net_shaper_ops support for IAVF. This enables configuration
of rate limiting on a per-queue basis. Customers intend to enforce a
bandwidth limit on Tx traffic steered to a queue by configuring rate
limits on that queue.
To set rate limiting for a queue, update the shaper object of the given
queues in the driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to the PF to
update the HW configuration.
Deleting the shaper configured for a queue simply means configuring the
shaper with bw_max 0. The PF restores the default rate limiting config
when bw_max is zero.
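A minimal sketch of how the ops could be wired up; the callback names
are hypothetical and only the two operations discussed here are shown:
static const struct net_shaper_ops iavf_shaper_ops = {
    .set    = iavf_shaper_set,  /* program bw_max for a queue */
    .delete = iavf_shaper_del,  /* bw_max 0: PF restores default */
};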
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to configure VF queue rate limit and quanta size.
For quanta size configuration, the quanta profiles are divided evenly
among PFs. For each port, the first quanta profile is reserved as the
default. When a VF is asked to set the queue quanta size, the PF
searches for an available profile, changes the fields and assigns this
profile to the queue.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for netdev-genl, allowing users to query IRQ, NAPI, and queue
information.
After this patch is applied, note the IRQs assigned to my NIC:
$ cat /proc/interrupts | grep ens | cut -f1 --delimiter=':'
50
51
52
While e1000e allocates 3 IRQs (RX, TX, and other), it looks like e1000e
only has a single NAPI, so I've associated the NAPI with the RX IRQ (50
on my system, seen above).
Note the output from the cli:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 145, 'ifindex': 2, 'irq': 50}]
This device supports only 1 rx and 1 tx queue, so querying that:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 145, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 145, 'type': 'tx'}]
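A minimal sketch of the associations being registered, assuming the
single e1000e NAPI and queue index 0:
/* tie the NAPI to the RX MSI-X vector so netdev-genl can report it */
netif_napi_set_irq(&adapter->napi, adapter->msix_entries[0].vector);

/* expose the single RX and TX queue as served by that NAPI */
netif_queue_set_napi(netdev, 0, NETDEV_QUEUE_TYPE_RX, &adapter->napi);
netif_queue_set_napi(netdev, 0, NETDEV_QUEUE_TYPE_TX, &adapter->napi);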
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Duplicated register initialization codes exist in e1000_configure_tx()
and e1000_configure_rx().
For example, writel(0, tx_ring->head) writes 0 to tx_ring->head, which
is adapter->hw.hw_addr + E1000_TDH(0).
This initialization is already done in ew32(TDH(0), 0).
ew32(TDH(0), 0) is equivalent to __ew32(hw, E1000_TDH(0), 0). It
executes writel(0, hw->hw_addr + E1000_TDH(0)). Since variable hw is
set to &adapter->hw, it is equal to writel(0, tx_ring->head).
We can remove four similar writel() calls in e1000_configure_tx() and
e1000_configure_rx().
Commit 0845d45e90 ("e1000e: Modify Tx/Rx configurations to avoid
null pointer dereferences in e1000_open") introduced these writel()
calls. That commit moved register writing to e1000_configure_tx/rx(),
and as a result, it caused the duplication in e1000_configure_tx/rx().
This patch modifies the sequence of register writes, but removing
these writes is safe because the same writes were already there before
the commit.
I also have checked the datasheets [0] [1] and have not found any
description that we need to write RDH, RDT, TDH and TDT registers
twice at initialization. Furthermore, we have tested this patch on an
I219-V device physically.
Link: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82577-gbe-phy-datasheet.pdf [0]
Link: https://www.intel.com/content/www/us/en/content-details/613460/intel-82583v-gbe-controller-datasheet.html [1]
Tested-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
e1000_init_function_pointers_82575() has never been implemented or used
since commit 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet
driver"). And commit 9835fd7321 ("igb: Add new function to read part
number from EEPROM in string format") removed the igb_read_part_num()
implementation.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is no caller and implementation in tree.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since commit fff292b47a ("ice: add VF representors one by one")
ice_eswitch_configure() is not used anymore.
Commit 1b8f15b64a ("ice: refactor filter functions") removed
ice_vsi_cfg_mac_fltr() but left its declaration.
Commit a24b4c6e9a ("ice: xsk: Do not convert to buff to frame for
XDP_TX") left the ice_xmit_xdp_buff() declaration behind.
Commit 7cab44f1c3 ("ice: Introduce ETH56G PHY model for E825C
products") declared ice_phy_cfg_{rx,tx}_offset_eth56g(),
commit a1ffafb0b4 ("ice: Support configuring the device to Double
VLAN Mode") declared ice_pkg_buf_get_free_space(), and
commit 8a3a565ff2 ("ice: add admin commands to access cgu
configuration") declared ice_is_pca9575_present(), but none of these
were ever implemented.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Sporadic issues, such as PHY access loss, have been observed on I219 (19)
devices. It was found that these devices have hardware more closely
related to ADP than MTP and the issues were caused by taking MTP-specific
flows.
Change the MAC and board types of these devices from MTP to ADP to
correctly reflect the LAN hardware, and flows, of these devices.
Fixes: db2d737d63 ("e1000e: Separate MTP board type from ADP")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.
To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.
Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.
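A minimal sketch of the locking pattern described above (the exact
statements inside the critical section are simplified):
spin_lock_bh(&vsi->mac_filter_hash_lock);
i40e_del_mac_filter(vsi, vf->default_lan_addr.addr);
eth_zero_addr(vf->default_lan_addr.addr);
spin_unlock_bh(&vsi->mac_filter_hash_lock);

/* and in i40e_add_mac_filter(), catch unlocked callers early */
lockdep_assert_held(&vsi->mac_filter_hash_lock);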
Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.
This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.
Fixes: fed0d9f132 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add jump targets so that a bit of exception handling can be better reused
at the end of two function implementations.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We have had the assign_bit() API for some time to replace the open
coded pattern:
if (foo)
    set_bit(n, bar);
else
    clear_bit(n, bar);
Use this API to clean up the code. No functional change intended.
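For reference, the equivalent single call:
assign_bit(n, bar, foo);  /* sets the bit when foo is true, clears it otherwise */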
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The max_frame and rx_buf_len fields of the VSI set the maximum frame size
for packets on the wire, and configure the size of the Rx buffer. In the
hardware, these are per-queue configuration. Most VSI types use a simple
method to determine the size of the buffers for all queues.
However, VFs may potentially configure different values for each queue.
While the Linux iAVF driver does not do this, it is allowed by the virtchnl
interface.
The current virtchnl code simply sets the per-VSI fields in between calls to
ice_vsi_cfg_single_rxq(). This technically works, as these fields are only
ever used when programming the Rx ring, and otherwise not checked again.
However, it is confusing to maintain.
The Rx ring also already has an rx_buf_len field in order to access the
buffer length in the hotpath. It also has extra unused bytes in the ring
structure which we can make use of to store the maximum frame size.
Drop the VSI max_frame and rx_buf_len fields. Add max_frame to the Rx ring,
and slightly re-order rx_buf_len to better fit into the gaps in the
structure layout.
Change the ice_vsi_cfg_frame_size function so that it writes to the ring
fields. Call this function once per ring in ice_vsi_cfg_rxqs(). This is
done over calling it inside the ice_vsi_cfg_rxq(), because
ice_vsi_cfg_rxq() is called in the virtchnl flow where the max_frame and
rx_buf_len have already been configured.
Change the accesses for rx_buf_len and max_frame to all point to the ring
structure. This has the added benefit that ice_vsi_cfg_rxq() no longer has
the surprise side effect of updating ring->rx_buf_len based on the VSI
field.
Update the virtchnl ice_vc_cfg_qs_msg() function to set the ring values
directly, and drop references to the removed VSI fields.
This now makes the VF logic clear, as the ring fields are obviously
per-queue. This reduces the required cognitive load when reasoning about
this logic.
Note that removing the VSI fields does leave a 4 byte gap, but the ice_vsi
structure has many gaps, and its layout is not as critical in the hot path.
The structure may benefit from a more thorough repacking, but no attempt
was made in this change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_vc_cfg_qs_msg() function is used to configure VF queues in response
to a VIRTCHNL_OP_CONFIG_VSI_QUEUES command.
The virtchnl command contains an array of queue pair data for configuring
Tx and Rx queues. This data includes a queue ID. When configuring the
queues, the driver generally uses this queue ID to determine which Tx and
Rx ring to program. However, a handful of places use the index into the
queue pair data from the VF. While most VF implementations appear to send
this data in order, it is not mandated by the virtchnl and it is not
verified that the queue pair data comes in order.
Fix the driver to consistently use the q_idx field instead of the 'i'
iterator value when accessing the rings. For the Rx case, introduce a local
ring variable to keep lines short.
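A minimal sketch of the per-queue access after the fix, assuming the
queue pair info variable is named qpi (variable names are illustrative):
u16 q_idx = qpi->txq.queue_id;
struct ice_tx_ring *tx_ring = vsi->tx_rings[q_idx];
struct ice_rx_ring *rx_ring = vsi->rx_rings[q_idx];

/* program tx_ring/rx_ring here instead of vsi->*_rings[i] */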
Fixes: 7ad15440ac ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
E830 adds hardware support to prevent the VF from overflowing the PF
mailbox with VIRTCHNL messages. E830 will use the hardware feature
(ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf().
To prevent a VF from overflowing the PF, the PF sets the number of
messages per VF that can be in the PF's mailbox queue
(ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF,
the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG
register.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Increasing MSI-X value on a VF leads to invalid memory operations. This
is caused by not reallocating some arrays.
Reproducer:
modprobe ice
echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe
echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs
echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count
Default MSI-X is 16, so 17 and above triggers this issue.
KASAN reports:
BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
Read of size 8 at addr ffff8888b937d180 by task bash/28433
(...)
Call Trace:
(...)
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
kasan_report+0xed/0x120
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_cfg_def+0x3360/0x4770 [ice]
? mutex_unlock+0x83/0xd0
? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice]
? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vf_reconfig_vsi+0x114/0x210 [ice]
ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice]
sriov_vf_msix_count_store+0x21c/0x300
(...)
Allocated by task 28201:
(...)
ice_vsi_cfg_def+0x1c8e/0x4770 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vsi_setup+0x179/0xa30 [ice]
ice_sriov_configure+0xcaa/0x1520 [ice]
sriov_numvfs_store+0x212/0x390
(...)
To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This
causes the required arrays to be reallocated taking the new queue count
into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq
before ice_vsi_rebuild(), so that realloc uses the newly set queue
count.
Additionally, ice_vsi_rebuild() does not remove VSI filters
(ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer
necessary.
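A minimal sketch of the resulting flow in
ice_sriov_set_msix_vec_count(); the rebuild flag and queue-count
variable are assumptions:
/* request the new queue counts, then do a full rebuild so the per-ring
 * stats arrays are reallocated for the new count
 */
vsi->req_txq = queues;
vsi->req_rxq = queues;
err = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT);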
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 2a2cb4c6c1 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Triggering a reset while in switchdev mode causes errors[1]. Rules are
already removed by this time because the switch content is flushed in
case of a reset. This means that rules were deleted from HW but SW
still thinks they exist, so when we get the
SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to delete a
non-existing rule.
We can avoid these errors by clearing the rules early in the reset
flow, before they are removed from HW. The switchdev API will get
notified that the rule was removed, so we won't get the
SWITCHDEV_FDB_DEL_TO_DEVICE notification.
Remove the unnecessary ice_clear_sw_switch_recipes().
[1]
ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
Fixes: 7c945a1a8e ("ice: Switchdev FDB events support")
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
netif_is_ice() works by checking the pointer to netdev ops. However, it
only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops,
so in Safe Mode it always returns false, which is unintuitive. While it
doesn't look like netif_is_ice() is currently being called anywhere in Safe
Mode, this could change and potentially lead to unexpected behaviour.
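A minimal sketch of the extended check (signature simplified):
bool netif_is_ice(const struct net_device *dev)
{
    return dev && (dev->netdev_ops == &ice_netdev_ops ||
                   dev->netdev_ops == &ice_netdev_safe_mode_ops);
}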
Fixes: df006dd4b1 ("ice: Add initial support framework for LAG")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the DDP package is missing or corrupted, the driver should enter
Safe Mode. Instead, an error is returned and probe fails.
To fix this, don't exit init if ice_init_ddp_config() returns an error.
Repro:
* Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
* Load ice
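A minimal sketch of the intended init-path behavior (the warning
message and surrounding function are illustrative):
err = ice_init_ddp_config(hw, pf);
/* a DDP failure must not abort probe; continue so the driver can fall
 * back to Safe Mode instead of returning the error
 */
if (err)
    dev_warn(ice_pf_to_dev(pf),
             "DDP init failed (%d), continuing in Safe Mode\n", err);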
Fixes: cc5776fe18 ("ice: Enable switching default Tx scheduler topology")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The sizeof(struct napi_struct) can change. Don't hardcode the size to
400 bytes and instead use "sizeof(struct napi_struct)".
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241004105407.73585-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-01 (ice)
This series contains updates to ice driver only.
Karol cleans up current PTP GPIO pin handling, fixes minor bugs,
refactors implementation for all products, introduces SDP (Software
Definable Pins) for E825C and implements reading SDP section from NVM
for E810 products.
Sergey replaces multiple aux buses and devices used in the PTP support
code with struct ice_adapter holding the necessary shared data.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Drop auxbus use for PTP to finalize ice_adapter move
ice: Use ice_adapter for PTP shared data instead of auxdev
ice: Initial support for E825C hardware in ice_adapter
ice: Add ice_get_ctrl_ptp() wrapper to simplify the code
ice: Introduce ice_get_phy_model() wrapper
ice: Enable 1PPS out from CGU for E825C products
ice: Read SDP section from NVM for pin definitions
ice: Disable shared pin on E810 on setfunc
ice: Cache perout/extts requests and check flags
ice: Align E810T GPIO to other products
ice: Add SDPs support for E825C
ice: Implement ice_ptp_pin_desc
====================
Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf)
This series contains updates to ice and idpf drivers:
For ice:
Michal corrects setting of dst VSI on LAN filters and adds clearing of
port VLAN configuration during reset.
Gui-Dong Han corrects failures to decrement refcount in some error
paths.
Przemek resolves a memory leak in ice_init_tx_topology().
Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper
value.
Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly
restored after reset.
For idpf:
Ahmed sets uninitialized dyn_ctl_intrvl_s value.
Josh corrects use and reporting of mailbox size.
Larysa corrects order of function calls during de-initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: deinit virtchnl transaction manager after vport and vectors
idpf: use actual mbx receive payload length
idpf: fix VF dynamic interrupt ctl register initialization
ice: fix VLAN replay after reset
ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins
ice: fix memleak in ice_init_tx_topology()
ice: clear port vlan config during reset
ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count()
ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins()
ice: set correct dst VSI in only LAN filters
====================
Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Drop unused auxbus/auxdev support from the PTP code due to the move
to ice_adapter.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use struct ice_adapter to hold shared PTP data and control PTP
related actions instead of auxbus. This allows significant code
simplification and faster access to the container fields used in
the PTP support code.
Move the PTP port list to the ice_adapter container to simplify the
code and avoid race conditions which could occur due to the synchronous
nature of the initialization/access. Certain memory savings can also be
achieved by moving the PTP data into the ice_adapter itself.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Address E825C devices by PCI ID since dual IP core configurations
need 1 ice_adapter for both devices.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add ice_get_ctrl_ptp() wrapper to simplify the PTP support code
in the functions that do not use ctrl_pf directly.
Add the control PF pointer to struct ice_adapter.
Rearrange fields in struct ice_adapter.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement configuring 1PPS signal output from the CGU. Use the maximal
amplitude because the Linux PTP pin API does not provide any way for
the user to set the signal level.
This change is necessary for E825C products to properly output any
signal from the 1PPS pin.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
PTP pin assignments and their related SDPs (Software Definable Pins)
are currently hardcoded.
Fix that by reading the NVM section instead on products supporting
this, i.e. E810 products.
If the SDP section is not defined in the NVM, the driver continues to
use the hardcoded table.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When setting a new supported function for a pin on E810, disable any
other enabled pin that shares the same GPIO.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cache original PTP GPIO requests instead of saving each parameter in
internal structures for periodic output or external timestamp request.
Factor out all periodic output register writes from ice_ptp_cfg_clkout
to a separate function to improve readability.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Instead of having a separate PTP GPIO implementation for E810T, use
the existing one from all other products.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support of PTP SDPs (Software Definable Pins) for E825C products.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a new internal structure describing PTP pins.
Use the new structure for all non-E810T products.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the device is removed, idpf is supposed to make certain virtchnl
requests e.g. VIRTCHNL2_OP_DEALLOC_VECTORS and VIRTCHNL2_OP_DESTROY_VPORT.
However, this does not happen due to the referenced commit introducing
virtchnl transaction manager and placing its deinitialization before those
messages are sent. Then the sending is impossible due to no transactions
being available.
Lack of cleanup can lead to the FW becoming unresponsive from e.g.
unloading-loading the driver and creating-destroying VFs afterwards.
Move transaction manager deinitialization to after other virtchnl-related
cleanup is done.
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a mailbox message is received, the driver checks for a non-zero
datalen in the controlq descriptor. If it is valid, the payload is
attached to the ctlq message to give to the upper layer. However, the
payload response size given to the upper layer was taken from the buffer
metadata which is _always_ the max buffer size. This meant the API was
returning 4K as the payload size for all messages. This went unnoticed
since the virtchnl exchange response logic was checking for a response
size less than 0 (error), not less than exact size, or not greater than
or equal to the max mailbox buffer size (4K). All of these checks will
pass in the success case since the size provided is always 4K. However,
this breaks anyone that wants to validate the exact response size.
Fetch the actual payload length from the value provided in the
descriptor data_len field (instead of the buffer metadata).
Unfortunately, this means we lose some extra error parsing for variable
sized virtchnl responses such as create vport and get ptypes. However,
the original checks weren't really helping anyway since the size was
_always_ 4K.
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org # 6.9+
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is currently a bug when more than one VLAN is defined and any
reset that affects the PF is initiated: after the reset rebuild, no
traffic will pass on any VLAN but the last one created.
This is caused by the iteration through the VLANs during replay, each
clearing the vsi_map bitmap of the VSI that is being replayed. The
problem is that during the replay, the pointer to the vsi_map bitmap
is used by each successive VLAN to determine if it should be replayed
on this VSI.
The logic was that the replay of the VLAN would replace the bit in the
map before the next VLAN would iterate through. But, since the replay
copies the old bitmap pointer to filt_replay_rules and creates a new
one for the recreated VLANs, it does not do this, and leaves the old
bitmap broken to be used to replay the remaining VLANs.
Since the old bitmap will be cleaned up in the post-replay cleanup,
there is no need to alter it and break the following VLAN replay, so
don't clear the bit.
Fixes: 334cb0626d ("ice: Implement VSI replay framework")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the user may request DPLL_PIN_STATE_SELECTABLE for an output
pin, and this would actually set the DISCONNECTED state instead, which
doesn't make any sense. SELECTABLE is valid only for input pins (on an
AUTOMATIC type dpll), where the dpll itself selects the best valid
input. For an output pin only CONNECTED/DISCONNECTED are expected.
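A minimal sketch of the guard in the output-pin state-set path
(callback context and variable names are illustrative):
/* only CONNECTED/DISCONNECTED make sense for an output pin */
if (state != DPLL_PIN_STATE_CONNECTED &&
    state != DPLL_PIN_STATE_DISCONNECTED)
    return -EINVAL;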
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix leak of the FW blob (DDP pkg).
Make ice_cfg_tx_topo() const-correct, so ice_init_tx_topology() can avoid
copying whole FW blob. Copy just the topology section, and only when
needed. Reuse the buffer allocated for the read of the current topology.
This was found by kmemleak, with the following trace for each PF:
[<ffffffff8761044d>] kmemdup_noprof+0x1d/0x50
[<ffffffffc0a0a480>] ice_init_ddp_config+0x100/0x220 [ice]
[<ffffffffc0a0da7f>] ice_init_dev+0x6f/0x200 [ice]
[<ffffffffc0a0dc49>] ice_init+0x29/0x560 [ice]
[<ffffffffc0a10c1d>] ice_probe+0x21d/0x310 [ice]
Constify the ice_cfg_tx_topo() @buf parameter.
This cascades further down to a few more functions.
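A minimal sketch of the constified prototype (parameter names are
illustrative):
/* the topology buffer is only read, never modified */
int ice_cfg_tx_topo(struct ice_hw *hw, const void *buf, u32 len);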
Fixes: cc5776fe18 ("ice: Enable switching default Tx scheduler topology")
CC: Larysa Zaremba <larysa.zaremba@intel.com>
CC: Jacob Keller <jacob.e.keller@intel.com>
CC: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
CC: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since commit 2a2cb4c6c1 ("ice: replace ice_vf_recreate_vsi() with
ice_vf_reconfig_vsi()") VF VSI is only reconfigured instead of
recreated. The context configuration from the previous setting is
still the same. If any of the config needs to be cleared, it needs to
be cleared explicitly.
Previously there was an assumption that the port VLAN would be cleared
automatically. Now, when the VSI is only reconfigured, we have to do it
in the code.
Not clearing the port VLAN configuration leads to a situation where the
driver VSI config is different from the VSI config in HW. Traffic can't
be passed after setting and clearing a port VLAN, because of the
invalid VSI config in HW.
Example reproduction:
> ip a a dev $(VF) $(VF_IP_ADDRESS)
> ip l s dev $(VF) up
> ping $(VF_IP_ADDRESS)
ping is working fine here
> ip link set eth5 vf 0 vlan 100
> ip link set eth5 vf 0 vlan 0
> ping $(VF_IP_ADDRESS)
ping isn't working
Fixes: 2a2cb4c6c1 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Piotr Tyda <piotr.tyda@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch addresses an issue with improper reference count handling in the
ice_sriov_set_msix_vec_count() function.
First, the function calls ice_get_vf_by_id(), which increments the
reference count of the vf pointer. If the subsequent call to
ice_get_vf_vsi() fails, the function currently returns an error without
decrementing the reference count of the vf pointer, leading to a reference
count leak. The correct behavior, as implemented in this patch, is to
decrement the reference count using ice_put_vf(vf) before returning an
error when vsi is NULL.
Second, the function calls ice_sriov_get_irqs(), which sets
vf->first_vector_idx. If this call returns a negative value, indicating an
error, the function returns an error without decrementing the reference
count of the vf pointer, resulting in another reference count leak. The
patch addresses this by adding a call to ice_put_vf(vf) before returning
an error when vf->first_vector_idx < 0.
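A minimal sketch of the corrected error paths (surrounding code and
error codes are simplified):
vsi = ice_get_vf_vsi(vf);
if (!vsi) {
    ice_put_vf(vf);     /* drop the reference taken above */
    return -ENOENT;
}

vf->first_vector_idx = ice_sriov_get_irqs(pf, msix_vec_count);
if (vf->first_vector_idx < 0) {
    ice_put_vf(vf);     /* drop the reference taken above */
    return -ENOSPC;
}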
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and identifying potential mismanagement of reference counts. In this case,
the tool flagged the missing decrement operation as a potential issue,
leading to this patch.
Fixes: 4035c72dc1 ("ice: reconfig host after changing MSI-X on VF")
Fixes: 4d38cb44bd ("ice: manage VFs MSI-X using resource tracking")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@outlook.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch addresses a reference count handling issue in the
ice_dpll_init_rclk_pins() function. The function calls ice_dpll_get_pins(),
which increments the reference count of the relevant resources. However,
if the condition WARN_ON((!vsi || !vsi->netdev)) is met, the function
currently returns an error without properly releasing the resources
acquired by ice_dpll_get_pins(), leading to a reference count leak.
To resolve this, the check has been moved to the top of the function. This
ensures that the function verifies the state before any resources are
acquired, avoiding the need for additional resource management in the
error path.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@outlook.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The filters set that will reproduce the problem:
$ tc filter add dev $VF0_PR ingress protocol arp prio 0 flower \
skip_sw dst_mac ff:ff:ff:ff:ff:ff action mirred egress \
redirect dev $PF0
$ tc filter add dev $VF0_PR ingress protocol arp prio 0 flower \
skip_sw dst_mac ff:ff:ff:ff:ff:ff src_mac 52:54:00:00:00:10 \
action mirred egress mirror dev $VF1_PR
Expected behaviour is to send all broadcast traffic from VF0 to the
LAN. If the src_mac matches the value from the filters, send the packet
to the LAN and to VF1.
In this case both the LAN_EN and LB_EN flags in the switch are set when
a packet matches both filters. As the dst VSI for the filter with only
the LAN enable bit set is the PF VSI, the packet is seen on the PF. To
fix this, change the dst VSI to the source VSI. It will block receiving
any packet even when LB_EN is set by the switch, because local loopback
is cleared on the VF VSI during normal operation.
Side note: if the second filter's action is redirect instead of mirror,
LAN_EN is clear, because the switch is AND-ing LAN_EN from each matched
filter and OR-ing LB_EN.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Fixes: 73b483b790 ("ice: Manage act flags for switchdev offloads")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_allocate_sf() function returns error pointers on error. It
doesn't return NULL. Update the check to match.
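A minimal sketch of the corrected check (the caller context and
argument list are simplified):
sf_dev = ice_allocate_sf(&pf->pdev->dev, pf);
if (IS_ERR(sf_dev))
    return PTR_ERR(sf_dev);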
Fixes: 177ef7f1e2 ("ice: base subfunction aux driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/6951d217-ac06-4482-a35d-15d757fd90a3@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ice_repr_create() function returns error pointers. It never returns
NULL. Fix the callers to check for IS_ERR().
Fixes: 977514fb0f ("ice: create port representor for SF")
Fixes: 415db8399d ("ice: make representor code generic")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7f7aeb91-8771-47b8-9275-9d9f64f947dd@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts (sort of) and no adjacent changes.
This merge reverts commit b3c9e65eb2 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
idpf: XDP chapter II: convert Tx completion to libeth
Alexander Lobakin says:
XDP for idpf is currently 5 chapters:
* convert Rx to libeth;
* convert Tx completion to libeth (this);
* generic XDP and XSk code changes;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).
Part II does the following:
* adds generic libeth Tx completion routines;
* converts idpf to use generic libeth Tx comp routines;
* fixes Tx queue timeouts and robustifies Tx completion in general;
* fixes Tx event/descriptor flushes (writebacks).
Most idpf patches again remove more lines than they add.
Generic Tx completion helpers and structs are needed as libeth_xdp
(Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't
want to work without it at all. Tx queue timeout fixes are needed
since without them, it's way easier to catch a Tx timeout event when
WB_ON_ITR is enabled.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: enable WB_ON_ITR
idpf: fix netdev Tx queue stop/wake
idpf: refactor Tx completion routines
netdevice: add netdev_tx_reset_subqueue() shorthand
idpf: convert to libeth Tx buffer completion
libeth: add Tx buffer completion helpers
====================
Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tell hardware to write back completed descriptors even when interrupts
are disabled. Otherwise, descriptors might not be written back until
the hardware can flush a full cacheline of descriptors. This can cause
unnecessary delays when traffic is light (or even trigger Tx queue
timeout).
The example scenario to reproduce the Tx timeout if the fix is not
applied:
- configure at least 2 Tx queues to be assigned to the same q_vector,
- generate a huge Tx traffic on the first Tx queue
- try to send a few packets using the second Tx queue.
In such a case Tx timeout will appear on the second Tx queue because no
completion descriptors are written back for that queue while interrupts
are disabled due to NAPI polling.
Fixes: c2d548cad1 ("idpf: add TX splitq napi poll support")
Fixes: a5ab9ee0df ("idpf: add singleq start_xmit and napi poll")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
netif_txq_maybe_stop() returns -1, 0, or 1, while
idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result,
there sometimes are Tx queue timeout warnings even though the queue
is empty or there is at least enough space to restart it.
Make idpf_tx_maybe_stop_common() inline and return true or false,
handling the return of netif_txq_maybe_stop() properly. Use a correct
goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or
incrementing the stops counter twice.
Fixes: 6818c4d5b3 ("idpf: add splitq start_xmit")
Fixes: a5ab9ee0df ("idpf: add singleq start_xmit and napi poll")
Cc: stable@vger.kernel.org # 6.7+
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a mechanism to guard against stashing partial packets into the hash
table to make the driver more robust, with more efficient decision
making when cleaning.
Don't stash partial packets. This can happen when an RE (Report Event)
completion is received in flow scheduling mode, or when an out of order
RS (Report Status) completion is received. The first buffer with the skb
is stashed, but some or all of its frags are not because the stack is
out of reserve buffers. This leaves the ring in a weird state since
the frags are still on the ring.
Use the field libeth_sqe::nr_frags to track the number of
fragments/tx_bufs representing the packet. The clean routines check to
make sure there are enough reserve buffers on the stack before stashing
any part of the packet. If there are not, next_to_clean is left pointing
to the first buffer of the packet that failed to be stashed. This leaves
the whole packet on the ring, and the next time around, cleaning will
start from this packet.
An RS completion is still expected for this packet in either case. So
instead of being cleaned from the hash table, it will be cleaned from
the ring directly. This should all still be fine since the DESC_UNUSED
and BUFS_UNUSED will reflect the state of the ring. If we ever fall
below the thresholds, the TxQ will still be stopped, giving the
completion queue time to catch up. This may lead to stopping the queue
more frequently, but it guarantees the Tx ring will always be in a good
state.
Also, always use the idpf_tx_splitq_clean function to clean descriptors,
i.e. use it from clean_buf_ring as well. This way we avoid duplicating
the logic and make sure we're using the same reserve buffers guard rail.
This does require a switch from the s16 next_to_clean overflow
descriptor ring wrap calculation to u16 and the normal ring size check.
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
&idpf_tx_buffer is almost identical to the previous generations, as well
as the way it's handled. Moreover, relying on dma_unmap_addr() and
!!buf->skb instead of explicitly defining the buffer's type was never good.
Use the newly added libeth helpers to do it properly and reduce the
copy-paste around the Tx code.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment
and lockdep assert to indicate that. This is needed to share the same TX
ring between XDP, XSK and slow paths. Furthermore, the current XDP
implementation is racy on tail updates.
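A minimal sketch of the tail-update helper with the assertion, assuming
igb's txring_txq() mapping from ring to netdev_queue:
static void igb_xdp_ring_update_tail(struct igb_ring *ring)
{
    /* caller must hold __netif_tx_lock for this queue */
    lockdep_assert_held(&txring_txq(ring)->_xmit_lock);

    /* make sure descriptor writes are visible before the tail bump */
    wmb();
    writel(ring->next_to_use, ring->tail);
}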
Fixes: 9cbc948b5a ("igb: add XDP support")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Add lockdep assert and fixes tag]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The description of function ice_find_vsi_list_entry says:
Search VSI list map with VSI count 1
However, since the blamed commit (see Fixes below), the function no
longer checks vsi_count. This causes a problem in ice_add_vlan_internal,
where the decision to share VSI lists between filter rules relies on the
vsi_count of the found existing VSI list being 1.
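A minimal sketch of restoring the documented condition inside
ice_find_vsi_list_entry(); field names follow the ice switch code, but
treat the exact check as illustrative:
list_for_each_entry(list_itr, list_head, list_entry) {
    if (list_itr->vsi_count == 1 && list_itr->vsi_list_info) {
        map_info = list_itr->vsi_list_info;
        if (test_bit(vsi_handle, map_info->vsi_map)) {
            *vsi_list_id = map_info->vsi_list_id;
            return map_info;
        }
    }
}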
The reproducing steps:
1. Have a PF and two VFs.
There will be a filter rule for VLAN 0, referring to a VSI list
containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1).
2. Add VLAN 1234 to VF#0.
ice will make the wrong decision to share the VSI list with the new
rule. The wrong behavior may not be immediately apparent, but it can
be observed with debug prints.
3. Add VLAN 1234 to VF#1.
ice will unshare the VSI list for the VLAN 1234 rule. Due to the
earlier bad decision, the newly created VSI list will contain
VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1).
4. Try pinging a network peer over the VLAN interface on VF#0.
This fails.
Reproducer script at:
https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh
Commented debug trace:
https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt
Patch adding the debug prints:
f8a8814623
(Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.)
Michal Swiatkowski added to the explanation that the bug is caused by
reusing a VSI list created for VLAN 0. All created VFs' VSIs are added
to VLAN 0 filter. When a non-zero VLAN is created on a VF which is already
in VLAN 0 (normal case), the VSI list from VLAN 0 is reused.
It leads to a problem because all VFs (VSIs to be specific) that are
subscribed to VLAN 0 will now receive the new VLAN-tagged traffic. This
is one bug, another is the bug described above. Removing filters from
one VF will remove the VLAN filter from the previous VF. It happens
when a VF is reset. Example:
- creation of 3 VFs
- we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)]
- we are adding VLAN 100 on VF1, we are reusing the previous list
because 2 is there
- VLAN traffic works fine, but VLAN 100 tagged traffic can be received
on all VSIs from the list (for example broadcast or unicast)
- trust is turned on for VF2, VF2 is reset, all filters from VF2 are
removed; the VLAN 100 filter is also removed because 3 is on the list
- VLAN traffic to VF1 isn't working anymore; the VLAN interface has to
be recreated to re-add the VLAN filter
One thing I'm not certain about is the implications for the LAG feature,
which is another caller of ice_find_vsi_list_entry. I don't have a
LAG-capable card at hand to test.
Fixes: 23ccae5ce1 ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Dave Ertman <David.m.ertman@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Our driver uses devres to manage resources; in particular we call
pcim_enable_device(), which also means we express the intent to get an
automatic pci_disable_device() call at driver removal. Manual calls to
pci_disable_device() misuse the API.
A recent commit (see "Fixes" tag) changed the removal action from
conditional (silently ignoring a double call to pci_disable_device())
to unconditional, but able to catch unwanted redundant calls; see the
cited "Fixes" commit for details.
Since then, unloading the driver yields the following warn+splat:
[70633.628490] ice 0000:af:00.7: disabling already-disabled device
[70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100
...
[70633.628744] ? pci_disable_device+0xf4/0x100
[70633.628752] release_nodes+0x4a/0x70
[70633.628759] devres_release_all+0x8b/0xc0
[70633.628768] device_unbind_cleanup+0xe/0x70
[70633.628774] device_release_driver_internal+0x208/0x250
[70633.628781] driver_detach+0x47/0x90
[70633.628786] bus_remove_driver+0x80/0x100
[70633.628791] pci_unregister_driver+0x2a/0xb0
[70633.628799] ice_module_exit+0x11/0x3a [ice]
Note that this is the only Intel ethernet driver that needs such a fix.
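A minimal sketch of the pattern (probe/remove context simplified):
/* probe: devres-managed enable; the matching disable is automatic */
err = pcim_enable_device(pdev);
if (err)
    return err;

/* remove/error paths: no manual pci_disable_device(pdev) here -
 * devres issues it exactly once when the driver is unbound
 */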
Fixes: f748a07a0b ("PCI: Remove legacy pcim_release()")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When adding a switch filter (such as a MAC or VLAN filter), it is expected
that the driver will detect the case where the filter already exists, and
return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr,
and ice_vsi_add_vlan to avoid incrementing the accounting fields such as
vsi->num_vlan or vf->num_mac.
This logic works correctly for the case where only a single VSI has added a
given switch filter.
When a second VSI adds the same switch filter, the driver converts the
existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST
filter. This saves switch resources, by ensuring that multiple VSIs can
re-use the same filter.
The ice_add_update_vsi_list() function is responsible for doing this
conversion. When first converting a filter from the FWD_TO_VSI into
FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the
existing rule's VSI. In such a case it returns -EEXIST.
However, when the switch rule has already been converted to a
FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just
requires extending the VSI list entry. The logic for checking if the rule
already exists in this case returns 0 instead of -EEXIST.
This breaks the accounting logic mentioned above, so the counters for how
many MAC and VLAN filters exist for a given VF or VSI no longer accurately
reflect the actual count. This breaks other code which relies on these
counts.
In typical usage this primarily affects such filters generally shared by
multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses.
Fix this by correctly reporting -EEXIST in the case of adding the same VSI
to a switch rule already converted to ICE_FWD_TO_VSI_LIST.
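A minimal sketch of the corrected branch in ice_add_update_vsi_list()
(the surrounding list-extension logic is omitted):
/* the rule already forwards to a VSI list: adding a VSI that is
 * already a member must report -EEXIST so callers don't bump their
 * accounting counters
 */
if (test_bit(vsi_handle, m_entry->vsi_list_info->vsi_map))
    return -EEXIST;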
Fixes: 9daf8208dd ("ice: Add support for switch filter programming")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After the VSI setup refactor in commit 6624e780a5 ("ice: split ice_vsi_setup
into smaller functions"), the ice_cfg_sw_lldp() call which removes the Rx rule
directing LLDP packets to the VSI was moved from ice_vsi_release() to
ice_vsi_decfg(). ice_vsi_decfg() is used in more cases than just VSI release,
resulting in unnecessary removal of the switch rule handling Rx LLDP packets.
This leads to LLDP packets being dropped after a change of the number of
channels via ethtool.
This patch moves the ice_cfg_sw_lldp() call that removes the Rx LLDP switch
rule back to the ice_vsi_release() function.
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Matěj Grégr <mgregr@netx.as>
Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the previously implemented SF aux driver. It is probed during SF
activation and removed after deactivation.
Implement set/get hw_address and set/get state as basic devlink ops for the
subfunction.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement add / delete vlan for subfunction type VSI.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The flow for creating the Tx topology is the same as for VF port
representors, but the devlink port is stored in a different place
(sf->devlink_port).
The devlink lock isn't taken when creating a VF, but it is when creating a
subfunction. The function setting up the Tx topology needs to take this lock,
so check whether it is already held to avoid taking it twice.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The subfunction port representor needs the basic netdevice ops to work
correctly. Create them.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Now there is another type of port representor. Correct the check for whether
the parent device is ready so that it also reflects the new port representor
type.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a check for a subfunction before setting the target VSI. Setting it is
needed for the PF in switchdev mode but not for a subfunction (even in
switchdev mode).
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement attaching and detaching the SF port representor. It is done in the
same way as for the VF port representor.
The SF port representor is always added or removed with the devlink
lock taken.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Keep the same flow of port representor creation, but instead of a general
attach function create helpers for each specific representor type.
Store function pointers for adding and removing the representor.
The type of port representor could also be derived from the VSI type, but it
is cleaner to have it saved directly in the port representor structure.
Take the devlink lock for the whole of port representor creation and
destruction.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Configure the netdevice for the subfunction use case. It mostly reuses ops
from the PF netdevice.
The SF netdev is linked to the devlink port registered after SF activation.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement the subfunction driver. It is probed when the subfunction port is
activated.
The VSI is already created at that point; during the probe the VSI is
configured. MAC unicast and broadcast filters are added to allow traffic to
pass.
Store the subfunction pointer in the VSI struct, the same way it is done for
the VF pointer. Make a union of the subfunction and VF pointers, as only one
of them can be set for a given VSI.
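A minimal sketch of the layout this describes (type and field names are
illustrative, not the exact ice definitions):
        struct my_vsi {
                /* ... */
                union {
                        struct my_vf *vf;       /* valid only for a VF-owned VSI */
                        struct my_sf *sf;       /* valid only for a subfunction VSI */
                };
                enum my_vsi_type type;          /* says which union member applies */
        };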
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Allocate a devlink instance for the subfunction.
Create a header file for the subfunction device. Define the subfunction
device structure there, as it is needed for devlink allocation.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a subfunction VSI is opened, the same code as for the PF VSI should be
executed, and likewise when bringing it up completes. Reflect that in the
code by taking the subfunction VSI into consideration.
In the stop case the PF doesn't have additional tasks, and the same holds
for the subfunction VSI.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement devlink port handlers responsible for ethernet type devlink
subfunctions. Create subfunction devlink port and setup all resources
needed for a subfunction netdev to operate. Configure new VSI for each
new subfunction, initialize and configure interrupts and Tx/Rx resources.
Set correct MAC filters and create new netdev.
For now, subfunction is limited to only one Tx/Rx queue pair.
Only allocate a new subfunction VSI with the devlink port new command.
Allocate and free subfunction MSIX interrupt vectors using new API
calls with pci_msix_alloc_irq_at and pci_msix_free_irq.
Support both automatic and manual subfunction numbers. If no subfunction
number is provided, use xa_alloc to pick a number automatically. This
will find the first free index and use that as the number. This reduces
burden on users in the simple case where a specific number is not
required. It may also be slightly faster to check that a number exists
since xarray lookup should be faster than a linear scan of the dyn_ports
xarray.
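A simplified sketch of that number handling on top of an xarray (the helper
and its arguments are assumptions; xa_insert()/xa_alloc() are the real xarray
API):
        #include <linux/xarray.h>

        static int sf_reserve_number(struct xarray *dyn_ports, void *port,
                                     bool have_user_num, u32 user_num, u32 *num)
        {
                int err;

                if (have_user_num) {
                        /* manual: -EBUSY if the requested number is taken */
                        err = xa_insert(dyn_ports, user_num, port, GFP_KERNEL);
                        if (!err)
                                *num = user_num;
                        return err;
                }

                /* automatic: xa_alloc() picks the first free index */
                return xa_alloc(dyn_ports, num, port, xa_limit_32b, GFP_KERNEL);
        }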
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Make some of the netdevice_ops functions visible from outside so that a
netdev created for another VSI type can use them.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the required plumbing for a new VSI type dedicated to devlink
subfunctions. Make sure that the VSI is properly configured and destroyed.
Also allow loading XDP programs and AF_XDP sockets.
The first implementation of devlink subfunctions supports only one Tx/Rx
queue pair per given subfunction.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-30 (igc, e1000e, i40e)
This series contains updates to igc, e1000e, and i40e drivers.
Kurt Kanzenbach adds support for MQPRIO offloads and stops unintended,
excess interrupts on igc.
Sasha adds reporting of EEE (Energy Efficient Ethernet) ability and
moves a register define to a better suited file for igc.
Vitaly stops reporting errors on shutdown and suspend as they are not
fatal for e1000e.
Alex adds reporting of EEE to i40e.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: Add Energy Efficient Ethernet ability for X710 Base-T/KR/KX cards
e1000e: avoid failing the system during pm_suspend
igc: Move the MULTI GBT AN Control Register to _regs file
igc: Add Energy Efficient Ethernet ability
igc: Get rid of spurious interrupts
igc: Add MQPRIO offload support
====================
Link: https://patch.msgid.link/20240830210451.2375215-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After XDP configuration is completed, we bring the interface up
unconditionally, regardless of its state before the call to .ndo_bpf().
Preserve the information about whether the interface had to be brought down,
and later bring it up only in that case.
Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
state, not VSI one. Therefore it does not protect the queue pair from
e.g. reset.
Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Consider the following scenario:
.ndo_bpf()              | ice_prepare_for_reset()               |
________________________|_______________________________________|
rtnl_lock()             |                                       |
ice_down()              |                                       |
                        | test_bit(ICE_VSI_DOWN) - true         |
                        | ice_dis_vsi() returns                 |
ice_up()                |                                       |
                        | proceeds to rebuild a running VSI     |
.ndo_bpf() is not the only rtnl-locked callback that toggles the interface
to apply new configuration. Another example is .set_channels().
To avoid the race condition above, act only after reading ICE_VSI_DOWN
under rtnl_lock.
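Schematically, the fix is the classic check-then-act-under-one-lock pattern,
roughly like the sketch below (a hypothetical wrapper, not the literal ice
change):
        static void prepare_vsi_for_reset(struct ice_vsi *vsi)  /* hypothetical */
        {
                rtnl_lock();
                /* ICE_VSI_DOWN is read only after the lock is held, so a
                 * concurrent .ndo_bpf()/.set_channels() cannot toggle the VSI
                 * between the check and the action
                 */
                if (!test_bit(ICE_VSI_DOWN, vsi->state))
                        ice_dis_vsi(vsi, true);         /* rtnl already held */
                rtnl_unlock();
        }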
Fixes: 0f9d5027a7 ("ice: Refactor VSI allocation, deletion and rebuild flow")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
VSI without applying new ring configuration. When unconfiguring the VSI, we
can encounter the state in which there is an XDP program but no XDP rings
to destroy or there will be XDP rings that need to be destroyed, but no XDP
program to indicate their presence.
When unconfiguring, rely on the presence of XDP rings rather than the XDP
program, as they better represent the current state that has to be
destroyed.
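In other words, the teardown check keys off the rings themselves, roughly as
in this sketch (field and helper names are illustrative):
        /* tear down based on what actually exists, not on the program */
        if (vsi->xdp_rings)
                destroy_xdp_rings(vsi);         /* hypothetical helper */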
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by TX timeout handler.
XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked
With an unfortunate timing, such accesses can result in a crash such as the
one below:
[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016] <TASK>
[ +0.000014] ? __die+0x1f/0x70
[ +0.000029] ? page_fault_oops+0x171/0x4f0
[ +0.000029] ? schedule+0x3b/0xd0
[ +0.000027] ? exc_page_fault+0x7b/0x180
[ +0.000022] ? asm_exc_page_fault+0x22/0x30
[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
[ +0.000145] ? delay_tsc+0x4a/0xc0
[ +0.000022] ? delay_tsc+0x92/0xc0
[ +0.000020] ice_do_reset+0x140/0x180 [ice]
[ +0.000886] ice_service_task+0x404/0x1030 [ice]
[ +0.000824] process_one_work+0x171/0x340
[ +0.000685] worker_thread+0x277/0x3a0
[ +0.000675] ? preempt_count_add+0x6a/0xa0
[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679] ? __pfx_worker_thread+0x10/0x10
[ +0.000653] kthread+0xf0/0x120
[ +0.000635] ? __pfx_kthread+0x10/0x10
[ +0.000616] ret_from_fork+0x2d/0x50
[ +0.000612] ? __pfx_kthread+0x10/0x10
[ +0.000604] ret_from_fork_asm+0x1b/0x30
[ +0.000604] </TASK>
The previous way of handling this through returning -EBUSY is not viable,
particularly when destroying AF_XDP socket, because the kernel proceeds
with removal anyway.
There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, same as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().
Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuilt
2. when the VSI is already rebuilt, but not yet open
The latter case is actually already handled through the !netif_running()
case; we just need to adjust the flag checking a little. The former one is
not as trivial, because a lot of hardware interaction happens between
ice_vsi_close() and ice_vsi_rebuild(), and this can make adding/deleting
rings exit with an error. Luckily, a VSI rebuild is pending and can apply
the new configuration for us in a managed fashion.
Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.
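Put schematically, the resulting scheme looks roughly like the sketch below
(hypothetical helper names, not the literal driver code; only the lock and
flag names come from this description):
        /* reset path */
        mutex_lock(&vsi->xdp_state_lock);
        set_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
        close_vsi(vsi);                         /* hypothetical */
        mutex_unlock(&vsi->xdp_state_lock);
        /* ... hardware interaction happens outside the lock ... */
        mutex_lock(&vsi->xdp_state_lock);
        rebuild_vsi(vsi);                       /* hypothetical; clears the flag */
        mutex_unlock(&vsi->xdp_state_lock);

        /* ice_xdp() path */
        mutex_lock(&vsi->xdp_state_lock);
        if (test_bit(ICE_VSI_REBUILD_PENDING, vsi->state))
                swap_xdp_prog(vsi, prog);       /* hypothetical: no ring changes */
        else
                reconfig_xdp_rings(vsi, prog);  /* hypothetical: full reconfig */
        mutex_unlock(&vsi->xdp_state_lock);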
Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
consistent by deconfiguring VSI when coalesce allocation fails.
Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild(), which is
not rtnl-locked when called from the reset path. This creates the need to
take the rtnl_lock just for a single function and complicates the
synchronization with .ndo_bpf. At the same time, there is no actual need to
fill napi-to-queue information at this exact point.
Fill napi-to-queue information when opening the VSI and clear it when the
VSI is being closed. Those routines are already rtnl-locked.
Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
XDP queues, as this leads to out-of-bounds writes, such as one below.
[ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
[ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
[ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
[ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000003] Call Trace:
[ +0.000003] <TASK>
[ +0.000002] dump_stack_lvl+0x60/0x80
[ +0.000007] print_report+0xce/0x630
[ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0
[ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0
[ +0.000003] kasan_report+0xe9/0x120
[ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0
[ +0.000004] netif_queue_set_napi+0x1c2/0x1e0
[ +0.000005] ice_vsi_close+0x161/0x670 [ice]
[ +0.000114] ice_dis_vsi+0x22f/0x270 [ice]
[ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
[ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice]
[ +0.000087] pci_dev_save_and_disable+0x82/0xd0
[ +0.000006] pci_reset_function+0x12d/0x230
[ +0.000004] reset_store+0xa0/0x100
[ +0.000006] ? __pfx_reset_store+0x10/0x10
[ +0.000002] ? __pfx_mutex_lock+0x10/0x10
[ +0.000004] ? __check_object_size+0x4c1/0x640
[ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0
[ +0.000006] vfs_write+0x5d6/0xdf0
[ +0.000005] ? fd_install+0x180/0x350
[ +0.000005] ? __pfx_vfs_write+0x10/0xA10
[ +0.000004] ? do_fcntl+0x52c/0xcd0
[ +0.000004] ? kasan_save_track+0x13/0x60
[ +0.000003] ? kasan_save_free_info+0x37/0x60
[ +0.000006] ksys_write+0xfa/0x1d0
[ +0.000003] ? __pfx_ksys_write+0x10/0x10
[ +0.000002] ? __x64_sys_fcntl+0x121/0x180
[ +0.000004] ? _raw_spin_lock+0x87/0xe0
[ +0.000005] do_syscall_64+0x80/0x170
[ +0.000007] ? _raw_spin_lock+0x87/0xe0
[ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10
[ +0.000003] ? file_close_fd_locked+0x167/0x230
[ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220
[ +0.000005] ? do_syscall_64+0x8c/0x170
[ +0.000004] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? fput+0x1a/0x2c0
[ +0.000004] ? filp_close+0x19/0x30
[ +0.000004] ? do_dup2+0x25a/0x4c0
[ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0
[ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220
[ +0.000004] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? __count_memcg_events+0x113/0x380
[ +0.000005] ? handle_mm_fault+0x136/0x820
[ +0.000005] ? do_user_addr_fault+0x444/0xa80
[ +0.000004] ? clear_bhb_loop+0x25/0x80
[ +0.000004] ? clear_bhb_loop+0x25/0x80
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000005] RIP: 0033:0x7f2033593154
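A minimal sketch of the open-time assignment, skipping the XDP rings (ring
and counter names are taken from context and may not match the driver
exactly; netif_queue_set_napi() is the core helper):
        unsigned int i;

        /* called from the rtnl-locked VSI open path */
        for (i = 0; i < vsi->num_rxq; i++)
                netif_queue_set_napi(vsi->netdev, i, NETDEV_QUEUE_TYPE_RX,
                                     &vsi->rx_rings[i]->q_vector->napi);

        /* only the real Tx queues; XDP Tx rings past num_txq are excluded,
         * avoiding the out-of-bounds write shown above
         */
        for (i = 0; i < vsi->num_txq; i++)
                netif_queue_set_napi(vsi->netdev, i, NETDEV_QUEUE_TYPE_TX,
                                     &vsi->tx_rings[i]->q_vector->napi);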
Fixes: 080b0c8d6d ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
Fixes: 91fdbce7e8 ("ice: Add support in the driver for associating queue with napi")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ability to handle maximum FCoE frames of 2158 bytes can never be changed
and is thus more of an attribute than a toggleable feature.
Move it from netdev_features_t to the "cold" priv flags (a bitfield bool) and
free up yet another feature bit.
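The shape of such a "cold" bitfield flag, illustratively (names are
placeholders; the actual structure holding the flag is not shown here):
        struct my_netdev_priv {
                /* ... hot, frequently written members ... */

                /* cold: set once at probe time, never toggled at runtime */
                u8 fcoe_mtu:1;  /* device can handle 2158-byte FCoE frames */
        };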
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Call rtnl_unlock() on this error path, before returning.
Fixes: bc23aa949a ("igc: Add pcie error handler support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add "EEE: Enabled/Disabled" to dmesg for supported X710 Base-T/KR/KX
cards. According to the IEEE standard report the EEE ability and the
EEE Link Partner ability. Use the kernel's 'ethtool_keee' structure
and report EEE link modes.
Example:
dmesg | grep 'NIC Link is'
ethtool --show-eee <device>
Before:
NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Supported EEE link modes: Not reported
Advertised EEE link modes: Not reported
Link partner advertised EEE link modes: Not reported
After:
NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None, EEE: Enabled
Supported EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Link partner advertised EEE link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Occasionally when the system goes into pm_suspend, the suspend might fail
due to a PHY access error on the network adapter. Previously, this would
have caused the whole system to fail to go to a low power state.
An example of this was reported in the following Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=205015
[ 1663.694828] e1000e 0000:00:19.0 eth0: Failed to disable ULP
[ 1664.731040] asix 2-3:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xC1E1
[ 1665.093513] e1000e 0000:00:19.0 eth0: Hardware Error
[ 1665.596760] e1000e 0000:00:19.0: pci_pm_resume+0x0/0x80 returned 0 after 2975399 usecs
and then the system never recovers from it; all the following suspend attempts fail due to this:
[22909.393854] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x760 [e1000e] returns -2
[22909.393858] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -2
[22909.393861] PM: Device 0000:00:1f.6 failed to suspend async: error -2
This can be avoided by changing the return values of __e1000_shutdown and
e1000e_pm_suspend functions so that they always return 0 (success). This
is consistent with what other drivers do.
If the e1000e driver encounters a hardware error during suspend, potential
side effects include slightly higher power draw or non-working wake on
LAN. This is preferred to a system-level suspend failure, and a warning
message is written to the system log, so that the user can be aware that
the LAN controller experienced a problem during suspend.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205015
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The MULTI GBT AN Control Register is IEEE Standard Register 7.32 (not a
mask). The right place for it is the igc_regs.h file. In accordance with the
register naming convention, add the 'IGC_' prefix.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
According to the IEEE standard, report the EEE ability (registers 7.60 and
7.62) and the EEE Link Partner ability (registers 7.61 and 7.63). Use the
kernel's 'ethtool_keee' structure and report EEE link modes.
Example:
ethtool --show-eee <device>
Before:
Advertised EEE link modes: Not reported
Link partner advertised EEE link modes: Not reported
After:
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
2500baseT/Full
Link partner advertised EEE link modes: 100baseT/Full
1000baseT/Full
2500baseT/Full
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When running the igc with XDP/ZC in busy polling mode with deferral of hard
interrupts, interrupts still happen from time to time. That is caused by
the igc task watchdog which triggers Rx interrupts periodically.
That mechanism has been introduced to overcome skb/memory allocation
failures [1]. So the Rx clean functions stop processing the Rx ring in case
of such a failure. The task watchdog triggers Rx interrupts periodically in
the hope that memory became available in the meantime.
The current behavior is undesirable for real time applications, because the
driver-induced Rx interrupts also trigger softirq processing. However,
all real time packets should be processed by the application, which uses the
busy polling method.
Therefore, only trigger the Rx interrupts in case of real allocation
failures. Introduce a new flag for signaling that condition.
[1] - https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3be507547e6177e5c808544bd6a2efa2c7f1d436
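Schematically, the new flag works roughly like this (the flag name and field
names are assumptions, not necessarily the identifiers the driver uses):
        /* Rx clean path: record that we stopped because allocation failed */
        if (alloc_failed)
                set_bit(IGC_RING_FLAG_RX_ALLOC_FAILED, &rx_ring->flags);

        /* task watchdog: kick an Rx interrupt only if that actually happened */
        if (test_and_clear_bit(IGC_RING_FLAG_RX_ALLOC_FAILED, &rx_ring->flags))
                request_rx_interrupt(adapter);  /* hypothetical helper */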
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for offloading MQPRIO. The hardware has four priorities as well
as four queues. Each queue must be assigned a unique priority.
However, the priorities are only considered in TSN Tx mode. There are two
TSN Tx modes. In case of MQPRIO the Qbv capability is not required.
Therefore, use the legacy TSN Tx mode, which performs strict priority
arbitration.
Example for mqprio with hardware offload:
|tc qdisc replace dev ${INTERFACE} handle 100 parent root mqprio num_tc 4 \
| map 0 0 0 0 0 1 2 3 0 0 0 0 0 0 0 0 \
| queues 1@0 1@1 1@2 1@3 \
| hw 1
The mqprio Qdisc also allows configuring `preemptible_tcs'. However,
frame preemption is not supported yet.
Tested on Intel i225 and implemented by following data sheet section 7.5.2,
Transmit Scheduling.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Ethtool callbacks can be executed while a reset is in progress and try to
access deleted resources, e.g. getting coalesce settings can result in the
NULL pointer dereference seen below.
Reproduction steps:
Once the driver is fully initialized, trigger reset:
# echo 1 > /sys/class/net/<interface>/device/reset
when reset is in progress try to get coalesce settings using ethtool:
# ethtool -c <interface>
BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7
RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
Call Trace:
<TASK>
ice_get_coalesce+0x17/0x30 [ice]
coalesce_prepare_data+0x61/0x80
ethnl_default_doit+0xde/0x340
genl_family_rcv_msg_doit+0xf2/0x150
genl_rcv_msg+0x1b3/0x2c0
netlink_rcv_skb+0x5b/0x110
genl_rcv+0x28/0x40
netlink_unicast+0x19c/0x290
netlink_sendmsg+0x222/0x490
__sys_sendto+0x1df/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x82/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7faee60d8e27
Calling netif_device_detach() before the reset makes the net core not call
the driver when an ethtool command is issued. An attempt to execute an
ethtool command during reset will then result in the following message:
netlink error: No such device
instead of a NULL pointer dereference. Once reset is done and
ice_rebuild() is executing, netif_device_attach() is called to allow
ethtool operations to occur again in a safe manner.
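In outline (a sketch; the real call sites live in the ice reset preparation
and rebuild paths mentioned above):
        /* before tearing down resources for the reset */
        netif_device_detach(vsi->netdev);   /* ethtool now gets "No such device" */

        /* ... reset and rebuild ... */

        /* once ice_rebuild() has restored the resources */
        netif_device_attach(vsi->netdev);   /* ethtool is safe again */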
Fixes: fcea6f3da5 ("ice: Add stats and ethtool support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
82580 NICs have a hardware bug that makes it necessary to write into the
TSICR (TimeSync Interrupt Cause) register to clear it:
https://lore.kernel.org/all/CDCB8BE0.1EC2C%25matthew.vick@intel.com/
Add a conditional so that we write into the TSICR register only for 82580,
so we don't risk losing events for other models.
Without this change, when running ptp4l with an Intel 82580 card,
I get the following output:
> timed out while polling for tx timestamp increasing tx_timestamp_timeout or
> increasing kworker priority may correct this issue, but a driver bug likely
> causes it
This goes away with this change.
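Roughly, the fix boils down to a model check around the write-to-clear (a
sketch written in the driver's rd32()/wr32() register accessor style):
        u32 tsicr = rd32(E1000_TSICR);

        /* only 82580 needs the explicit write to clear TSICR (hardware bug);
         * writing it on other models would risk losing their events
         */
        if (hw->mac.type == e1000_82580)
                wr32(E1000_TSICR, tsicr);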
This (partially) reverts commit ee14cc9ea1 ("igb: Fix missing time sync events").
Fixes: ee14cc9ea1 ("igb: Fix missing time sync events")
Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w@mail.gmail.com/
Tested-by: Daiwei Li <daiweili@google.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Daiwei Li <daiweili@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-26 (ice)
This series contains updates to ice driver only.
Jake implements and uses rd32_poll_timeout to replace a jiffies loop for
calling ice_sq_done. The rd32_poll_timeout() function is designed to allow
simplifying other places in the driver where we need to read a register
until it matches a known value.
Jake, Bruce, and Przemek update ice_debug_cq() to be more robust, and more
useful for tracing control queue messages sent and received by the device
driver.
Jake rewords several comments in the ice_controlq.c file which previously
referred to the "Admin queue" when the functions they describe are actually
generic and usable on any control queue.
Jake removes the unused and unnecessary cmd_buf array allocation for send
queues. This logic originally was going to be useful if we ever implemented
asynchronous completion of transmit messages. This support is unlikely to
materialize, so the overhead of allocating a command buffer is unnecessary.
Sergey improves the log messages when the ice driver reports that the NVM
version on the device is not supported by the driver. Now, these messages
include both the discovered NVM version and the requested/expected NVM
version.
Aleksandr Mishin corrects overallocation of memory related to adding
scheduler nodes.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Adjust over allocation of memory in ice_sched_add_root_node() and ice_sched_add_node()
ice: Report NVM version numbers on mismatch during load
ice: remove unnecessary control queue cmd_buf arrays
ice: reword comments referring to control queues
ice: stop intermixing AQ commands/responses debug dumps
ice: do not clutter debug logs with unused data
ice: improve debug print for control queue messages
ice: implement and use rd32_poll_timeout for ice_sq_done timeout
====================
Link: https://patch.msgid.link/20240826224655.133847-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow the user to get and set configuration of Embedded SYNC feature
on the ice driver dpll pins.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-3-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In ice_sched_add_root_node() and ice_sched_add_node() there are calls to
devm_kcalloc() in order to allocate memory for array of pointers to
'ice_sched_node' structure. But incorrect types are used as sizeof()
arguments in these calls (structures instead of pointers) which leads to
over allocation of memory.
Adjust over allocation of memory by correcting types in devm_kcalloc()
sizeof() arguments.
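The shape of the fix, in isolation (the field and device pointer shown are
illustrative of an array of pointers; devm_kcalloc() is the real API):
        /* before: sized each element as a full struct, although the array
         * only holds pointers - over-allocation
         */
        node->children = devm_kcalloc(dev, num_nodes,
                                      sizeof(struct ice_sched_node), GFP_KERNEL);

        /* after: size each element as the pointer it actually is */
        node->children = devm_kcalloc(dev, num_nodes,
                                      sizeof(*node->children), GFP_KERNEL);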
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Report NVM version numbers (both detected and expected) when a mismatch
between the driver and firmware is detected. This provides more useful information
about which NVM version the driver expects, rather than requiring manual
code inspection.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver allocates a cmd_buf array in addition to the desc_buf array.
This array stores an ice_sq_cd command details structure for each entry in
the control queue ring.
The contents of the structure are copied from the value passed in via
ice_sq_send_cmd, and include only a pointer to storage for the write back
descriptor contents.
Originally this array was intended to support asynchronous completion
including features such as a callback function. This support was never
implemented. All that exists today is needless copying and resetting of a
cmd_buf array that is otherwise functionally unused.
Since we do not plan to implement asynchronous completions, drop this
unnecessary memory and logic. This saves memory for each control queue, and
avoids the pointless copying and memset.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Many comments in ice_controlq.c use the term "Admin queue" despite the code
being intended for arbitrary control queues, not just the Admin queue.
Reword the comments to make it clear that this code is the generic control
queue logic that is shared by all of the control queues, and is not
specific to the Admin queue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_debug_cq() function is called to generate a debug log of control
queue messages both sent and received. It currently does this over a
potential total of 6 different printk invocations.
The main logic prints over 4 calls to ice_debug():
1. The metadata including opcode, flags, datalength and return value.
2. The cookie in the descriptor.
3. The parameter values.
4. The address for the databuffer.
In addition, if the descriptor has a data buffer, it can be logged with two
additional prints:
5. A message indicating the start of the data buffer.
6. The actual data buffer, printed using print_hex_dump_debug.
This can lead to trouble in the event that two different PFs are logging
messages. The messages become intermixed and it may not be possible to
determine which part of the output belongs to which control queue message.
To fix this, it needs to be possible to unambiguously determine which
messages belong together. This is trivial for the messages that comprise
the main printing. Combine them together into a single invocation of
ice_debug().
The message containing a hex-dump of the data buffer is a bit more
complicated. This is printed separately as part of print_hex_dump_debug.
This function takes a prefix, which is currently always set to
KBUILD_MODNAME. Extend this prefix to include the buffer address for the
databuffer, which is printed as part of the main print, and which is
guaranteed to be unique for each buffer.
Refactor the ice_debug_array(), introducing an ice_debug_array_w_prefix().
Build the prefix by combining KBUILD_MODNAME with the databuffer address
using snprintf().
These changes make it possible to unambiguously determine what data belongs
to what control queue message.
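A sketch of the prefix construction (the exact format string is an
assumption; snprintf() and print_hex_dump_debug() are the real kernel
helpers):
        char prefix[64];

        /* tie the hex dump to the main debug line via the buffer address */
        snprintf(prefix, sizeof(prefix), "%s 0x%p: ", KBUILD_MODNAME, buf);
        print_hex_dump_debug(prefix, DUMP_PREFIX_OFFSET, 16, 1, buf, len, false);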
Reported-by: Jacek Wierzbicki <jacek.wierzbicki@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, debug logs are unnecessarily cluttered with the contents of
command data buffers even if the receiver of that command (i.e. FW or MBX)
is not told to read the buffer. Change to only log command data buffers
when the RD flag (indicates receiver needs to read the buffer) is set.
Continue to log response data buffer when the returned datalen is non-zero.
Also, rename a local variable to reflect what is in the hardware
specification and how it is used elsewhere in the code, use local variables
instead of duplicating endian conversions unnecessarily and remove an
unnecessary assignment.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_debug_cq function is called to print debug data for a control queue
descriptor in multiple places: before we send a message on a transmit queue,
after the writeback completion of a message on the transmit queue, and when
we receive a message on a receive queue.
This function does not include data about *which* control queue the message
is on, nor whether it was what we sent to the queue or what we received
from the queue.
Modify ice_debug_cq to take two extra parameters, a pointer to the control
queue and a boolean indicating if this was a response or a command. Improve
the debug messages by replacing "CQ CMD" with a string indicating which
specific control queue (based on cq->qtype) and whether this was a command
sent by the PF or a response from the queue.
This helps make the log output easier to understand and consume when
debugging.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_sq_done function is used to check the control queue head register
and determine whether or not the control queue processing is done. This
function is called in a loop checking against jiffies for a specified
timeout.
The pattern of reading a register in a loop until a condition is true or a
timeout is reached is a relatively common pattern. In fact, the kernel
provides a read_poll_timeout function implementing this behavior in
<linux/iopoll.h>
Use of read_poll_timeout is preferred over directly coding these loops.
However, using it in the ice driver is a bit more difficult because of the
rd32 wrapper. Implement a rd32_poll_timeout wrapper based on
read_poll_timeout.
Refactor ice_sq_done to use rd32_poll_timeout, replacing the loop calling
ice_sq_done in ice_sq_send_cmd. This simplifies the logic down to a single
ice_sq_done() call.
The implementation of rd32_poll_timeout uses microseconds for its timeout
value, so update the CQ timeout macros to be specified in microsecond units
as well, instead of using HZ for jiffies.
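A sketch of such a wrapper built on read_poll_timeout() from <linux/iopoll.h>
(the exact parameter list of the ice macro is not shown here; rd32(hw, reg)
is the driver's existing register read helper):
        #include <linux/iopoll.h>

        #define rd32_poll_timeout(hw, reg, val, cond, sleep_us, timeout_us) \
                read_poll_timeout(rd32, val, cond, sleep_us, timeout_us, \
                                  false, hw, reg)
The former jiffies loop then collapses into one call that polls the head
register until the condition holds or the microsecond timeout expires.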
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
c948c0973d ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops")
f2878cdeb7 ("bnxt_en: Add support to call FW to update a VNIC")
Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-20 (ice)
This series contains updates to ice driver only.
Maciej fixes issues with Rx data path on architectures with
PAGE_SIZE >= 8192; correcting page reuse usage and calculations for
last offset and truesize.
Michal corrects assignment of devlink port number to use PF id.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: use internal pf id instead of function number
ice: fix truesize operations for PAGE_SIZE >= 8192
ice: fix ICE_LAST_OFFSET formula
ice: fix page reuse when PAGE_SIZE is over 8k
====================
Link: https://patch.msgid.link/20240820215620.1245310-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BIT() is unsigned long but ->pu.flg_msk and ->pu.flg_val are of u64 type.
On 32 bit systems, unsigned long is a u32 and the mismatch between u32
and u64 will break things for the high 32 bits.
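The usual remedy for this class of bug is the 64-bit variant of the macro;
whether that is exactly what this patch applies is not shown here, but
illustratively:
        u64 mask;

        mask = BIT(40);         /* unsigned long shift: broken on 32-bit systems */
        mask = BIT_ULL(40);     /* always a 64-bit constant, safe everywhere */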
Fixes: 9a4c07aaa0 ("ice: add parser execution main loop")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/ddc231a8-89c1-4ff4-8704-9198bcb41f8d@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina reports that the igb driver does not cope well with large
MAX_SKB_FRAGS values: setting MAX_SKB_FRAGS to 45 causes payload
corruption on TX.
An easy reproducer is to run ssh to connect to the machine. With
MAX_SKB_FRAGS=17 it works, with MAX_SKB_FRAGS=45 it fails. This has
been reported originally in
https://bugzilla.redhat.com/show_bug.cgi?id=2265320
The root cause of the issue is that the driver does not properly take into
account the (possibly large) shared info size when selecting the ring
layout, and will try to fit two packets inside the same 4K page even when
the first packet's fraglist would trample over the second packet's head.
Address the issue by checking if 2K buffers are insufficient.
Fixes: 3948b05950 ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reported-by: Jan Tluka <jtluka@redhat.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://patch.msgid.link/20240816152034.1453285-1-vinschen@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Always use the same PF id in the devlink port number. When passing the PF
through to a VM, the bus info function number can be any value.
Fixes: 2ae0aa4758 ("ice: Move devlink port to PF/VF struct")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When working on a multi-buffer packet on an arch that has PAGE_SIZE >= 8192,
truesize is calculated and stored in xdp_buff::frame_sz for each
processed Rx buffer. This means that frame_sz will contain the truesize
based on the last received buffer, but commit 1dc1a7e7f4 ("ice:
Centrallize Rx buffer recycling") assumed this value would be constant
for each buffer, which breaks the page recycling scheme and messes up the
way we update page::page_offset.
To fix this, let us work on constant truesize when PAGE_SIZE >= 8192
instead of basing this on size of a packet read from Rx descriptor. This
way we can simplify the code and avoid calculating truesize per each
received frame and on top of that when using
xdp_update_skb_shared_info(), current formula for truesize update will
be valid.
This means ice_rx_frame_truesize() can be removed altogether.
Furthermore, the first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE
was redundant, as xdp_buff::frame_sz is initialized via xdp_init_buff()
in ice_vsi_cfg_rxq(). This should have been removed at the point where the
xdp_buff struct started to be a member of ice_rx_ring and was no longer a
stack-based variable.
There are two fixes tags as my understanding is that the first one
exposed us to broken truesize and page_offset handling and then second
introduced broken skb_shared_info update in ice_{construct,build}_skb().
Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com>
Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/
Fixes: 1dc1a7e7f4 ("ice: Centrallize Rx buffer recycling")
Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
For bigger PAGE_SIZE archs, the ice driver works with 3k Rx buffers.
Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not
ICE_RXBUF_2048.
Fixes: 7237f5b0db ("ice: introduce legacy Rx flag")
Suggested-by: Luiz Capitulino <luizcap@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the
same as x86 currently, meaning reuse of a page should only take place
when no one else is busy with it.
Do two things independently of underlying PAGE_SIZE:
- store the page count under ice_rx_buf::pgcnt
- then act upon its value vs ice_rx_buf::pagecnt_bias when making the
decision regarding page reuse
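The reuse decision then reduces to a reference-count comparison that no
longer depends on PAGE_SIZE, roughly as in this sketch (following the field
names above; page_count() is the real helper):
        /* snapshot taken when the Rx buffer is fetched */
        rx_buf->pgcnt = page_count(page);

        /* later, when deciding whether the page may be recycled */
        if (rx_buf->pgcnt - rx_buf->pagecnt_bias > 1)
                return false;   /* someone else still references the page */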
Fixes: 2b245cb294 ("ice: Implement transmit and NAPI support")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
ice: iavf: add support for TC U32 filters on VFs
Ahmed Zaki says:
The Intel Ethernet 800 Series is designed with a pipeline that has
an on-chip programmable capability called Dynamic Device Personalization
(DDP). A DDP package is loaded by the driver during probe time. The DDP
package programs functionality in both the parser and switching blocks in
the pipeline, allowing dynamic support for new and existing protocols.
Once the pipeline is configured, the driver can identify the protocol and
apply any HW action in different stages, for example, direct packets to
desired hardware queues (flow director), queue groups or drop.
Patches 1-8 introduce a DDP package parser API that enables different
pipeline stages in the driver to learn the HW parser capabilities from
the DDP package that is downloaded to HW. The parser library takes raw
packet patterns and masks (in binary) indicating the packet protocol fields
to be matched and generates the final HW profiles that can be applied at
the required stage. With this API, raw flow filtering for FDIR or RSS
could be done on new protocols or headers without any driver or Kernel
updates (only need to update the DDP package). These patches were submitted
before [1] but were not accepted mainly due to lack of a user.
Patches 9-11 extend the virtchnl support to allow the VF to request raw
flow director filters. Upon receiving the raw FDIR filter request, the PF
driver allocates and runs a parser lib instance and generates the hardware
profile definitions required to program the FDIR stage. These were also
submitted before [2].
Finally, patches 12 and 13 add TC U32 filter support to the iavf driver.
Using the parser API, the ice driver runs the raw patterns sent by the
user and then adds a new profile to the FDIR stage associated with the VF's
VSI. Refer to examples in patch 13 commit message.
[1]: https://lore.kernel.org/netdev/20230904021455.3944605-1-junfeng.guo@intel.com/
[2]: https://lore.kernel.org/intel-wired-lan/20230818064703.154183-1-junfeng.guo@intel.com/
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: add support for offloading tc U32 cls filters
iavf: refactor add/del FDIR filters
ice: enable FDIR filters from raw binary patterns for VFs
ice: add method to disable FDIR SWAP option
virtchnl: support raw packet in protocol header
ice: add API for parser profile initialization
ice: add UDP tunnels support to the parser
ice: support turning on/off the parser's double vlan mode
ice: add parser execution main loop
ice: add parser internal helper functions
ice: add debugging functions for the parser sections
ice: parse and init various DDP parser sections
ice: add parser create and destroy skeleton
====================
Link: https://patch.msgid.link/20240813222249.3708070-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'req_vec_chunks' is used to store the vector info received
from the device control plane. The memory for it is allocated
in idpf_send_alloc_vectors_msg(), which returns an error if the memory
allocation fails.
'req_vec_chunks' cannot be NULL in the later code flow. So remove
the conditional check to extract the vector ids received from
the device control plane.
Smatch static checker warning:
drivers/net/ethernet/intel/idpf/idpf_lib.c:417 idpf_intr_req()
error: we previously assumed 'adapter->req_vec_chunks'
could be null (see line 360)
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/intel-wired-lan/a355ae8a-9011-4a85-a4d1-5b2793bb5f7b@stanley.mountain/
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240814175903.4166390-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for offloading cls U32 filters. Only "skbedit queue_mapping"
and "drop" actions are supported. Also, only "ip" and "802_3" tc
protocols are allowed. The PF must advertise the VIRTCHNL_VF_OFFLOAD_TC_U32
capability flag.
Since the filters will be enabled via the FD stage at the PF, a new type
of FDIR filters is added and the existing list and state machine are used.
The new filters can be used to configure flow directors based on raw
(binary) pattern in the rx packet.
Examples:
0. # tc qdisc add dev enp175s0v0 ingress
1. Redirect UDP from src IP 192.168.2.1 to queue 12:
# tc filter add dev <dev> protocol ip ingress u32 \
match u32 0x45000000 0xff000000 at 0 \
match u32 0x00110000 0x00ff0000 at 8 \
match u32 0xC0A80201 0xffffffff at 12 \
match u32 0x00000000 0x00000000 at 24 \
action skbedit queue_mapping 12 skip_sw
2. Drop all ICMP:
# tc filter add dev <dev> protocol ip ingress u32 \
match u32 0x45000000 0xff000000 at 0 \
match u32 0x00010000 0x00ff0000 at 8 \
match u32 0x00000000 0x00000000 at 24 \
action drop skip_sw
3. Redirect ICMP traffic from MAC 3c:fd:fe:a5:47:e0 to queue 7
(note proto: 802_3):
# tc filter add dev <dev> protocol 802_3 ingress u32 \
match u32 0x00003CFD 0x0000ffff at 4 \
match u32 0xFEA547E0 0xffffffff at 8 \
match u32 0x08004500 0xffffff00 at 12 \
match u32 0x00000001 0x000000ff at 20 \
match u32 0x0000 0x0000 at 40 \
action skbedit queue_mapping 7 skip_sw
Notes on matches:
1 - All intermediate fields that are needed to parse the correct PTYPE
must be provided (e.g. in example 3: EtherType 0x0800 in the MAC header,
IP version/header length 0x45, and protocol 0x01 (ICMP)).
2 - The last match must provide an offset that guarantees all required
headers are accounted for, even if the last header is not matched.
For example, in #2, the last match is 4 bytes at offset 24 starting
from IP header, so the total is 14 (MAC) + 24 + 4 = 42, which is the
sum of MAC+IP+ICMP headers.
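The arithmetic in note 2 can be sanity-checked with a trivial, self-contained
C snippet (illustration only, not driver code):

  #include <stdio.h>

  /* For "protocol ip" u32 filters the match offsets are relative to the
   * IP header, so the 14-byte Ethernet header is added on top.
   */
  static unsigned int total_parsed_bytes(unsigned int last_off,
                                         unsigned int match_len)
  {
          return 14 + last_off + match_len;
  }

  int main(void)
  {
          /* Example 2: last match is 4 bytes at offset 24 from the IP
           * header: 14 + 24 + 4 = 42 = MAC (14) + IP (20) + ICMP (8).
           */
          printf("%u\n", total_parsed_bytes(24, 4));
          return 0;
  }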
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation for a second type of FDIR filters that can be added by
tc-u32, move the FDIR add/del logic to be entirely contained in
iavf_fdir.c.
iavf_find_fdir_fltr_by_loc() is renamed to iavf_find_fdir_fltr()
to be more agnostic to the filter ID parameter (for now @loc, which is
relevant only to the current FDIR filters added via ethtool).
The FDIR filter deletion is moved from iavf_del_fdir_ethtool() in
ethtool.c to iavf_fdir_del_fltr(). While at it, fix a minor bug where
"fltr" is accessed outside the fdir_fltr_lock spinlock protection.
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable VFs to create FDIR filters from raw binary patterns.
The corresponding handling for raw flows is added in the
Parse / Create / Destroy stages.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The SWAP flag in the FDIR programming descriptor doesn't work properly:
it is always set and cannot be unset (hardware bug). Thus, add a method
to effectively disable the FDIR SWAP option by setting the FDSWAP
registers instead of the FDINSET registers.
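For reference, a hedged sketch of the register programming (wr32() and the
GLQF_FDSWAP() macro exist in the ice driver; the loop bound and value array
here are illustrative):

  /* Program the per-profile swap registers so that the always-set SWAP
   * flag has no visible effect on the input set.
   */
  for (i = 0; i < swap_reg_cnt; i++)
          wr32(hw, GLQF_FDSWAP(prof_id, i), swap_val[i]);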
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the API ice_parser_profile_init() to initialize a parser profile based
on a parser result and a mask buffer. The ice_parser_profile struct is used
by the low-level FXP engine to create HW profiles/field vectors.
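A hedged usage sketch (the argument list below is assumed from the
description, not copied from the actual prototype):

  struct ice_parser_profile prof = {};

  /* rslt: result of a previous parser run on the raw packet;
   * pkt_buf/msk_buf: the raw pattern and its mask from the request
   */
  err = ice_parser_profile_init(rslt, pkt_buf, msk_buf, buf_len,
                                ICE_BLK_FD, &prof);
  if (err)
          return err;
  /* prof now carries the protocol/offset entries used to build the HW
   * profile and field vectors for the FD stage
   */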
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for the vxlan, geneve, ecpri UDP tunnels through the
following APIs:
- ice_parser_vxlan_tunnel_set()
- ice_parser_geneve_tunnel_set()
- ice_parser_ecpri_tunnel_set()
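A hedged usage sketch of one of the hooks (the port/enable argument
convention is assumed):

  /* tell the parser which UDP destination port carries VXLAN so that
   * inner headers are walked correctly; geneve and ecpri work the same
   */
  err = ice_parser_vxlan_tunnel_set(psr, udp_port, true);
  if (err)
          return err;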
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add API ice_parser_dvm_set() to support turning on/off the parser's double
vlan mode.
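Hedged example call (the boolean argument and the pairing with
ice_is_dvm_ena() are illustrative):

  /* mirror the device's current double VLAN mode into the parser so
   * header offsets match what the hardware will see
   */
  ice_parser_dvm_set(psr, ice_is_dvm_ena(hw));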
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the following internal helper functions:
- ice_bst_tcam_match():
to perform a ternary match on the boost TCAM.
- ice_pg_cam_match():
to perform a parse graph key match in the CAM table.
- ice_pg_nm_cam_match():
to perform a parse graph key no-match in the CAM table.
- ice_ptype_mk_tcam_match():
to perform a ptype markers match in the TCAM table.
- ice_flg_redirect():
to redirect parser flags to packet flags.
- ice_xlt_kb_flag_get():
to aggregate the 64-bit packet flags into 16-bit key builder flags.
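Roughly how these helpers chain inside a single parse step (conceptual
sketch only; every argument list below is hypothetical):

  /* boost TCAM lookup for tunnel/boost handling */
  bst = ice_bst_tcam_match(bst_table, key);
  /* advance the parse graph: exact CAM first, no-match CAM as fallback */
  pg = ice_pg_cam_match(pg_table, pg_size, &pg_key);
  if (!pg)
          pg_nm = ice_pg_nm_cam_match(pg_nm_table, pg_nm_size, &pg_key);
  /* resolve the final ptype from the marker TCAM */
  mk = ice_ptype_mk_tcam_match(mk_table, markers, marker_len);
  /* parser flags -> packet flags -> 16-bit key builder flags */
  pkt_flags = ice_flg_redirect(flg_table, parser_flags);
  kb_flags = ice_xlt_kb_flag_get(xlt_kb, pkt_flags);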
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Parse the following DDP sections:
- ICE_SID_RXPARSER_IMEM into an array of struct ice_imem_item
- ICE_SID_RXPARSER_METADATA_INIT into an array of struct ice_metainit_item
- ICE_SID_RXPARSER_CAM or ICE_SID_RXPARSER_PG_SPILL into an array of
struct ice_pg_cam_item
- ICE_SID_RXPARSER_NOMATCH_CAM or ICE_SID_RXPARSER_NOMATCH_SPILL into an
array of struct ice_pg_nm_cam_item
- ICE_SID_RXPARSER_BOOST_TCAM into an array of struct ice_bst_tcam_item
- ICE_SID_LBL_RXPARSER_TMEM into an array of struct ice_lbl_item
- ICE_SID_RXPARSER_MARKER_PTYPE into an array of struct ice_ptype_mk_tcam_item
- ICE_SID_RXPARSER_MARKER_GRP into an array of struct ice_mk_grp_item
- ICE_SID_RXPARSER_PROTO_GRP into an array of struct ice_proto_grp_item
- ICE_SID_RXPARSER_FLAG_REDIR into an array of struct ice_flg_rd_item
- ICE_SID_XLT_KEY_BUILDER_SW, ICE_SID_XLT_KEY_BUILDER_ACL,
ICE_SID_XLT_KEY_BUILDER_FD and ICE_SID_XLT_KEY_BUILDER_RSS into
struct ice_xlt_kb
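All of these follow the same pattern: walk the section and decode each
fixed-size item into its host-order struct. A generic hedged sketch
(helper names hypothetical):

  for (i = 0; i < item_count; i++) {
          u8 *raw = section_data + i * item_size;    /* packed HW layout */
          item_parse(&items[i], raw);                /* e.g. fill one ice_imem_item */
  }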
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a new parser module which can parse a packet in binary form and generate
information such as ptype, protocol/offset pairs and flags, which can later
be used to feed the FXP profile creation directly.
Add skeleton of the create and destroy APIs:
ice_parser_create()
ice_parser_destroy()
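A hedged usage sketch of the pair (the error-pointer return convention is
assumed; check the actual prototypes):

  psr = ice_parser_create(hw);
  if (IS_ERR(psr))
          return PTR_ERR(psr);

  /* ... run the parser on the raw pattern and build the profile ... */

  ice_parser_destroy(psr);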
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
marvell/otx2 and mvpp2 do not support setting different
keys for different RSS contexts. Contexts have separate
indirection tables, but the key is shared with all other contexts.
This is likely fine; the indirection table is the most important
piece.
Don't report the key-related parameters from such drivers.
This prevents driver errors, e.g. otx2 always writes
the main key, even when the user asks to change a per-context key.
The second reason is that without this change, tracking
the keys in the core gets complicated. Even if the driver
correctly rejects setting a key with rss_context != 0,
a change of the main key would have to be reflected in
the XArray for all additional contexts.
Since the additional contexts don't have their own keys,
not including the attributes (in Netlink speak) seems
intuitive. The ethtool CLI seems to deal with it just fine.
Having to set the flag in the majority of drivers is
a bit tedious, but not reporting the key is a safer
default.
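For a driver that does support per-context keys, the opt-in is a single
capability bit in its ethtool_ops; a minimal sketch, assuming the flag is
named rxfh_per_ctx_key as in this series:

  static const struct ethtool_ops example_ethtool_ops = {
          .cap_rss_ctx_supported = true,
          .rxfh_per_ctx_key      = true, /* device can program a key per RSS context */
          /* ... .get_rxfh / .set_rxfh etc. ... */
  };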
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-07 (igc)
This series contains updates to igc driver only.
Faizal adjusts the size of the MAC internal buffer on i226 devices to
resolve an erratum involving leaked packet transmits. He also corrects a
condition in which qbv_config_change_errors is incorrectly counted.
Lastly, he adjusts the conditions for resetting the adapter when
changing the TSN Tx mode and corrects the conditions under which the
gtxoffset register is set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>