Tony Nguyen says:
====================
Add RDMA support for Intel IPU E2000 in idpf
Tatyana Nikolova says:
This idpf patch series is the second part of the staged submission for
introducing RDMA RoCEv2 support for the IPU E2000 line of products,
referred to as GEN3.
To support RDMA GEN3 devices, the idpf driver uses common definitions
of the IIDC interface and implements device-specific functionality in
iidc_rdma_idpf.h.
The IPU model can host one or more logical network endpoints called
vPorts per PCI function that are flexibly associated with a physical
port or an internal communication port.
Other features pertaining to GEN3 devices include:
* MMIO learning
* RDMA capability negotiation
* RDMA vectors discovery between idpf and control plane
These patches are split from the submission "Add RDMA support for Intel
IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts
and platforms with a variety of general RDMA applications which include
standalone verbs (rping, perftest, etc.), storage and HPC applications.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
[1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/
IWL reviews:
v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/
v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/
v1 (split from previous series):
https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/
v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/
RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/
RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux:
idpf: implement get LAN MMIO memory regions
idpf: implement IDC vport aux driver MTU change handler
idpf: implement remaining IDC RDMA core callbacks and handlers
idpf: implement RDMA vport auxiliary dev create, init, and destroy
idpf: implement core RDMA auxiliary dev create, init, and destroy
idpf: use reserved RDMA vectors from control plane
====================
Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The RDMA driver needs to map its own MMIO regions for the sake of
performance, meaning the IDPF needs to avoid mapping portions of the BAR
space. However, to be HW agnostic, the IDPF cannot assume where
these are and must avoid mapping hard coded regions as much as possible.
The IDPF maps the bare minimum to load and communicate with the
control plane, i.e., the mailbox registers and the reset state
registers. Because of how and when mailbox register offsets are
initialized, it is easier to adjust the existing defines to be relative
to the mailbox region starting address. Use a specific mailbox register
write function that uses these relative offsets. The reset state
register addresses are calculated the same way as for other registers,
described below.
The IDPF then calls a new virtchnl op to fetch a list of MMIO regions
that it should map. The addresses for the registers in these regions are
calculated by determining what region the register resides in, adjusting
the offset to be relative to that region, and then adding the
register's offset to that region's mapped address.
If the new virtchnl op is not supported, the IDPF will fall back to
mapping the whole BAR. However, it will still map it as separate
regions outside the mailbox and reset state registers. This way the
same logic can be used in both cases to access the MMIO space.
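As a rough sketch of that lookup, assuming a hypothetical region table
(struct idpf_mmio_reg_region and the helper below are illustrative, not
the actual idpf definitions):

/* Illustrative only: region bookkeeping and helper names are assumptions */
struct idpf_mmio_reg_region {
	resource_size_t offset;		/* BAR0 offset where the region starts */
	resource_size_t size;		/* length of the mapped region */
	void __iomem *vaddr;		/* ioremap()'ed base of the region */
};

static void __iomem *idpf_get_reg_addr(struct idpf_mmio_reg_region *regions,
				       int num_regions,
				       resource_size_t reg_offset)
{
	int i;

	for (i = 0; i < num_regions; i++) {
		struct idpf_mmio_reg_region *r = &regions[i];

		/* Find the region the register resides in, then rebase the
		 * offset onto that region's mapped address.
		 */
		if (reg_offset >= r->offset &&
		    reg_offset < r->offset + r->size)
			return r->vaddr + (reg_offset - r->offset);
	}

	return NULL;
}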
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The only event an RDMA vport aux driver cares about right now is an MTU
change on its underlying vport. Implement and plumb the handler to
signal the pre and post MTU change events to the RDMA vport aux driver.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement the idpf_idc_request_reset and idpf_idc_rdma_vc_send_sync
callbacks for the RDMA core auxiliary driver to issue reset events to
the idpf and send (synchronous) virtchnl messages to the control plane,
respectively.
Implement and plumb the reset handler for the opposite flow as well,
i.e. when the idpf is resetting and needs to notify the RDMA core
auxiliary driver.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement the functions to create, initialize, and destroy an RDMA vport
auxiliary device. The vport aux dev creation depends on the core aux
device calling idpf_idc_vport_dev_ctrl to signal that it is ready for
vport aux devices. Implement that core callback to either create and
initialize the vport aux dev or deinitialize it.
RDMA vport aux dev creation is also dependent on the control plane to
tell us the vport is RDMA enabled. Add a flag in the create vport
message to signal individual vport RDMA capabilities.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the initial idpf_idc.c file with the functions to kick off the IDC
initialization, create and initialize a core RDMA auxiliary device, and
destroy said device.
The RDMA core has a dependency on the vports being created by the
control plane before it can be initialized. Therefore, once all the
vports are up after a hard reset (either during driver load or a
function level reset), the core RDMA device info will be created. It is
populated with the function type (as distinguished by the IDC
initialization function pointer), the core idc_ops function pointers
(just stubs for
now), the reserved RDMA MSIX table, and various other info the core RDMA
auxiliary driver will need. It is then plugged on to the bus.
During a function level reset or driver unload, the device will be
unplugged from the bus and destroyed.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fetch the number of reserved RDMA vectors from the control plane.
Adjust the number of reserved LAN vectors if necessary. Adjust the
minimum number of vectors the OS should reserve to include RDMA, and
fail if the OS cannot reserve enough vectors for the minimum number of
LAN and RDMA vectors required. Create a separate MSIX table for the
reserved RDMA vectors, which is simply handed off to the RDMA core
device to manage as it sees fit.
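A hedged sketch of carving out that table, assuming hypothetical member
and helper names (rdma_msix_entries and friends); the real idpf layout
may differ:

/* Illustrative only: put the reserved RDMA vectors at the tail of the
 * MSI-X range and build a table to hand to the RDMA core aux device.
 */
static int idpf_alloc_rdma_msix(struct idpf_adapter *adapter,
				u16 num_lan_vecs, u16 num_rdma_vecs)
{
	u16 i;

	adapter->rdma_msix_entries = kcalloc(num_rdma_vecs,
					     sizeof(struct msix_entry),
					     GFP_KERNEL);
	if (!adapter->rdma_msix_entries)
		return -ENOMEM;

	for (i = 0; i < num_rdma_vecs; i++) {
		adapter->rdma_msix_entries[i].entry = num_lan_vecs + i;
		adapter->rdma_msix_entries[i].vector =
			pci_irq_vector(adapter->pdev, num_lan_vecs + i);
	}

	return 0;
}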
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Back when the libeth Rx core was initially written, devmem was a draft
and netmem_ref didn't exist in the mainline. Now that it's here, make
libeth MP-agnostic before introducing any new code or any new library
users.
When it's known that the created PP/FQ is for header buffers, use the
faster "unsafe" underscored netmem <--> virt accessors, as
netmem_is_net_iov() is always false in that case but still consumes
some cycles (a bit test plus the taken branch).
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Mailbox operations are not possible while the driver is in reset.
Operations that require MBX exchange with the control plane will result
in long delays if executed while a reset is in progress:
ethtool -L <inf> combined 8& echo 1 > /sys/class/net/<inf>/device/reset
idpf 0000:83:00.0: HW reset detected
idpf 0000:83:00.0: Device HW Reset initiated
idpf 0000:83:00.0: Transaction timed-out (op:504 cookie:be00 vc_op:504 salt:be timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:508 cookie:bf00 vc_op:508 salt:bf timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:512 cookie:c000 vc_op:512 salt:c0 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:510 cookie:c100 vc_op:510 salt:c1 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c200 vc_op:509 salt:c2 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c300 vc_op:509 salt:c3 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:505 cookie:c400 vc_op:505 salt:c4 timeout:60000ms)
idpf 0000:83:00.0: Failed to configure queues for vport 0, -62
Disable mailbox communication in case of a reset, unless it's done during
a driver load, where the virtchnl operations are needed to configure the
device.
Fixes: 8077c72756 ("idpf: add controlq init and reset checks")
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a helper function to correctly handle the lockless
synchronization when the sender needs to block. The paradigm is
	if (no_resources()) {
		stop_queue();
		barrier();
		if (!no_resources())
			restart_queue();
	}
netif_subqueue_maybe_stop already handles the paradigm correctly, but
the code split the check for resources into three parts: the first one
(descriptors) followed the protocol, while the other two (completions
and tx_buf) only did the first part and so were race prone.
Luckily, the netif_subqueue_maybe_stop macro already allows using a
function to evaluate the start/stop conditions, so the fix only
requires the right helper function to evaluate all the conditions at
once.
The patch removes idpf_tx_maybe_stop_common since it's no longer needed
and instead adjusts the conditions for singleq and splitq separately.
Note that idpf_tx_buf_hw_update doesn't need to check for resources
since that will be covered in idpf_tx_splitq_frame.
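A hedged sketch of what such a combined helper could look like; the
type and helper names below are hypothetical, with only
netif_subqueue_maybe_stop and IDPF_TX_COMPLQ_OVERFLOW_THRESH coming
from the code referenced above. Its result can then be passed as the
get_desc expression to netif_subqueue_maybe_stop() in the xmit path.

/* Illustrative only: report room as available solely when descriptors,
 * Tx buffers and completion space are all sufficient, so the macro's
 * stop/barrier/recheck sequence covers every resource at once.
 */
static u32 idpf_txq_has_room(struct idpf_tx_queue *txq, u32 descs_needed)
{
	if (idpf_txq_free_descs(txq) < descs_needed ||
	    idpf_txq_free_bufs(txq) < descs_needed ||
	    idpf_txq_pending_compl(txq) >
	    IDPF_TX_COMPLQ_OVERFLOW_THRESH(txq->complq))
		return 0;

	return 1;
}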
To reproduce:
Reduce the threshold for pending completions to increase the chances of
hitting this pause by changing your kernel:
drivers/net/ethernet/intel/idpf/idpf_txrx.h
-#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1)
+#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4)
Use pktgen to force the host to push small pkts very aggressively:
./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \
-p 10000-10000 -t 16 -n 0 -v -x -c 64
Fixes: 6818c4d5b3 ("idpf: add splitq start_xmit")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2 ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a2 ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
idpf_vport_splitq_napi_poll() can incorrectly return @budget
after napi_complete_done() has been called.
This violates NAPI rules, because after napi_complete_done() the
current thread has lost NAPI ownership.
Move the test against POLL_MODE before the napi_complete_done().
Fixes: c2d548cad1 ("idpf: add TX splitq napi poll support")
Reported-by: Peter Newman <peternewman@google.com>
Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joshua Hay <joshua.a.hay@intel.com>
Cc: Alan Brady <alan.brady@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Phani Burra <phani.r.burra@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
idpf_features_check is used to validate the Tx packet. The skb header
length is compared with the hardware-supported value received from
the device control plane. The value is stored in the adapter structure,
and the vport pointer is used to access it. During reset, all the
vports are released and the vport pointer that the netdev private
structure points to is NULL.
To avoid the null-ptr-deref, store the max header length value in the
netdev private structure. This also caches the value and avoids
accessing the adapter pointer in the hot path.
BUG: kernel NULL pointer dereference, address: 0000000000000068
...
RIP: 0010:idpf_features_check+0x6d/0xe0 [idpf]
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x154/0x520
? exc_page_fault+0x76/0x190
? asm_exc_page_fault+0x26/0x30
? idpf_features_check+0x6d/0xe0 [idpf]
netif_skb_features+0x88/0x310
validate_xmit_skb+0x2a/0x2b0
validate_xmit_skb_list+0x4c/0x70
sch_direct_xmit+0x19d/0x3a0
__dev_queue_xmit+0xb74/0xe70
...
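A hedged sketch of the caching approach; the field name
(max_tx_hdr_size) and the exact check are assumptions, not the actual
driver code:

static netdev_features_t idpf_features_check(struct sk_buff *skb,
					     struct net_device *netdev,
					     netdev_features_t features)
{
	struct idpf_netdev_priv *np = netdev_priv(netdev);

	/* Only the cached copy in the netdev private struct is read here,
	 * so a NULL vport/adapter pointer during reset cannot be hit.
	 */
	if (skb_transport_offset(skb) > np->max_tx_hdr_size)
		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);

	return features;
}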
Fixes: a251eee621 ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add an Rx timestamp function for the case where the Rx timestamp value
is read directly from the Rx descriptor. In order to extend the Rx
timestamp value to 64 bits in the hot path, the PHC time is cached in
the receive groups.
Add supported Rx timestamp modes.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: YiFei Zhu <zhuyifei@google.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add functions to request Tx timestamp for the PTP packets, read the Tx
timestamp when the completion tag for that packet is being received,
extend the Tx timestamp value and set the supported timestamping modes.
Tx timestamp is requested for the PTP packets by setting a TSYN bit and
index value in the Tx context descriptor. The driver assumption is that
the Tx timestamp value is ready to be read when the completion tag is
received. The driver then schedules delayed work and the Tx timestamp
value read is requested through a virtchnl message. At the end, the Tx
timestamp value is extended to 64 bits and provided back to the skb.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Co-developed-by: Josh Hay <joshua.a.hay@intel.com>
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tx timestamp capabilities are negotiated for the uplink vport.
The driver receives information about the number of available Tx timestamp
latches, the size of Tx timestamp value and the set of indexes used
for Tx timestamping.
Add function to get the Tx timestamp capabilities and parse the uplink
vport flag.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
PTP clock configuration operations - set time, adjust time and adjust
frequency - are required to control the clock and maintain the
synchronization process.
Extend the get PTP capabilities function to request the clock
adjustments and add functions to enable these actions using dedicated
virtchnl messages.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the access to read the PTP clock is specified as mailbox, the
driver needs to send a virtchnl message to perform PTP actions. The
message is sent using the idpf_mbq_opc_send_msg_to_peer_drv mailbox
opcode, with the parameters received during PTP capabilities
negotiation.
Add functions to recognize PTP messages, move them to a dedicated
secondary mailbox, and read the PTP clock time and cross timestamp
using mailbox messages.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
PTP capabilities are negotiated using a virtchnl command. Add a get
capabilities function and direct access to read the PTP clock.
Set the initial PTP capabilities exposed to the stack.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Move virtchnl structures to the header file to expose them for the PTP
virtchnl file.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
PTP capabilities are negotiated using virtchnl commands. There are two
available modes of PTP support: direct and mailbox. When direct access
to PTP resources is negotiated, the virtchnl messages return a set of
registers that can be read and written directly. When mailbox access to
PTP resources is negotiated, virtchnl messages are used to access the
PTP clock and to read the timestamp values.
The virtchnl API covers both modes and exposes a set of PTP
capabilities. Using the virtchnl API, the driver also recognizes HW
abilities - the maximum adjustment of the clock and the basic increment
value.
Additionally, the API allows configuring the secondary mailbox,
dedicated exclusively for PTP purposes.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The PTP feature is supported if VIRTCHNL2_CAP_PTP is negotiated during
capabilities recognition. Initial PTP support includes PTP
initialization and registration of the clock.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since workqueues are created per CPU, the work items scheduled on these
workqueues run on the CPU they were assigned to. This may result in an
overloaded CPU that is not able to handle virtchnl messages in a
relatively short time. Allocating the workqueue with the WQ_UNBOUND and
WQ_HIGHPRI flags allows the scheduler to queue virtchnl messages on
less loaded CPUs, which eliminates such delays.
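A minimal sketch of such an allocation (the workqueue name and error
handling are illustrative):

	/* Unbound + high priority: work items can migrate to any less
	 * loaded CPU in the node instead of being pinned to the CPU that
	 * queued them.
	 */
	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
					       WQ_UNBOUND | WQ_HIGHPRI, 0,
					       dev_driver_string(&pdev->dev),
					       dev_name(&pdev->dev));
	if (!adapter->vc_event_wq)
		return -ENOMEM;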
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf)
For igb:
Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and
adds persistent NAPI config. Lastly, he removes undesired IRQs that
occur while busy polling.
For igc:
Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the
current implementation with TAPRIO.
For ixgbe:
Jedrzej adds separate ethtool ops for E610 devices to account for device
differences.
Slawomir adds devlink region support for E610 devices.
For idpf:
Mateusz assigns and utilizes the ptype field out of libeth_rqe_info.
Michal removes unreachable code.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: remove unreachable code from setting mailbox
idpf: assign extracted ptype to struct libeth_rqe_info field
ixgbe: devlink: add devlink region support for E610
ixgbe: add E610 .set_phys_id() callback implementation
ixgbe: apply different rules for setting FC on E610
ixgbe: add support for ACPI WOL for E610
ixgbe: create E610 specific ethtool_ops structure
igc: Change Tx mode for MQPRIO offloading
igc: Limit netdev_tc calls to MQPRIO
igb: Get rid of spurious interrupts
igb: Add support for persistent NAPI config
igb: Link queues to NAPI instances
igb: Link IRQs to NAPI instances
====================
Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove code that isn't reached. There is no need to check for
adapter->req_vec_chunks, because if it isn't set idpf_set_mb_vec_id()
won't be called.
There is only one path where idpf_set_mb_vec_id() is called:
idpf_intr_req()
-> idpf_send_alloc_vectors_msg() -> adapter->req_vec_chunks is allocated
here, otherwise an error is returned and idpf_intr_req() exits with an
error.
idpf_set_mb_vec_id() becomes a one-liner and it is called only once.
Remove it and set the mailbox vector index directly.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Assign the ptype extracted from qword to the ptype field of struct
libeth_rqe_info.
Remove the now excess ptype param of idpf_rx_singleq_extract_fields(),
idpf_rx_singleq_extract_base_fields() and
idpf_rx_singleq_extract_flex_fields().
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before the referenced commit, shutdown just called idpf_remove(); this
way IDPF_REMOVE_IN_PROG was protecting us from the serv_task
rescheduling a reset. Without this flag set, the shutdown process is
vulnerable to a HW reset or any other triggering conditions (such as
the default mailbox being destroyed).
When one of the conditions checked in idpf_service_task becomes true,
vc_event_task can be rescheduled during shutdown; this leads to
accessing freed memory, e.g. idpf_req_rel_vector_indexes() trying to
read vport->q_vector_idxs. This in turn causes the system to become
defunct during e.g. systemctl kexec.
Since using IDPF_REMOVE_IN_PROG would lead to a heavier shutdown
process, instead just cancel adapter->serv_task before cancelling
adapter->vc_event_task to ensure that a reset will not be scheduled
while we are doing a shutdown.
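A minimal sketch of the resulting ordering in the shutdown path,
assuming both tasks are delayed work items:

	/* Cancel the service task first so it cannot observe a reset
	 * condition and requeue work behind our back, then cancel the
	 * virtchnl event task itself.
	 */
	cancel_delayed_work_sync(&adapter->serv_task);
	cancel_delayed_work_sync(&adapter->vc_event_task);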
Fixes: 4c9106f490 ("idpf: fix adapter NULL pointer dereference on reboot")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the rss_data->rss_key allocation fails, the function frees the vport
without freeing the earlier allocated q_vector_idxs. Fix it by moving
from freeing in the error branch to a goto scheme.
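A generic sketch of the goto scheme (labels and the surrounding
allocations are illustrative, not the literal idpf code):

	vport->q_vector_idxs = kcalloc(num_max_q, sizeof(u16), GFP_KERNEL);
	if (!vport->q_vector_idxs)
		goto free_vport;

	rss_data->rss_key = kzalloc(rss_data->rss_key_size, GFP_KERNEL);
	if (!rss_data->rss_key)
		goto free_vector_idxs;

	/* further init continues here on success */
	return vport;

free_vector_idxs:
	kfree(vport->q_vector_idxs);
free_vport:
	kfree(vport);
	return NULL;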
Fixes: d4d5587182 ("idpf: initialize interrupts and enable vport")
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Split offloads into csum, tso and other offloads so that tunneled
packets do not by default have all the offloads enabled.
Stateless offloads for encapsulated packets are not yet supported in
firmware/software, but in the driver we were setting the same features
as for non-encapsulated packets.
Fix naming to clarify that the CSUM bits are being checked for Tx.
Inherit netdev features to VLAN interfaces as well.
Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Zachary Goldstein <zachmgoldstein@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With SRIOV enabled, idpf ends up calling into idpf_remove() twice.
First via idpf_shutdown() and then again when idpf_remove() calls into
sriov_disable(), because the VF devices use the idpf driver, hence the
same remove routine. When that happens, it is possible for the adapter
to be NULL from the first call to idpf_remove(), leading to a NULL
pointer dereference.
echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
reboot
BUG: kernel NULL pointer dereference, address: 0000000000000020
...
RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
...
? idpf_remove+0x22/0x1f0 [idpf]
? idpf_remove+0x1e4/0x1f0 [idpf]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x19f/0x200
pci_stop_bus_device+0x6d/0x90
pci_stop_and_remove_bus_device+0x12/0x20
pci_iov_remove_virtfn+0xbe/0x120
sriov_disable+0x34/0xe0
idpf_sriov_configure+0x58/0x140 [idpf]
idpf_remove+0x1b9/0x1f0 [idpf]
idpf_shutdown+0x12/0x30 [idpf]
pci_device_shutdown+0x35/0x60
device_shutdown+0x156/0x200
...
Replace the direct idpf_remove() call in idpf_shutdown() with
idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
the bulk of the cleanup, such as stopping the init task, freeing IRQs,
destroying the vports and freeing the mailbox. This avoids the calls to
sriov_disable(), as well as a small netdev cleanup and the destruction
of workqueues, which do not appear to be required on shutdown.
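A sketch of the reduced shutdown path described above (details
omitted):

static void idpf_shutdown(struct pci_dev *pdev)
{
	struct idpf_adapter *adapter = pci_get_drvdata(pdev);

	/* Stop the init task, free IRQs, destroy the vports and free the
	 * mailbox without taking the full remove path, so sriov_disable()
	 * is never reached from shutdown.
	 */
	idpf_vc_core_deinit(adapter);
	idpf_deinit_dflt_mbx(adapter);
}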
Reported-by: Yuying Ma <yuma@redhat.com>
Fixes: e850efed5e ("idpf: add module register and probe functionality")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Merge in late fixes to prepare for the 6.15 net-next PR.
No conflicts, adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
919f9f497d ("eth: bnxt: fix out-of-range access of vnic_info array")
fe96d717d3 ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current init logic ignores the error code from register_netdev(),
which will cause a WARN_ON() on an attempt to unregister it if there
was an error, and there is no info for the user that the creation of
the netdev failed.
WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10
...
[ 3707.563641] unregister_netdev+0x1c/0x30
[ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf]
[ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf]
[ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf]
[ 3707.563739] idpf_remove+0xbf/0x780 [idpf]
[ 3707.563769] pci_device_remove+0xab/0x1e0
[ 3707.563786] device_release_driver_internal+0x371/0x530
[ 3707.563803] driver_detach+0xbf/0x180
[ 3707.563816] bus_remove_driver+0x11b/0x2a0
[ 3707.563829] pci_unregister_driver+0x2a/0x250
Introduce an error check and log the vport number and error code.
On removal, make sure to check the VPORT_REG_NETDEV flag prior to
calling unregister and free on the netdev.
Add local variables for idx, vport_config and netdev for readability.
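A minimal sketch of the error check (variable and flag names loosely
follow the text above and are assumptions, not guaranteed to match the
driver):

	err = register_netdev(netdev);
	if (err) {
		dev_err(&adapter->pdev->dev,
			"failed to register netdev for vport %u: %pe\n",
			idx, ERR_PTR(err));
		return err;
	}
	set_bit(IDPF_VPORT_REG_NETDEV, vport_config->flags);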
Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Structs idpf_rx_csum_decoded and idpf_rx_extracted are used both in
idpf and iavf Intel drivers. Change the prefix from idpf_* to libeth_*
and move mentioned structs to libeth's rx.h header file.
Adjust usage in idpf driver.
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On initial driver load, alloc_etherdev_mqs is called with whatever max
queue values are provided by the control plane. However, if the driver
is loaded on a system where num_online_cpus() returns less than the max
queues, the netdev will think there are more queues than are actually
available. Only num_online_cpus() queues will be allocated, but
skb_get_queue_mapping(skb) could possibly return an index beyond the
range of allocated queues. Consequently, the packet is silently dropped
and it appears as if TX is broken.
Set the real number of queues during open so the netdev knows how many
queues will be allocated.
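A sketch of the open-time fix using the standard netdev helpers (field
names are assumed):

	/* Tell the stack how many queues were actually allocated so
	 * skb_get_queue_mapping() can never select a nonexistent queue.
	 */
	err = netif_set_real_num_tx_queues(vport->netdev, vport->num_txq);
	if (err)
		return err;

	err = netif_set_real_num_rx_queues(vport->netdev, vport->num_rxq);
	if (err)
		return err;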
Fixes: 1c325aac10 ("idpf: configure resources for TX queues")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Move the call to skb_record_rx_queue in idpf_rx_process_skb_fields()
so that the RX queue is recorded for RSC packets too.
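A fragment showing the intended placement (surrounding body
abbreviated; rxq->idx as the queue index is an assumption):

	/* Inside idpf_rx_process_skb_fields(): record the queue before the
	 * RSC handling and its early return, so RSC packets are covered too.
	 */
	skb_record_rx_queue(skb, rxq->idx);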
Fixes: 90912f9f4f ("idpf: convert header split mode to libeth + napi_build_skb()")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Handle an RSC packet with a single segment the same as a multi-segment
RSC packet so that CHECKSUM_PARTIAL is set in the skb->ip_summed field.
The current code is passing CHECKSUM_NONE, resulting in the TCP GRO
layer doing the checksum in SW and hiding the issue. This will fail
when using dmabufs as payload buffers, as the skb frag would be
unreadable.
Fixes: 3a8845af66 ("idpf: add RX splitq napi poll support")
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add more information related to the transaction, like cookie, vc_op
and salt, when a transaction times out, and include similar information
when the transaction salt does not match.
Info output for transaction timeout:
-------------------
(op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------
before it was:
-------------------
(op 5015, 60000ms)
-------------------
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node as the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.
This is not just a theoretical problem: in a particular scenario, a
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.
Tested:
* I have also run a manual test to gauge the performance
improvement. The test consists of an antagonist process
(`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
process is run under `taskset 01` to bind it to CPU0, and its
priority is changed with `chrt -pQ 9900 10000 ${pid}` and
`renice -n -20 ${pid}` after start.
Then, the IDPF driver is forced to prefer CPU0 by editing all calls
to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
Finally, `ktraces` for the workqueue events are collected.
Without the current patch, the antagonist process can force
arbitrary delays between `workqueue_queue_work` and
`workqueue_execute_start`, that in my tests were as high as
`30ms`. With the current patch applied, the workqueue can be
migrated to another unloaded CPU in the same node, and, keeping
everything else equal, the maximum delay I could see was `6us`.
Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Signed-off-by: Marco Leogrande <leogrande@google.com>
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The transaction salt was being accessed before acquiring the
idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Restore the call to idpf_vc_xn_shutdown() at the beginning of
idpf_vc_core_deinit() provided the function is not called on remove.
In the reset path the mailbox is destroyed, leading to all transactions
timing out.
Fixes: 09d0fb5cb3 ("idpf: deinit virtchnl transaction manager after vport and vectors")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a read memory barrier to ensure the ordering of operations when
accessing control queue descriptors. Specifically, we want to avoid
cases where loads can be reordered:
1. Load #1 is dispatched to read descriptor flags.
2. Load #2 is dispatched to read some other field from the descriptor.
3. Load #2 completes, accessing memory/cache at a point in time when the DD
flag is zero.
4. NIC DMA overwrites the descriptor, now the DD flag is one.
5. Any fields loaded before step 4 are now inconsistent with the actual
descriptor state.
Add a read memory barrier between steps 1 and 2, so that load #2 is not
executed until load #1 has completed.
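A sketch of the pattern using the standard DMA read barrier (the
descriptor field and flag names here are assumptions):

	/* Load #1: check the DD/ownership flag first */
	flags = le16_to_cpu(desc->flags);
	if (!(flags & IDPF_CTLQ_FLAG_DD))
		break;

	/* No later descriptor loads may be speculated ahead of the DD
	 * check; everything read after the barrier belongs to a descriptor
	 * the hardware has finished writing.
	 */
	dma_rmb();

	/* Load #2 and onwards: safe to read the rest of the descriptor */
	datalen = le16_to_cpu(desc->datalen);
	ret_val = le16_to_cpu(desc->ret_val);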
Fixes: 8077c72756 ("idpf: add controlq init and reset checks")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Suggested-by: Lance Richardson <rlance@google.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is a race condition between exiting wb_on_itr and completion write
backs. For example, we are in wb_on_itr mode and a Tx completion is
generated by HW, ready to be written back, as we are re-enabling
interrupts:
HW                                SW
|                                 |
|                                 | idpf_tx_splitq_clean_all
|                                 | napi_complete_done
|                                 |
| tx_completion_wb                | idpf_vport_intr_update_itr_ena_irq
That tx_completion_wb happens before the vector is fully re-enabled.
Continuing with this example, it is a UDP stream and the
tx_completion_wb is the last one in the flow (there are no rx packets).
Because the HW generated the completion before the interrupt is fully
enabled, the HW will not fire the interrupt once the timer expires and
the write back will not happen. NAPI poll won't be called. We have
indicated we're back in interrupt mode but nothing else will trigger the
interrupt. Therefore, the completion goes unprocessed, triggering a Tx
timeout.
To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr.
This interrupt will catch the rogue completion and avoid the timeout.
Add logic to set the appropriate bits in the vector's dyn_ctl register.
Fixes: 9c4a27da0e ("idpf: enable WB_ON_ITR")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
SW triggered interrupts are guaranteed to fire after their timer
expires, unlike Tx and Rx interrupts which will only fire after the
timer expires _and_ a descriptor write back is available to be processed
by the driver.
Add the necessary fields, defines, and initializations to enable a SW
triggered interrupt in the vector's dyn_ctl register.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can and netfilter.
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth:
- bnxt_en: support header page pool in queue API
- ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth:
- mlxsw: use correct key block on Spectrum-4
- mlx5: fix memory leak in mlx5hws_definer_calc_layout"
* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
net: avoid potential UAF in default_operstate()
vsock/test: verify socket options after setting them
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
net/mlx5e: Remove workaround to avoid syndrome for internal port
net/mlx5e: SD, Use correct mdev to build channel param
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
net/mlx5: HWS: Properly set bwc queue locks lock classes
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
bnxt_en: handle tpa_info in queue API implementation
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
bnxt_en: refactor tpa_info alloc/free into helpers
geneve: do not assume mac header is set in geneve_xmit_skb()
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
ipmr: tune the ipmr_can_free_table() checks.
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
...