linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Marek Majtyka	66c0e13ad2	drivers: net: turn on XDP features A summary of the flags being set for various drivers is given below. Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features that can be turned off and on at runtime. This means that these flags may be set and unset under RTNL lock protection by the driver. Hence, READ_ONCE must be used by code loading the flag value. Also, these flags are not used for synchronization against the availability of XDP resources on a device. It is merely a hint, and hence the read may race with the actual teardown of XDP resources on the device. This may change in the future, e.g. operations taking a reference on the XDP resources of the driver, and in turn inhibiting turning off this flag. However, for now, it can only be used as a hint to check whether device supports becoming a redirection target. Turn 'hw-offload' feature flag on for: - netronome (nfp) - netdevsim. Turn 'native' and 'zerocopy' features flags on for: - intel (i40e, ice, ixgbe, igc) - mellanox (mlx5). - stmmac - netronome (nfp) Turn 'native' features flags on for: - amazon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2, enetc) - funeth - intel (igb) - marvell (mvneta, mvpp2, octeontx2) - mellanox (mlx4) - mtk_eth_soc - qlogic (qede) - sfc - socionext (netsec) - ti (cpsw) - tap - tsnep - veth - xen - virtio_net. Turn 'basic' (tx, pass, aborted and drop) features flags on for: - netronome (nfp) - cavium (thunder) - hyperv. Turn 'redirect_target' feature flag on for: - amanzon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2) - intel (i40e, ice, igb, ixgbe) - ti (cpsw) - marvell (mvneta, mvpp2) - sfc - socionext (netsec) - qlogic (qede) - mellanox (mlx5) - tap - veth - virtio_net - xen Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Marek Majtyka <alardam@gmail.com> Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-02 20:48:23 -08:00
Jakub Kicinski	82b4a9412b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/core/gro.c `7d2c89b325` ("skb: Do mix page pool and page referenced frags in GRO") `b1a78b9b98` ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 14:49:55 -08:00
Maciej Fijalkowski	a24b4c6e9a	ice: xsk: Do not convert to buff to frame for XDP_TX Let us store pointer to xdp_buff that came from xsk_buff_pool on tx_buf so that it will be possible to recycle it via xsk_buff_free() on Tx cleaning side. This way it is not necessary to do expensive copy to another xdp_buff backed by a newly allocated page. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-14-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	f4db7b314d	ice: Remove next_{dd,rs} fields from ice_tx_ring Now that both ZC and standard XDP data paths stopped using Tx logic based on next_dd and next_rs fields, we can safely remove these fields and shrink Tx ring structure. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-13-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	3246a10752	ice: Add support for XDP multi-buffer on Tx side Similarly as for Rx side in previous patch, logic on XDP Tx in ice driver needs to be adjusted for multi-buffer support. Specifically, the way how HW Tx descriptors are produced and cleaned. Currently, XDP_TX works on strict ring boundaries, meaning it sets RS bit (on producer side) / looks up DD bit (on consumer/cleaning side) every quarter of the ring. It means that if for example multi buffer frame would span across the ring quarter boundary (say that frame consists of 4 frames and we start from 62 descriptor where ring is sized to 256 entries), RS bit would be produced in the middle of multi buffer frame, which would be a broken behavior as it needs to be set on the last descriptor of the frame. To make it work, set RS bit at the last descriptor from the batch of frames that XDP_TX action was used on and make the first entry remember the index of last descriptor with RS bit set. This way, cleaning side can take the index of descriptor with RS bit, look up DD bit's presence and clean from first entry to last. In order to clean up the code base introduce the common ice_set_rs_bit() which will return index of descriptor that got RS bit produced on so that standard driver can store this within proper ice_tx_buf and ZC driver can simply ignore return value. Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-12-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	2fba7dc515	ice: Add support for XDP multi-buffer on Rx side Ice driver needs to be a bit reworked on Rx data path in order to support multi-buffer XDP. For skb path, it currently works in a way that Rx ring carries pointer to skb so if driver didn't manage to combine fragmented frame at current NAPI instance, it can restore the state on next instance and keep looking for last fragment (so descriptor with EOP bit set). What needs to be achieved is that xdp_buff needs to be combined in such way (linear + frags part) in the first place. Then skb will be ready to go in case of XDP_PASS or BPF program being not present on interface. If BPF program is there, it would work on multi-buffer XDP. At this point xdp_buff resides directly on Rx ring, so given the fact that skb will be built straight from xdp_buff, there will be no further need to carry skb on Rx ring. Besides removing skb pointer from Rx ring, lots of members have been moved around within ice_rx_ring. First and foremost reason was to place rx_buf with xdp_buff on the same cacheline. This means that once we touch rx_buf (which is a preceding step before touching xdp_buff), xdp_buff will already be hot in cache. Second thing was that xdp_rxq is used rather rarely and it occupies a separate cacheline, so maybe it is better to have it at the end of ice_rx_ring. Other change that affects ice_rx_ring is the introduction of ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate rx_buf->act to all the parts of current xdp_buff after running XDP program, so that ice_put_rx_buf() that got moved out of the main Rx processing loop will be able to tak an appriopriate action on each buffer. Second is for ice_construct_skb(). ice_construct_skb() has a copybreak mechanism which had an explicit impact on xdp_buff->skb conversion in the new approach when legacy Rx flag is toggled. It works in a way that linear part is 256 bytes long, if frame is bigger than that, remaining bytes are going as a frag to skb_shared_info. This means while memcpying frags from xdp_buff to newly allocated skb, care needs to be taken when picking the destination frag array entry. Upon the time ice_construct_skb() is called, when dealing with fragmented frame, current rx_buf points to the last fragment, but copybreak needs to be done against the first one. That's where ice_rx_ring::first_desc helps. When frame building spans across NAPI polls (DD bit is not set on current descriptor and xdp->data is not NULL) with current Rx buffer handling state there might be some problems. Since calls to ice_put_rx_buf() were pulled out of the main Rx processing loop and were scoped from cached_ntc to current ntc, remember that now mentioned function relies on rx_buf->act, which is set within ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so currently we could put Rx buffer with rx_buf->act being uninitialized. To address this, change scoping to rely on first_desc on both boundaries instead. This also implies that cleaned_count which is used as an input to ice_alloc_rx_buffers() and tells how many new buffers should be refilled has to be adjusted. If it stayed as is, what could happen is a case where ntc would go over ntu. Therefore, remove cleaned_count altogether and use against allocing routine newly introduced ICE_RX_DESC_UNUSED() macro which is an equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on struct ice_rx_ring::first_desc instead of next_to_clean. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-11-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	8a11b334ec	ice: Use xdp->frame_sz instead of recalculating truesize SKB path calculates truesize on three different functions, which could be avoided as xdp_buff carries the already calculated truesize under xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the xdp_buff as an input just like functions responsible for creating sk_buff initially, codebase could be simplified by removing these redundant recalculations and rely on xdp_buff::frame_sz instead. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-10-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	9070fe3da0	ice: Do not call ice_finalize_xdp_rx() unnecessarily Currently ice_finalize_xdp_rx() is called only when xdp_prog is present on VSI, which is a good thing. However, this optimization can be enhanced and check only if any of the XDP_TX/XDP_REDIRECT took place in current Rx processing. Non-zero value of @xdp_xmit indicates that xdp_prog is present on VSI. This way XDP_DROP-based workloads will not suffer from unnecessary calls to external function. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-9-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	60bc72b3c4	ice: Use ice_max_xdp_frame_size() in ice_xdp_setup_prog() This should have been used in there from day 1, let us address that before introducing XDP multi-buffer support for Rx side. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-8-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	1dc1a7e7f4	ice: Centrallize Rx buffer recycling Currently calls to ice_put_rx_buf() are sprinkled through ice_clean_rx_irq() - first place is for explicit flow director's descriptor handling, second is after running XDP prog and the last one is after taking care of skb. 1st callsite was actually only for ntc bump purpose, as Rx buffer to be recycled is not even passed to a function. It is possible to walk through Rx buffers processed in particular NAPI cycle by caching ntc from beginning of the ice_clean_rx_irq(). To do so, let us store XDP verdict inside ice_rx_buf, so action we need to take on will be known. For XDP prog absence, just store ICE_XDP_PASS as a verdict. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-7-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	e44f4790a2	ice: Inline eop check This might be in future used by ZC driver and might potentially yield a minor performance boost. While at it, constify arguments that ice_is_non_eop() takes, since they are pointers and this will help compiler while generating asm. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-6-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	d7956d81f1	ice: Pull out next_to_clean bump out of ice_put_rx_buf() Plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq() so in order to keep the ability of walking through HW Rx descriptors, pull out next_to_clean handling out of ice_put_rx_buf(). Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-5-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	ac07533911	ice: Store page count inside ice_rx_buf This will allow us to avoid carrying additional auxiliary array of page counts when dealing with XDP multi buffer support. Previously combining fragmented frame to skb was not affected in the same way as XDP would be as whole frame is needed to be in place before executing XDP prog. Therefore, when going through HW Rx descriptors one-by-one, calls to ice_put_rx_buf() need to be taken after running XDP prog on a potentially multi buffered frame, so some additional storage of page count is needed. By adding page count to rx buf, it will make it easier to walk through processed entries at the end of rx cleaning routine and decide whether or not buffers should be recycled. While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It was proven many times that calculations on variables smaller than standard register size are harmful. This was also the case during experiments with embedding page count to ice_rx_buf - when this was added as u16 it had a performance impact. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-4-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	cb0473e0e9	ice: Add xdp_buff to ice_rx_ring struct In preparation for XDP multi-buffer support, let's store xdp_buff on Rx ring struct. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as currently skb fragments are handled. This means that skb pointer on Rx ring will become redundant and will be removed. For now it is kept and layout of Rx ring struct was not inspected, some member movement will be needed later on so that will be the time to take care of it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-3-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	c61bcebde7	ice: Prepare legacy-rx for upcoming XDP multi-buffer support Rx path is going to be modified in a way that fragmented frame will be gathered within xdp_buff in the first place. This approach implies that underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when ring uses build_skb but not when legacy-rx knob is turned on. This case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx buffers were used so driver in this setting was able to support 9k MTU as it can chain up to 5 Rx buffers. With offset configuring HW writing 2k of a data was passing the half of the page which broke the assumption of our internal page recycling tricks. Now if above got fixed and legacy-rx path would be left as is, when referring to skb_shared_info via xdp_get_shared_info_from_buff(), packet's content would be corrupted again. Hence size of Rx buffer needs to be lowered and therefore supported MTU. This operation will allow us to keep the unified data path and with 8k MTU users (if any of legacy-rx) would still be good to go. However, tendency is to drop the support for this code path at some point. Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx buffers, not 2k. Since headroom support is removed, disable data_meta support on legacy-rx. When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-2-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Bjorn Helgaas	ba153552c1	ice: Remove redundant pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since `f26e58bf6f` ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-30 09:35:05 -08:00
Jiri Pirko	fb8421a94c	devlink: remove devlink features Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-30 08:37:46 +00:00
Jakub Kicinski	b568d3072a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/intel/ice/ice_main.c `418e53401e` ("ice: move devlink port creation/deletion") `643ef23bd9` ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c `3d53aaef43` ("tsnep: Fix TX queue stop/wake for multiple queues") `25faa6a4c5` ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c `13bd9b31a9` ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") `a44b765148` ("netfilter: conntrack: unify established states for SCTP paths") `f71cb8f45d` ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-27 22:56:18 -08:00
Dave Ertman	a6a0974aae	ice: Prevent set_channel from changing queues while RDMA active The PF controls the set of queues that the RDMA auxiliary_driver requests resources from. The set_channel command will alter that pool and trigger a reconfiguration of the VSI, which breaks RDMA functionality. Prevent set_channel from executing when RDMA driver bound to auxiliary device. Adding a locked variable to pass down the call chain to avoid double locking the device_lock. Fixes: `348048e724` ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-27 11:32:18 -08:00
Jiri Pirko	2fc631b5d7	ice: remove pointless calls to devlink_param_driverinit_value_set() devlink_param_driverinit_value_set() call makes sense only for "driverinit" params. However here, both params are "runtime". devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case and does not do anything. So remove the pointless calls to devlink_param_driverinit_value_set() entirely. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-27 12:32:02 +00:00
Paul M Stillwell Jr	418e53401e	ice: move devlink port creation/deletion Commit `a286ba7387` ("ice: reorder PF/representor devlink port register/unregister flows") moved the code to create and destroy the devlink PF port. This was fine, but created a corner case issue in the case of ice_register_netdev() failing. In that case, the driver would end up calling ice_devlink_destroy_pf_port() twice. Additionally, it makes no sense to tie creation of the devlink PF port to the creation of the netdev so separate out the code to create/destroy the devlink PF port from the netdev code. This makes it a cleaner interface. Fixes: `a286ba7387` ("ice: reorder PF/representor devlink port register/unregister flows") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 18:52:15 -08:00
Arkadiusz Kubalewski	c7ef8221ca	ice: use GNSS subsystem instead of TTY Previously support for GNSS was implemented as a TTY driver, it allowed to access GNSS receiver on /dev/ttyGNSS_<bus><func>. Use generic GNSS subsystem API instead of implementing own TTY driver. The receiver is accessible on /dev/gnss<id>. In case of multiple receivers in the OS, correct device can be found by enumerating either: - /sys/class/net/<eth port>/device/gnss/ - /sys/class/gnss/gnss<id>/device/ Using GNSS subsystem is superior to implementing own TTY driver, as the GNSS subsystem was designed solely for this purpose. It also implements TTY driver but in a common and defined way. From user perspective, there is no difference in communicating with a device, except new path to the device shall be used. The device will provide same information to the userspace as the old one, and can be used in the same way, i.e.: old # gpsmon /dev/ttyGNSS_2100_0 new # gpsmon /dev/gnss0 There is no other impact on userspace tools. User expecting onboard GNSS receiver support is required to enable CONFIG_GNSS=y/m in kernel config. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-20 13:27:17 +00:00
Tony Nguyen	d52a6180c7	ice: Remove excess space smatch reports inconsistent indenting due to an extra space; remove it to resolve the issue. smatch warnings: drivers/net/ethernet/intel/ice/ice_lib.c:1673 ice_vsi_alloc_ring_stats() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Tony Nguyen	643ef23bd9	ice: Introduce local var for readability Based on previous feedback[1], introduce a local var to make things more readable. [1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)	2023-01-19 08:18:03 -08:00
Tony Nguyen	388740b3f6	ice: Match parameter name for ice_cfg_phy_fc() The parameter name in the function declaration and definition do not match; adjust the naming for consistency and to avoid confusion. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Tony Nguyen	91dbcb91d0	ice: Explicitly return 0 Previous checks, and goto, will catch all errors meaning these returns will only return 0; explicitly return 0 for these cases. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)	2023-01-19 08:18:03 -08:00
Tony Nguyen	bd557d9797	ice: Reduce scope of variables There are some places where the scope of a variable can be reduced, so do that. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)	2023-01-19 08:18:03 -08:00
Sergey Temerkhanov	2ffd87d38d	ice: Move support DDP code out of ice_flex_pipe.c Currently, ice_flex_pipe.c includes the DDP loading functions and has grown large. Although flexible processing support code is related to DDP loading, these parts are distinct. Move the DDP loading functionality from ice_flex_pipe.c to a separate file. Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Tony Nguyen	df2a4c3f65	ice: Remove cppcheck suppressions The use of suppressions for cppcheck in the kernel does not look to be standard as the ice driver is the only one doing it. Remove the comments/suppressions. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Przemek Kitszel	9d20797fcd	ice: combine cases in ice_ksettings_find_adv_link_speed() Combine if statements setting the same link speed together. Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Anirudh Venkataramanan	2481e82077	ice: Add support for 100G KR2/CR2/SR2 link reporting Commit `2736d94f35` ("ethtool: Added support for 50Gbps per lane link modes") in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes. Advertise these link modes if the firmware reports the corresponding PHY types. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Jesse Brandeburg	6a8d013e90	ice: add missing checks for PF vsi type There were a few places we had missed checking the VSI type to make sure it was definitely a PF VSI, before calling setup functions intended only for the PF VSI. This doesn't fix any explicit bugs but cleans up the code in a few places and removes one explicit != vsi->type check that can be superseded by this code (it's a super set) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Anirudh Venkataramanan	d5a6df4437	ice: remove redundant non-null check in ice_setup_pf_sw() Remove a redundant null check, as vsi could not be null at this point. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Siddaraju DH	8aa4318c3a	ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPB The PHY provides only 39b timestamp. With current timing implementation, we discard lower 7b, leaving 32b timestamp. The driver reconstructs the full 64b timestamp by correlating the 32b timestamp with cached_time for performance. The reconstruction algorithm does both forward & backward interpolation. The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second. Due to interpolation in both direction, its now ~= 2.125 second IIRC, going with at least half a duration, the cached_time is updated with periodic thread of 1 second (worst-case) periodicity. But the 1 second periodicity is based on System-timer. With PPB adjustments, if the 1588 timers increments at say double the rate, (2s in-place of 1s), the Nyquist rate/half duration sampling/update of cached_time with 1 second periodic thread will lead to incorrect interpolations. Hence we should restrict the PPB adjustments to at least half duration of cached_time update which translates to 500,000,000 PPB. Since the periodicity of the cached-time system thread can vary, it is good to have some buffer time and considering practicality of PPB adjustments, limiting the max_adj to 100,000,000. Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Amritha Nambiar	390889a4b4	ice: Support drop action Currently the drop action is supported only in switchdev mode. Add support for offloading receive filters with action drop in ADQ/non-ADQ modes. This is in addition to other actions such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ). Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple actions such as forward to vsi/queue which may/may not create a channel vsi. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Anatolii Gerasymenko	7d8d7754e6	ice: Handle LLDP MIB Pending change If the number of Traffic Classes (TC) is decreased, the FW will no longer remove TC nodes, but will send a pending change notification. This will allow RDMA to destroy corresponding Control QP markers. After RDMA finishes outstanding operations, the ice driver will send an execute MIB Pending change admin queue command to FW to finish DCB configuration change. The FW will buffer all incoming Pending changes, so there can be only one active Pending change. RDMA driver guarantees to remove Control QP markers within 5000 ms. Hence, LLDP response timeout txTTL (default 30 sec) will be met. In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would still return an old MIB, as the Pending change hasn't been applied yet. Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config from LLDP MIB Change Event's buffer for Pending changes. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Tsotne Chakhvadze	a4f68f37e6	ice: Add 'Execute Pending LLDP MIB' Admin Queue command In DCB Willing Mode (FW managed LLDP), when the link partner changes configuration which requires fewer TCs, the TCs that are no longer needed are suspended by EMP FW, removed, and never resumed. This occurs before a MIB change event is indicated to SW. The permanent suspension and removal of these TC nodes in the scheduler prevents RDMA from being able to destroy QPs associated with this TC, requiring a CORE reset to recover. A new DCBX configuration change flow is defined to allow SW driver and other SW components (RDMA) to properly adjust to the configuration changes before they are taking effect in HW. This flow includes a two-way handshake between EMP FW<->LAN SW<->RDMA SW. List of changes: - Add 'Execute Pending LLDP MIB' AQC. - Add 'Pending Event Enable' bit. - Add additional logic to ignore Pending Event Enable' request while 'LLDP MIB Chnage' event is disabled. - Add 'Execute Pending LLDP MIB' AQC sending function to FW, which is needed to take place MIB Event change. Signed-off-by: Tsotne Chakhvadze <tsotne.chakhvadze@intel.com> Co-developed-by: Karen Sornek <karen.sornek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-19 08:18:03 -08:00
Jiasheng Jiang	40543b3d9d	ice: Add check for kzalloc Add the check for the return value of kzalloc in order to avoid NULL pointer dereference. Moreover, use the goto-label to share the clean code. Fixes: `d6b98c8d24` ("ice: add write functionality for GNSS TTY") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-09 14:02:39 -08:00
Yuan Can	f58985620f	ice: Fix potential memory leak in ice_gnss_tty_write() The ice_gnss_tty_write() return directly if the write_buf alloc failed, leaking the cmd_buf. Fix by free cmd_buf if write_buf alloc failed. Fixes: `d6b98c8d24` ("ice: add write functionality for GNSS TTY") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-01-09 14:02:39 -08:00
Maciej Fijalkowski	53fc61be27	ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf Previously ice XDP xmit routine was changed in a way that it avoids xdp_buff->xdp_frame conversion as it is simply not needed for handling XDP_TX action and what is more it saves us CPU cycles. This routine is re-used on ZC driver to handle XDP_TX action. Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is converted to xdp_frame, xdp_frame itself is not stored inside ice_tx_buf, we only store raw data pointer. Casting this pointer to xdp_frame and calling against it xdp_return_frame in ice_clean_xdp_tx_buf() results in undefined behavior. To fix this, simply call page_frag_free() on tx_buf->raw_buf. Later intention is to remove the buff->frame conversion in order to simplify the codebase and improve XDP_TX performance on ZC. Fixes: `126cdfe100` ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com> Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-21 17:46:49 -08:00
Jacob Keller	95af1f1c4c	ice: reschedule ice_ptp_wait_for_offset_valid during reset If the ice_ptp_wait_for_offest_valid function is scheduled to run while the driver is resetting, it will exit without completing calibration. The work function gets scheduled by ice_ptp_port_phy_restart which will be called as part of the reset recovery process. It is possible for the first execution to occur before the driver has completely cleared its resetting flags. Ensure calibration completes by rescheduling the task until reset is fully completed. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Siddaraju DH	f029a34394	ice: make Tx and Rx vernier offset calibration independent The Tx and Rx calibration and timestamp generation blocks are independent. However, the ice driver waits until both blocks are ready before configuring either block. This can result in delay of configuring one block because we have not yet received a packet in the other block. There is no reason to wait to finish programming Tx just because we haven't received a packet. Similarly there is no reason to wait to program Rx just because we haven't transmitted a packet. Instead of checking both offset status before programming either block, refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822 functions so that they perform their own offset status checks. Additionally, make them also check the offset ready bit to determine if the offset values have already been programmed. Call the individual configure functions directly in ice_ptp_wait_for_offset_valid. The functions will now correctly check status, and program the offsets if ready. Once the offset is programmed, the functions will exit quickly after just checking the offset ready register. Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset valid check functions in ice_ptp.c entirely as they are no longer necessary. With this change, the Tx and Rx blocks will each be enabled as soon as possible without waiting for the other block to complete calibration. This can enable timestamps faster in setups which have a low rate of transmitted or received packets. In particular, it can stop a situation where one port never receives traffic, and thus never finishes calibration of the Tx block, resulting in continuous faults reported by the ptp4l daemon application. Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	e3ba524866	ice: only check set bits in ice_ptp_flush_tx_tracker The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx timestamp requests when the port is being brought down. This function iterates over the entire list, but this is unnecessary. We only need to check the bits which are actually set in the ready bitmap. Replace this logic with for_each_set_bit, and follow a similar flow as in ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any on a NULL pointer as it will perform a no-op so we do not need to verify that the skb is actually NULL. The new implementation also avoids clearing (and thus reading!) the PHY timestamp unless the index is marked as having a valid timestamp in the timestamp status bitmap. This ensures that we properly clear the status registers as appropriate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	d40fd60093	ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp In the event of a PTP clock time change due to .adjtime or .settime, the ice driver needs to update the cached copy of the PHC time and also discard any outstanding Tx timestamps. This is required because otherwise the wrong copy of the PHC time will be used when extending the Tx timestamp. This could result in reporting incorrect timestamps to the stack. The current approach taken to handle this is to call ice_ptp_flush_tx_tracker, which will discard any timestamps which are not yet complete. This is problematic for two reasons: 1) it could lead to a potential race condition where the wrong timestamp is associated with a future packet. This can occur with the following flow: 1. Thread A gets request to transmit a timestamped packet, and picks an index and transmits the packet 2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use, marking is as disarded. No timestamp read occurs because the status bit is not set, but the index is released for re-use 3. Thread A gets a new request to transmit another timestamped packet, picks the same (now unused) index and transmits that packet. 4. The PHY transmits the first packet and updates the timestamp slot and generates an interrupt. 5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a valid timestamp but associates it with the new Tx SKB and not the one that actual timestamp for the packet as expected. This could result in the previous timestamp being assigned to a new packet producing incorrect timestamps and leading to incorrect behavior in PTP applications. This is most likely to occur when the packet rate for Tx timestamp requests is very high. 2) on E822 hardware, we must avoid reading a timestamp index more than once each time its status bit is set and an interrupt is generated by hardware. We do have some extensive checks for the unread flag to ensure that only one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads read the timestamp. However, even with this we can still have cases where we "flush" a timestamp that was actually completed in hardware. This can lead to cases where we don't read the timestamp index as appropriate. To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker outside of the teardown path. Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap, the stale bitmap. Start this as cleared when we begin a new timestamp request. When we're about to extend a timestamp and send it up to the stack, first check to see if that stale bit was set. If so, drop the timestamp without sending it to the stack. When we need to update the cached PHC timestamp out of band, just mark all currently outstanding timestamps as stale. This will ensure that once hardware completes the timestamp we'll ignore it correctly and avoid reporting bogus timestamps to userspace. With this change, we fix potential issues caused by calling ice_ptp_flush_tx_tracker during normal operation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	c1f3414df2	ice: cleanup allocations in ice_ptp_alloc_tx_tracker The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and the bitmap for tracking the currently in use indexes. A future change is going to add yet another allocation to this function. If these allocations fail we need to ensure that we properly cleanup and ensure that the pointers in the ice_ptp_tx structure are NULL. Simplify this logic by allocating to local variables first. If any allocation fails, then free everything and exit. Only update the ice_ptp_tx structure if all allocations succeed. This ensures that we have no side effects on the Tx structure unless all allocations have succeeded. Thus, no code will see an invalid pointer and we don't need to re-assign NULL on cleanup. This is safe because kernel "free" functions are designed to be NULL safe and perform no action if passed a NULL pointer. Thus its safe to simply always call kfree or bitmap_free even if one of those pointers was NULL. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	3ad5c10bf2	ice: protect init and calibrating check in ice_ptp_request_ts When requesting a new timestamp, the ice_ptp_request_ts function does not hold the Tx tracker lock while checking init and calibrating. This means that we might issue a new timestamp request just after the Tx timestamp tracker starts being deinitialized. This could lead to incorrect access of the timestamp structures. Correct this by moving the init and calibrating checks under the lock, and updating the flows which modify these fields to use the lock. Note that we do not need to hold the lock while checking for tx->init in ice_ptp_tx_tstamp. This is because the teardown function will use synchronize_irq after clearing the flag to ensure that the threaded interrupt completes. Either a) the tx->init flag will be cleared before the ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the threaded interrupt will be executing and the synchronize_irq will wait until the threaded interrupt has completed at which point we know the init field has definitely been set and new interrupts will not execute the Tx timestamp thread function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:12:54 -08:00
Jacob Keller	f0ae124019	ice: synchronize the misc IRQ when tearing down Tx tracker Since commit `1229b33973` ("ice: Add low latency Tx timestamp read") the ice driver has used a threaded IRQ for handling Tx timestamps. This change did not add a call to synchronize_irq during ice_ptp_release_tx_tracker. Thus it is possible that an interrupt could occur just as the tracker is being removed. This could lead to a use-after-free of the Tx tracker structure data. Fix this by calling sychronize_irq in ice_ptp_release_tx_tracker after we've cleared the init flag. In addition, make sure that we re-check the init flag at the end of ice_ptp_tx_tstamp before we exit ensuring that we will stop polling for new timestamps once the tracker de-initialization has begun. Refactor the ts_handled variable into "more_timestamps" so that we can simply directly assign this boolean instead of relying on an initialized value of true. This makes the new combined check easier to read. With this change, the ice_ptp_release_tx_tracker function will now wait for the threaded interrupt to complete if it was executing while the init flag was cleared. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	10e4b4a3a3	ice: check Tx timestamp memory register for ready timestamps The PHY for E822 based hardware has a register which indicates which timestamps are valid in the PHY timestamp memory block. Each bit in the register indicates whether the associated index in the timestamp memory is valid. Hardware sets this bit when the timestamp is captured, and clears the bit when the timestamp is read. Use of this register is important as reading timestamp registers can impact the way that hardware generates timestamp interrupts. This occurs because the PHY has an internal value which is incremented when hardware captures a timestamp and decremented when software reads a timestamp. Reading timestamps which are not marked as valid still decrement the internal value and can result in the Tx timestamp interrupt not triggering in the future. To prevent this, use the timestamp memory value to determine which timestamps are ready to be read. The ice_get_phy_tx_tstamp_ready function reads this value. For E810 devices, this just always returns with all bits set. Skip any timestamp which is not set in this bitmap, avoiding reading extra timestamps on E822 devices. The stale check against a cached timestamp value is no longer necessary for PHYs which support the timestamp ready bitmap properly. E810 devices still need this. Introduce a new verify_cached flag to the ice_ptp_tx structure. Use this to determine if we need to perform the verification against the cached timestamp value. Set this to 1 for the E810 Tx tracker init function. Notice that many of the fields in ice_ptp_tx are simple 1 bit flags. Save some structure space by using bitfields of length 1 for these values. Modify the ICE_PTP_TS_VALID check to simply drop the timestamp immediately so that in an event of getting such an invalid timestamp the driver does not attempt to re-read the timestamp again in a future poll of the register. With these changes, the driver now reads each timestamp register exactly once, and does not attempt any re-reads. This ensures the interrupt tracking logic in the PHY will not get stuck. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	0dd9286263	ice: handle discarding old Tx requests in ice_ptp_tx_tstamp Currently the driver uses the PTP kthread to process handling and discarding of stale Tx timestamp requests. The function ice_ptp_tx_tstamp_cleanup is used for this. A separate thread creates complications for the driver as we now have both the main Tx timestamp processing IRQ checking timestamps as well as the kthread. Rather than using the kthread to handle this, simply check for stale timestamps within the ice_ptp_tx_tstamp function. This function must already process the timestamps anyways. If a Tx timestamp has been waiting for 2 seconds we simply clear the bit and discard the SKB. This avoids the complication of having separate threads polling, reducing overall CPU work. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	6b1ff5d392	ice: always call ice_ptp_link_change and make it void The ice_ptp_link_change function is currently only called for E822 based hardware. Future changes are going to extend this function to perform additional tasks on link change. Always call this function, moving the E810 check from the callers down to just before we call the E822-specific function required to restart the PHY. This function also returns an error value, but none of the callers actually check it. In general, the errors it produces are more likely systemic problems such as invalid or corrupt port numbers. No caller checks these, and so no warning is logged. Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be set except when ICE_FLAG_PTP_SUPPORTED is set. Convert the port checks to WARN_ON_ONCE, in order to generate a kernel stack trace when they are hit. Convert the function to void since no caller actually checks these return values. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	11722c39c8	ice: fix misuse of "link err" with "link status" The ice_ptp_link_change function has a comment which mentions "link err" when referring to the current link status. We are storing the status of whether link is up or down, which is not an error. It is appears that this use of err accidentally got included due to an overzealous search and replace when removing the ice_status enum and local status variable. Fix the wording to use the correct term. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Karol Kolacinski	407b66c07e	ice: Reset TS memory for all quads In E822 products, the owner PF should reset memory for all quads, not only for the one where assigned lport is. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Milena Olech	0357d5cab8	ice: Remove the E822 vernier "bypass" logic The E822 devices support an extended "vernier" calibration which enables higher precision timestamps by accounting for delays in the PHY, and compensating for them. These delays are measured by hardware as part of its vernier calibration logic. The driver currently starts the PHY in "bypass" mode which skips the compensation. Then it later attempts to switch from bypass to vernier. This unfortunately does not work as expected. Instead of properly compensating for the delays, the hardware continues operating in bypass without the improved precision expected. Because we cannot dynamically switch between bypass and vernier mode, refactor the driver to always operate in vernier mode. This has a slight downside: Tx timestamp and Rx timestamp requests that occur as the very first packet set after link up will not complete properly and may be reported to applications as missing timestamps. This occurs frequently in test environments where traffic is light or targeted specifically at testing PTP. However, in practice most environments will have transmitted or received some data over the network before such initial requests are made. Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Sergey Temerkhanov	6b5cbc8c4e	ice: Use more generic names for ice_ptp_tx fields Some supported devices have per-port timestamp memory blocks while others have shared ones within quads. Rename the struct ice_ptp_tx fields to reflect the block entities it works with Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Vincent Mailhol	226bf98055	net: devlink: let the core report the driver name instead of the drivers The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-30 21:49:38 -08:00
Jacob Keller	3af4b40b0f	ice: implement direct read for NVM and Shadow RAM regions Implement the .read handler for the NVM and Shadow RAM regions. This enables user space to read a small chunk of the flash without needing the overhead of creating a full snapshot. Update the documentation for ice to detail which regions have direct read support. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-30 20:54:31 -08:00
Jacob Keller	ed23debec5	ice: use same function to snapshot both NVM and Shadow RAM The ice driver supports a region for both the flat NVM contents as well as the Shadow RAM which is a layer built on top of the flash during device initialization. These regions use an almost identical read function, except that the NVM needs to set the direct flag when reading, while Shadow RAM needs to read without the direct flag set. They each call ice_read_flat_nvm with the only difference being whether to set the direct flash flag. The NVM region read function also was fixed to read the NVM in blocks to avoid a situation where the firmware reclaims the lock due to taking too long. Note that the region snapshot function takes the ops pointer so the function can easily determine which region to read. Make use of this and re-use the NVM snapshot function for both the NVM and Shadow RAM regions. This makes Shadow RAM benefit from the same block approach as the NVM region. It also reduces code in the ice driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-30 20:54:30 -08:00
Jakub Kicinski	f2bb566f5c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/lib/bpf/ringbuf.c `927cbb478a` ("libbpf: Handle size overflow for ringbuf mmap") `b486d19a0a` ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 13:04:52 -08:00
Anatolii Gerasymenko	60aeca6dc4	ice: Use ICE_RLAN_BASE_S instead of magic number Commit `72adf2421d` ("ice: Move common functions out of ice_main.c part 2/7") moved an older version of ice_setup_rx_ctx() function with usage of magic number 7. Reimplement the commit `5ab522443b` ("ice: Cleanup magic number") to use ICE_RLAN_BASE_S instead of magic number. Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:43 -08:00
Marcin Szycik	c7cb9dfc57	ice: Fix configuring VIRTCHNL_OP_CONFIG_VSI_QUEUES with unbalanced queues Currently the VIRTCHNL_OP_CONFIG_VSI_QUEUES command may fail if there are less RX queues than TX queues requested. To fix it, only configure RXDID if RX queue exists. Fixes: `e753df8fbc` ("ice: Add support Flex RXD") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:43 -08:00
Benjamin Mikailenko	288ecf491b	ice: Accumulate ring statistics over reset Resets may occur with or without user interaction. For example, a TX hang or reconfiguration of parameters will result in a reset. During reset, the VSI is freed, freeing any statistics structures inside as well. This would create an issue for the user where a reset happens in the background, statistics set to zero, and the user checks ring statistics expecting them to be populated. To ensure this doesn't happen, accumulate ring statistics over reset. Define a new ring statistics structure, ice_ring_stats. The new structure lives in the VSI's parent, preserving ring statistics when VSI is freed. 1. Define a new structure vsi_ring_stats in the PF scope 2. Allocate/free stats only during probe, unload, or change in ring size 3. Replace previous ring statistics functionality with new structure Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:43 -08:00
Benjamin Mikailenko	2fd5e433cd	ice: Accumulate HW and Netdev statistics over reset Resets happen with or without user interaction. For example, incidents such as TX hang or a reconfiguration of parameters will result in a reset. During reset, hardware and software statistics were set to zero. This created an issue for the user where a reset happens in the background, statistics set to zero, and the user checks statistics expecting them to be populated. To ensure this doesn't happen, keep accumulating stats over reset. 1. Remove function calls which reset hardware and netdev statistics. 2. Do not rollover statistics in ice_stat_update40 during reset. Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:43 -08:00
Brett Creeley	1d0e28a9be	ice: Remove and replace ice speed defines with ethtool.h versions The driver is currently using ICE_LINK_SPEED_* defines that mirror what ethtool.h defines, with one exception ICE_LINK_SPEED_UNKNOWN. This issue is fixed by the following changes: 1. replace ICE_LINK_SPEED_UNKNOWN with 0 because SPEED_UNKNOWN in ethtool.h is "-1" and that doesn't match the driver's expected behavior 2. transform ICE_LINK_SPEED_MBPS to SPEED_ using static tables and fls()-1 to convert from BIT() to an index in a table. Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:42 -08:00
Karol Kolacinski	a711a3288c	ice: Check for PTP HW lock more frequently It was observed that PTP HW semaphore can be held for ~50 ms in worst case. SW should wait longer and check more frequently if the HW lock is held. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-23 08:59:42 -08:00
Jacob Keller	30f1587409	ice: fix handling of burst Tx timestamps Commit `1229b33973` ("ice: Add low latency Tx timestamp read") refactored PTP timestamping logic to use a threaded IRQ instead of a separate kthread. This implementation introduced ice_misc_intr_thread_fn and redefined the ice_ptp_process_ts function interface to return a value of whether or not the timestamp processing was complete. ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts and convert it into either IRQ_HANDLED if there were no more timestamps to be processed, or IRQ_WAKE_THREAD if the thread should continue processing. This is not correct, as the kernel does not re-schedule threaded IRQ functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ function. This results in the ice_ptp_process_ts function (and in turn the ice_ptp_tx_tstamp function) from only being called exactly once per interrupt. If an application sends a burst of Tx timestamps without waiting for a response, the interrupt will trigger for the first timestamp. However, later timestamps may not have arrived yet. This can result in dropped or discarded timestamps. Worse, on E822 hardware this results in the interrupt logic getting stuck such that no future interrupts will be triggered. The result is complete loss of Tx timestamp functionality. Fix this by modifying the ice_misc_intr_thread_fn to perform its own polling of the ice_ptp_process_ts function. We sleep for a few microseconds between attempts to avoid wasting significant CPU time. The value was chosen to allow time for the Tx timestamps to complete without wasting so much time that we overrun application wait budgets in the worst case. The ice_ptp_process_ts function also currently returns false in the event that the Tx tracker is not initialized. This would result in the threaded IRQ handler never exiting if it gets started while the tracker is not initialized. Fix the function to appropriately return true when the tracker is not initialized. Note that this will not reproduce with default ptp4l behavior, as the program always synchronously waits for a timestamp response before sending another timestamp request. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Fixes: `1229b33973` ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-21 20:50:35 -08:00
Michal Wilczynski	80fe30a8c1	ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler ADQ, DCB might interfere with Custom Tx Scheduler changes that user might introduce using devlink-rate API. Check if ADQ, DCB is active, when user tries to change any setting in exported Tx scheduler tree. If any of those are active block the user from doing so, and log an appropriate message. Remove the exported hierarchy if user enable ADQ or DCB. Prevent ADQ or DCB from getting configured if user already made some changes using devlink-rate API. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-17 21:41:28 -08:00
Michal Wilczynski	42c2eb6b1f	ice: Implement devlink-rate API There is a need to support modification of Tx scheduler tree, in the ice driver. This will allow user to control Tx settings of each node in the internal hierarchy of nodes. As a result user will be able to use Hierarchy QoS implemented entirely in the hardware. This patch implemenents devlink-rate API. It also exports initial default hierarchy. It's mostly dictated by the fact that the tree can't be removed entirely, all we can do is enable the user to modify it. For example root node shouldn't ever be removed, also nodes that have children are off-limits. Example initial tree with 2 VF's: [root@fedora ~]# devlink port function rate show pci/0000:4b:00.0/node_27: type node parent node_26 pci/0000:4b:00.0/node_26: type node parent node_0 pci/0000:4b:00.0/node_34: type node parent node_33 pci/0000:4b:00.0/node_33: type node parent node_32 pci/0000:4b:00.0/node_32: type node parent node_16 pci/0000:4b:00.0/node_19: type node parent node_18 pci/0000:4b:00.0/node_18: type node parent node_17 pci/0000:4b:00.0/node_17: type node parent node_16 pci/0000:4b:00.0/node_21: type node parent node_20 pci/0000:4b:00.0/node_20: type node parent node_3 pci/0000:4b:00.0/node_14: type node parent node_5 pci/0000:4b:00.0/node_5: type node parent node_3 pci/0000:4b:00.0/node_13: type node parent node_4 pci/0000:4b:00.0/node_12: type node parent node_4 pci/0000:4b:00.0/node_11: type node parent node_4 pci/0000:4b:00.0/node_10: type node parent node_4 pci/0000:4b:00.0/node_9: type node parent node_4 pci/0000:4b:00.0/node_8: type node parent node_4 pci/0000:4b:00.0/node_7: type node parent node_4 pci/0000:4b:00.0/node_6: type node parent node_4 pci/0000:4b:00.0/node_4: type node parent node_3 pci/0000:4b:00.0/node_3: type node parent node_16 pci/0000:4b:00.0/node_16: type node parent node_15 pci/0000:4b:00.0/node_15: type node parent node_0 pci/0000:4b:00.0/node_2: type node parent node_1 pci/0000:4b:00.0/node_1: type node parent node_0 pci/0000:4b:00.0/node_0: type node pci/0000:4b:00.0/1: type leaf parent node_27 pci/0000:4b:00.0/2: type leaf parent node_27 Let me visualize part of the tree: +---------+ \| node_0 \| +---------+ \| +----v----+ \| node_26 \| +----+----+ \| +----v----+ \| node_27 \| +----+----+ \| \|-----------------\| +----v----+ +----v----+ \| VF 1 \| \| VF 2 \| +----+----+ +----+----+ So at this point there is a couple things that can be done. For example we could only assign parameters to VF's. [root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \ tx_max 5Gbps This would cap the VF 1 BW to 5Gbps. But let's say you would like to create a completely new branch. This can be done like this: [root@fedora ~]# devlink port function rate add \ pci/0000:4b:00.0/node_custom parent node_0 [root@fedora ~]# devlink port function rate add \ pci/0000:4b:00.0/node_custom_1 parent node_custom [root@fedora ~]# devlink port function rate set \ pci/0000:4b:00.0/1 parent node_custom_1 This creates a completely new branch and reassigns VF 1 to it. A number of parameters is supported per each node: tx_max, tx_share, tx_priority and tx_weight. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-17 21:41:28 -08:00
Michal Wilczynski	bdf96d965a	ice: Add an option to pre-allocate memory for ice_sched_node devlink-rate API requires a priv object to be allocated when node still doesn't have a parent. This is problematic, because ice_sched_node can't be currently created without a parent. Add an option to pre-allocate memory for ice_sched_node struct. Add new arguments to ice_sched_add() and ice_sched_add_elems() that allow for pre-allocation of memory for ice_sched_node struct. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-17 21:41:28 -08:00
Michal Wilczynski	16dfa49406	ice: Introduce new parameters in ice_sched_node To support new devlink-rate API ice_sched_node struct needs to store a number of additional parameters. This includes tx_max, tx_share, tx_weight, and tx_priority. Add new fields to ice_sched_node struct. Add new functions to configure the hardware with new parameters. Introduce new xarray to identify nodes uniquely. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-17 21:41:27 -08:00
Md Fahad Iqbal Polash	e384cf35bf	ice: virtchnl rss hena support Add support for 2 virtchnl msgs: VIRTCHNL_OP_SET_RSS_HENA VIRTCHNL_OP_GET_RSS_HENA_CAPS The first one allows VFs to clear all previously programmed RSS configuration and customize it. The second one returns the RSS HENA bits allowed by the hardware. Introduce ice_err_to_virt_err which converts kernel specific errors to virtchnl errors. Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-14 10:44:38 +00:00
Jakub Kicinski	966a9b4903	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/can/pch_can.c `ae64438be1` ("can: dev: fix skb drop check") `1dd1b521be` ("can: remove obsolete PCH CAN driver") https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-10 17:43:53 -08:00
Norbert Zulinski	f23df5220d	ice: Fix spurious interrupt during removal of trusted VF Previously, during removal of trusted VF when VF is down there was number of spurious interrupt equal to number of queues on VF. Add check if VF already has inactive queues. If VF is disabled and has inactive rx queues then do not disable rx queues. Add check in ice_vsi_stop_tx_ring if it's VF's vsi and if VF is disabled. Fixes: `efe4186000` ("ice: Fix memory corruption in VF driver") Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-11-09 13:20:38 -08:00
Jiri Pirko	77df1db80d	net: remove unused ndo_get_devlink_port Remove ndo_get_devlink_port which is no longer used alongside with the implementations in drivers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-03 20:48:36 -07:00
Jiri Pirko	ac73d4bf2c	net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port Benefit from the previously implemented tracking of netdev events in devlink code and instead of calling devlink_port_type_eth_set() and devlink_port_type_clear() to set devlink port type and link to related netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port pointer to netdevice which is about to be registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-03 20:48:34 -07:00
Jacob Keller	1060707e38	ptp: introduce helpers to adjust by scaled parts per million Many drivers implement the .adjfreq or .adjfine PTP op function with the same basic logic: 1. Determine a base frequency value 2. Multiply this by the abs() of the requested adjustment, then divide by the appropriate divisor (1 billion, or 65,536 billion). 3. Add or subtract this difference from the base frequency to calculate a new adjustment. A few drivers need the difference and direction rather than the combined new increment value. I recently converted the Intel drivers to .adjfine and the scaled parts per million (65.536 parts per billion) logic. To avoid overflow with minimal loss of precision, mul_u64_u64_div_u64 was used. The basic logic used by all of these drivers is very similar, and leads to a lot of duplicate code to perform the same task. Rather than keep this duplicate code, introduce diff_by_scaled_ppm and adjust_by_scaled_ppm. These helper functions calculate the difference or adjustment necessary based on the scaled parts per million input. The diff_by_scaled_ppm function returns true if the difference should be subtracted, and false otherwise. Update the Intel drivers to use the new helper functions. Other vendor drivers will be converted to .adjfine and this helper function in the following changes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-31 11:14:16 +00:00
Lukasz Czapnik	637639cbfe	ice: Add additional CSR registers to ETHTOOL_GREGS In the event of a Tx hang it can be useful to read a variety of hardware registers to capture some state about why the transmit queue got stuck. Extend the ETHTOOL_GREGS dump provided by the ice driver with several CSR registers that provide such relevant information regarding the hardware Tx state. This enables capturing relevant data to enable debugging such a Tx hang. Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20221027104239.1691549-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-28 21:57:29 -07:00
Thomas Gleixner	068c38ad88	net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers). Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-28 20:13:54 -07:00
Michal Jaron	e753df8fbc	ice: Add support Flex RXD Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid in struct virtchnl_rxq_info to support AVF Flex RXD extension. Add support to allow VF to query flexible descriptor RXDIDs supported by DDP package and configure Rx queues with selected RXDID for IAVF. Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be processed. Add necessary macros for registers. Signed-off-by: Leyi Rong <leyi.rong@intel.com> Signed-off-by: Xu Ting <ting.xu@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-10-27 13:23:43 +02:00
Amritha Nambiar	143b86f346	ice: Enable RX queue selection using skbedit action This patch uses TC skbedit queue_mapping action to support forwarding packets to a device queue. Such filters with action forward to queue will be the highest priority switch filter in HW. Example: $ tc filter add dev ens4f0 protocol ip ingress flower\ dst_ip 192.168.1.12 ip_proto tcp dst_port 5001\ action skbedit queue_mapping 5 skip_sw The above command adds an ingress filter, incoming packets qualifying the match will be accepted into queue 5. The queue number is in decimal format. Refactored ice_add_tc_flower_adv_fltr() to consolidate code with action FWD_TO_VSI and FWD_TO QUEUE. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-10-25 10:32:40 +02:00
Jakub Kicinski	d742ea6b8e	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-28 (ice) Arkadiusz implements a single pin initialization function, checking feature bits, instead of having separate device functions and updates sub-device IDs for recognizing E810T devices. Martyna adds support for switchdev filters on VLAN priority field. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for VLAN priority filters in switchdev ice: support features on new E810T variants ice: Merge pin initialization of E810 and E810T adapters ==================== Link: https://lore.kernel.org/r/20220928203217.411078-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-29 19:29:03 -07:00
Jakub Kicinski	accc3b4a57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-29 14:30:51 -07:00
Jakub Kicinski	b48b89f9c1	net: drop the weight argument from netif_napi_add We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-28 18:57:14 -07:00
Martyna Szapar-Mudlaw	34800178b3	ice: Add support for VLAN priority filters in switchdev Enable support for adding TC rules that filter on the VLAN priority in switchdev mode. VLAN priority are the first 3 bits of 16b switch field vector word which contain also vlan id value within its last 12 bits. When getting vlan priority value from tc match.key it has to be shifted first to proper bits positions (by VLAN_PRIO_SHIFT) and then can be added to the joint 'vlan' field in ice_vlan_hdr in lookup element. The mask of lookup changes accordingly. 0x0FFF - when only vlan id is added in filter 0xE000 - when only vlan priority is added in filter 0xEFFF - when both these values are specified Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-28 11:40:57 -07:00
Arkadiusz Kubalewski	793189a2fc	ice: support features on new E810T variants Add new sub-device ids required for proper initialization of features on E810T devices supported by ice driver. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-28 11:40:57 -07:00
Arkadiusz Kubalewski	43c4958a3d	ice: Merge pin initialization of E810 and E810T adapters Remove separate function initializing pins for E810T-based adapters and initialize pins based on feature bits. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-28 11:40:57 -07:00
Maciej Fijalkowski	b3056ae2b5	ice: xsk: drop power of 2 ring size restriction for AF_XDP We had multiple customers in the past months that reported commit `296f13ff38` ("ice: xsk: Force rings to be sized to power of 2") makes them unable to use ring size of 8160 in conjunction with AF_XDP. Remove this restriction. Fixes: `296f13ff38` ("ice: xsk: Force rings to be sized to power of 2") CC: Alasdair McWilliam <alasdair.mcwilliam@outlook.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-27 09:01:01 -07:00
Maciej Fijalkowski	29322791bc	ice: xsk: change batched Tx descriptor cleaning AF_XDP Tx descriptor cleaning in ice driver currently works in a "lazy" way - descriptors are not cleaned immediately after send. We rather hold on with cleaning until we see that free space in ring drops below particular threshold. This was supposed to reduce the amount of unnecessary work related to cleaning and instead of keeping the ring empty, ring was rather saturated. In AF_XDP realm cleaning Tx descriptors implies producing them to CQ. This is a way of letting know user space that particular descriptor has been sent, as John points out in [0]. We tried to implement serial descriptor cleaning which would be used in conjunction with batched cleaning but it made code base more convoluted and probably harder to maintain in future. Therefore we step away from batched cleaning in a current form in favor of an approach where we set RS bit on every last descriptor from a batch and clean always at the beginning of ice_xmit_zc(). This means that we give up a bit of Tx performance, but this doesn't hurt l2fwd scenario which is way more meaningful than txonly as this can be treaten as AF_XDP based packet generator. l2fwd is not hurt due to the fact that Tx side is much faster than Rx and Rx is the one that has to catch Tx up. FWIW Tx descriptors are still produced in a batched way. [0]: https://lore.kernel.org/bpf/62b0a20232920_3573208ab@john.notmuch/ Fixes: `126cdfe100` ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-27 08:11:02 -07:00
Jiri Pirko	a286ba7387	ice: reorder PF/representor devlink port register/unregister flows Make sure that netdevice is registered/unregistered while devlink port is registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-27 07:54:45 -07:00
Jakub Kicinski	0140a7168f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/freescale/fec.h `7b15515fc1` ("Revert "fec: Restart PPS after link state change"") `40c79ce13b` ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/ drivers/pinctrl/pinctrl-ocelot.c `c297561bc9` ("pinctrl: ocelot: Fix interrupt controller") `181f604b33` ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/ tools/testing/selftests/drivers/net/bonding/Makefile `bbb774d921` ("net: Add tests for bonding and team address list management") `152e8ec776` ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/ drivers/net/can/usb/gs_usb.c `5440428b3d` ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") `45dfa45f52` ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-22 13:02:10 -07:00
Jakub Kicinski	624aea6bed	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-20 (ice) Michal re-sets TC configuration when changing number of queues. Mateusz moves the check and call for link-down-on-close to the specific path for downing/closing the interface. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix interface being down after reset with link-down-on-close flag on ice: config netdev tc before setting queues number ==================== Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-21 18:39:47 -07:00
Larysa Zaremba	114f398d48	ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient The original patch added the static branch to handle the situation, when assigning an XDP TX queue to every CPU is not possible, so they have to be shared. However, in the XDP transmit handler ice_xdp_xmit(), an error was returned in such cases even before static condition was checked, thus making queue sharing still impossible. Fixes: `22bf877e52` ("ice: introduce XDP_TX fallback path") Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-21 17:33:29 -07:00
Karol Kolacinski	1229b33973	ice: Add low latency Tx timestamp read E810 products can support low latency Tx timestamp register read. This requires usage of threaded IRQ instead of kthread to reduce the kthread start latency (spikes up to 20 ms). Add a check for the device capability and use the new method if supported. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220916201728.241510-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 16:14:28 -07:00
Mateusz Palczewski	8ac7132704	ice: Fix interface being down after reset with link-down-on-close flag on When performing a reset on ice driver with link-down-on-close flag on interface would always stay down. Fix this by moving a check of this flag to ice_stop() that is called only when user wants to bring interface down. Fixes: `ab4ab73fc1` ("ice: Add ethtool private flag to make forcing link down optional") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-20 13:30:51 -07:00
Michal Swiatkowski	122045ca77	ice: config netdev tc before setting queues number After lowering number of tx queues the warning appears: "Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" Example command to reproduce: ethtool -L enp24s0f0 tx 36 rx 36 Fix this by setting correct tc mapping before setting real number of queues on netdev. Fixes: `0754d65bd4` ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-20 13:30:51 -07:00
Marcin Szycik	cd63454902	ice: Add L2TPv3 hardware offload support Add support for offloading packets based on L2TPv3 session id in switchdev mode. Example filter: tc filter add dev $PF1 ingress prio 1 protocol ip flower ip_proto l2tp \ l2tpv3_sid 1234 skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to be able to specify l2tpv3_sid. ICE COMMS DDP package is required to create a filter as it contains L2TPv3 profiles. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 09:13:38 +02:00
Ding Hui	a509702cac	ice: Fix crash by keep old cfg when update TCs more than queues There are problems if allocated queues less than Traffic Classes. Commit `a632b2a4c9` ("ice: ethtool: Prohibit improper channel config for DCB") already disallow setting less queues than TCs. Another case is if we first set less queues, and later update more TCs config due to LLDP, ice_vsi_cfg_tc() will failed but left dirty num_txq/rxq and tc_cfg in vsi, that will cause invalid pointer access. [ 95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated. [ 95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)! [ 95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0 [ 95.969621] general protection fault: 0000 [#1] SMP NOPTI [ 95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G U W O --------- -t - 4.18.0 #1 [ 95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021 [ 95.969992] RIP: 0010:devm_kmalloc+0xa/0x60 [ 95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c [ 95.970344] RSP: 0018:ffffc9003f553888 EFLAGS: 00010206 [ 95.970425] RAX: dead000000000200 RBX: ffffea003c425b00 RCX: 00000000006080c0 [ 95.970536] RDX: 00000000006080c0 RSI: 0000000000000200 RDI: dead000000000200 [ 95.970648] RBP: dead000000000200 R08: 00000000000463c0 R09: ffff888ffa900000 [ 95.970760] R10: 0000000000000000 R11: 0000000000000002 R12: ffff888ff6b40100 [ 95.970870] R13: ffff888ff6a55018 R14: 0000000000000000 R15: ffff888ff6a55460 [ 95.970981] FS: 00007f51b7d24700(0000) GS:ffff88903ee80000(0000) knlGS:0000000000000000 [ 95.971108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.971197] CR2: 00007fac5410d710 CR3: 0000000f2c1de002 CR4: 00000000007606e0 [ 95.971309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 95.971419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 95.971530] PKRU: 55555554 [ 95.971573] Call Trace: [ 95.971622] ice_setup_rx_ring+0x39/0x110 [ice] [ 95.971695] ice_vsi_setup_rx_rings+0x54/0x90 [ice] [ 95.971774] ice_vsi_open+0x25/0x120 [ice] [ 95.971843] ice_open_internal+0xb8/0x1f0 [ice] [ 95.971919] ice_ena_vsi+0x4f/0xd0 [ice] [ 95.971987] ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice] [ 95.972082] ice_pf_dcb_cfg+0x29a/0x380 [ice] [ 95.972154] ice_dcbnl_setets+0x174/0x1b0 [ice] [ 95.972220] dcbnl_ieee_set+0x89/0x230 [ 95.972279] ? dcbnl_ieee_del+0x150/0x150 [ 95.972341] dcb_doit+0x124/0x1b0 [ 95.972392] rtnetlink_rcv_msg+0x243/0x2f0 [ 95.972457] ? dcb_doit+0x14d/0x1b0 [ 95.972510] ? __kmalloc_node_track_caller+0x1d3/0x280 [ 95.972591] ? rtnl_calcit.isra.31+0x100/0x100 [ 95.972661] netlink_rcv_skb+0xcf/0xf0 [ 95.972720] netlink_unicast+0x16d/0x220 [ 95.972781] netlink_sendmsg+0x2ba/0x3a0 [ 95.975891] sock_sendmsg+0x4c/0x50 [ 95.979032] ___sys_sendmsg+0x2e4/0x300 [ 95.982147] ? kmem_cache_alloc+0x13e/0x190 [ 95.985242] ? __wake_up_common_lock+0x79/0x90 [ 95.988338] ? __check_object_size+0xac/0x1b0 [ 95.991440] ? _copy_to_user+0x22/0x30 [ 95.994539] ? move_addr_to_user+0xbb/0xd0 [ 95.997619] ? __sys_sendmsg+0x53/0x80 [ 96.000664] __sys_sendmsg+0x53/0x80 [ 96.003747] do_syscall_64+0x5b/0x1d0 [ 96.006862] entry_SYSCALL_64_after_hwframe+0x65/0xca Only update num_txq/rxq when passed check, and restore tc_cfg if setup queue map failed. Fixes: `a632b2a4c9` ("ice: ethtool: Prohibit improper channel config for DCB") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-08 13:22:11 -07:00
Dave Ertman	23c6191903	ice: Don't double unplug aux on peer initiated reset In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: `f9f5301e7e` ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-08 11:21:41 -07:00
Paolo Abeni	9f8f1933dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/freescale/fec.h `7d650df99d` ("net: fec: add pm_qos support on imx6q platform") `40c79ce13b` ("net: fec: add stop mode support for imx8 platform") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-08 18:38:30 +02:00
Christophe JAILLET	04cbaa6c08	ice: Simplify memory allocation in ice_sched_init_port() 'buf' is locale to the ice_sched_init_port() function. There is no point in using devm_kzalloc()/devm_kfree(). use kzalloc()/kfree() instead. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-06 13:55:47 -07:00
Christophe JAILLET	1b9e740dd7	ice: switch: Simplify memory allocation 'rbuf' is locale to the ice_get_initial_sw_cfg() function. There is no point in using devm_kzalloc()/devm_kfree(). use kzalloc()/kfree() instead. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-06 13:55:47 -07:00
Paul Greenwalt	1bd50f2deb	ice: add helper function to check FW API version Several functions in ice_common.c check the firmware API version to see if the current API version meets some minimum requirement. Improve the readability of these checks by introducing ice_is_fw_api_min_ver, a helper function to perform that check. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-06 13:55:47 -07:00
Michal Michalik	0b57e0d442	ice: Check if reset in progress while waiting for offsets Occasionally while waiting to valid offsets from hardware we get reset. Add check for reset before proceeding to execute scheduled work. Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-06 13:55:47 -07:00
Tony Nguyen	ce46261311	ice: Allow operation with reduced device MSI-X The driver currently takes an all or nothing approach for device MSI-X vectors. Meaning if it does not get its full allocation, it will fail and not load. There is no reason it can't work with a reduced number of MSI-X vectors. Take a similar approach as commit `741106f7bd` ("ice: Improve MSI-X fallback logic") and, instead, adjust the MSI-X request to make use of what is available. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)	2022-09-06 13:55:22 -07:00
Michal Swiatkowski	59ac325557	ice: use bitmap_free instead of devm_kfree pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be used to free this memory. Fixes: `78b5713ac1` ("ice: Alloc queue management bitmaps and arrays dynamically") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-02 08:25:15 -07:00
Przemyslaw Patynowski	7e753eb675	ice: Fix DMA mappings leak Fix leak, when user changes ring parameters. During reallocation of RX buffers, new DMA mappings are created for those buffers. New buffers with different RX ring count should substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused leak of already mapped DMA. Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate back to rx_buf, when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XDP_SETUP_XSK_POOL handler. Steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done Fixes: `617f3e1b58` ("ice: xsk: allocate separate memory for XDP SW ring") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-02 08:25:15 -07:00
David S. Miller	77baa37a9b	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-24 (ice) This series contains updates to ice driver only. Marcin adds support for TC parsing on TTL and ToS fields. Anatolli adds support for devlink port split command to allow configuration of various port configurations. Jake allows for passing and writing an additional NVM write activate field by expanding current cmd_flag. Ani makes PHY debug output more readable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-26 11:39:00 +01:00
Jakub Kicinski	880b0dd94f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/mellanox/mlx5/core/en_fs.c `21234e3a84` ("net/mlx5e: Fix use after free in mlx5e_fs_init()") `c7eafc5ed0` ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-25 16:07:42 -07:00
Anirudh Venkataramanan	f8c74ca6d3	ice: Print human-friendly PHY types Provide human readable description of PHY capabilities and report_mode. Sample output: Old: [ 286.130405] ice 0000:16:00.0: get phy caps - report_mode = 0x2 [ 286.130409] ice 0000:16:00.0: phy_type_low = 0x108021020502000 [ 286.130412] ice 0000:16:00.0: phy_type_high = 0x0 [ 286.130415] ice 0000:16:00.0: caps = 0xc8 [ 286.130419] ice 0000:16:00.0: low_power_ctrl_an = 0x4 [ 286.130421] ice 0000:16:00.0: eee_cap = 0x0 [ 286.130424] ice 0000:16:00.0: eeer_value = 0x0 [ 286.130427] ice 0000:16:00.0: link_fec_options = 0xdf [ 286.130430] ice 0000:16:00.0: module_compliance_enforcement = 0x0 [ 286.130433] ice 0000:16:00.0: extended_compliance_code = 0xb [ 286.130435] ice 0000:16:00.0: module_type[0] = 0x11 [ 286.130438] ice 0000:16:00.0: module_type[1] = 0x1 [ 286.130441] ice 0000:16:00.0: module_type[2] = 0x0 New: [ 1128.297347] ice 0000:16:00.0: get phy caps dump [ 1128.297351] ice 0000:16:00.0: phy_caps_active: phy_type_low: 0x0108021020502000 [ 1128.297355] ice 0000:16:00.0: phy_caps_active: bit(13): 10G_SFI_DA [ 1128.297359] ice 0000:16:00.0: phy_caps_active: bit(20): 25GBASE_CR [ 1128.297362] ice 0000:16:00.0: phy_caps_active: bit(22): 25GBASE_CR1 [ 1128.297365] ice 0000:16:00.0: phy_caps_active: bit(29): 25G_AUI_C2C [ 1128.297368] ice 0000:16:00.0: phy_caps_active: bit(36): 50GBASE_CR2 [ 1128.297371] ice 0000:16:00.0: phy_caps_active: bit(41): 50G_LAUI2 [ 1128.297374] ice 0000:16:00.0: phy_caps_active: bit(51): 100GBASE_CR4 [ 1128.297377] ice 0000:16:00.0: phy_caps_active: bit(56): 100G_CAUI4 [ 1128.297380] ice 0000:16:00.0: phy_caps_active: phy_type_high: 0x0000000000000000 [ 1128.297383] ice 0000:16:00.0: phy_caps_active: report_mode = 0x4 [ 1128.297386] ice 0000:16:00.0: phy_caps_active: caps = 0xc8 [ 1128.297389] ice 0000:16:00.0: phy_caps_active: low_power_ctrl_an = 0x4 [ 1128.297392] ice 0000:16:00.0: phy_caps_active: eee_cap = 0x0 [ 1128.297394] ice 0000:16:00.0: phy_caps_active: eeer_value = 0x0 [ 1128.297397] ice 0000:16:00.0: phy_caps_active: link_fec_options = 0xdf [ 1128.297400] ice 0000:16:00.0: phy_caps_active: module_compliance_enforcement = 0x0 [ 1128.297402] ice 0000:16:00.0: phy_caps_active: extended_compliance_code = 0xb [ 1128.297405] ice 0000:16:00.0: phy_caps_active: module_type[0] = 0x11 [ 1128.297408] ice 0000:16:00.0: phy_caps_active: module_type[1] = 0x1 [ 1128.297411] ice 0000:16:00.0: phy_caps_active: module_type[2] = 0x0 Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Co-developed-by: Lukasz Plachno <lukasz.plachno@intel.com> Signed-off-by: Lukasz Plachno <lukasz.plachno@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-24 08:45:55 -07:00
Anatolii Gerasymenko	26d1c571e1	ice: Implement devlink port split operations Allow to configure port split options using the devlink port split interface. Support port splitting only for port 0, as the FW has a predefined set of available port split options for the whole device. Add ice_devlink_port_options_print() function to print the table with all available FW port split options. It will be printed after each port split and unsplit command. Add documentation for devlink port split interface usage for the ice driver. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-24 08:45:55 -07:00
Jacob Keller	da02ee9c22	ice: Add additional flags to ice_nvm_write_activate The ice_nvm_write_activate function is used to issue AdminQ command 0x0707 which sends a request to firmware to activate a flash bank. For basic operations, this command takes an 8bit flag value which defines the flags to control the activation process. There are some additional flags that are stored in a second 8bit flag field. We can simplify the interface by using a u16 cmd_flags variable. Split this over the two bytes of flag storage in the structure. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-24 08:45:54 -07:00
Anatolii Gerasymenko	781f15eac0	ice: Add port option admin queue commands Implement support for Get/Set Port Options admin queue commands (0x06EA/0x06EB). These firmware commands allow the driver to change port specific options and will be used in the next patch. Co-developed-by: Lev Faerman <lev.faerman@intel.com> Signed-off-by: Lev Faerman <lev.faerman@intel.com> Co-developed-by: Damian Milosek <damian.milosek@intel.com> Signed-off-by: Damian Milosek <damian.milosek@intel.com> Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-24 08:45:53 -07:00
Marcin Szycik	4c99bc96e0	ice: Add support for ip TTL & ToS offload Add support for parsing TTL and ToS (Hop Limit and Traffic Class) tc fields and matching on those fields in filters. Incomplete part of implementation was already in place (getting enc_ip and enc_tos from flow_match_ip and writing them to filter header). Note: matching on ipv6 ip_ttl, enc_ttl and enc_tos is currently not supported by the DDP package. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-24 08:45:53 -07:00
Jakub Kicinski	0134fe8512	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-18 (ice) This series contains updates to ice driver only. Jesse and Anatolii add support for controlling FCS/CRC stripping via ethtool. Anirudh allows for 100M speeds on devices which support it. Sylwester removes ucast_shared field and the associated dead code related to it. Mikael removes non-inclusive language from the driver. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: remove non-inclusive language ice: Remove ucast_shared ice: Allow 100M speeds for some devices ice: Implement FCS/CRC and VLAN stripping co-existence policy ice: Implement control of FCS/CRC stripping ==================== Link: https://lore.kernel.org/r/20220818155207.996297-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 20:10:50 -07:00
Maciej Fijalkowski	9ead7e74bf	ice: xsk: use Rx ring's XDP ring when picking NAPI context Ice driver allocates per cpu XDP queues so that redirect path can safely use smp_processor_id() as an index to the array. At the same time though, XDP rings are used to pick NAPI context to call napi_schedule() or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and num_possible_cpus() of underlying platform is 44, then this means queue vectors with correlated NAPI contexts will carry several XDP queues. This in turn can result in a broken behavior where NAPI context of interest will never be scheduled and AF_XDP socket will not process any traffic. To fix this, let us change the way how XDP rings are assigned to Rx rings and use this information later on when setting ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated queue vector and walk through Tx ring's linked list. Once we stumble upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring. Previous [0] approach of fixing this issue was for txonly scenario because of the described grouping of XDP rings across queue vectors. So, relying on Rx ring meant that NAPI context could be scheduled with a queue vector without XDP ring with associated XSK pool. [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/ Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Fixes: `22bf877e52` ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-22 08:47:26 -07:00
Maciej Fijalkowski	5a42f112d3	ice: xsk: prohibit usage of non-balanced queue id Fix the following scenario: 1. ethtool -L $IFACE rx 8 tx 96 2. xdpsock -q 10 -t -z Above refers to a case where user would like to attach XSK socket in txonly mode at a queue id that does not have a corresponding Rx queue. At this moment ice's XSK logic is tightly bound to act on a "queue pair", e.g. both Tx and Rx queues at a given queue id are disabled/enabled and both of them will get XSK pool assigned, which is broken for the presented queue configuration. This results in the splat included at the bottom, which is basically an OOB access to Rx ring array. To fix this, allow using the ids only in scope of "combined" queues reported by ethtool. However, logic should be rewritten to allow such configurations later on, which would end up as a complete rewrite of the control path, so let us go with this temporary fix. [420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082 [420160.566359] #PF: supervisor read access in kernel mode [420160.572657] #PF: error_code(0x0000) - not-present page [420160.579002] PGD 0 P4D 0 [420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI [420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10 [420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420160.740707] PKRU: 55555554 [420160.745960] Call Trace: [420160.750962] <TASK> [420160.755597] ? kmalloc_large_node+0x79/0x90 [420160.762703] ? __kmalloc_node+0x3f5/0x4b0 [420160.769341] xp_assign_dev+0xfd/0x210 [420160.775661] ? shmem_file_read_iter+0x29a/0x420 [420160.782896] xsk_bind+0x152/0x490 [420160.788943] __sys_bind+0xd0/0x100 [420160.795097] ? exit_to_user_mode_prepare+0x20/0x120 [420160.802801] __x64_sys_bind+0x16/0x20 [420160.809298] do_syscall_64+0x38/0x90 [420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd [420160.823731] RIP: 0033:0x7fa86a0dd2fb [420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48 [420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb [420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003 [420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000 [420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0 [420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000 [420160.919817] </TASK> [420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice] [420160.977576] CR2: 0000000000000082 [420160.985037] ---[ end trace 0000000000000000 ]--- [420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice] [420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85 [420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282 [420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8 [420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000 [420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000 [420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a [420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828 [420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000 [420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0 [420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [420161.257052] PKRU: 55555554 Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-22 08:47:26 -07:00
Jakub Kicinski	268603d79c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 21:17:10 -07:00
Mikael Barsehyan	5c603001d7	ice: remove non-inclusive language Remove non-inclusive language from the driver where possible; replace "master" with "primary"; replace "slave" with "secondary". Signed-off-by: Mikael Barsehyan <mikael.barsehyan@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-18 08:26:11 -07:00
Sylwester Dziedziuch	e1e9db57c0	ice: Remove ucast_shared Remove ucast_shared as it was always true. Remove the code depending on ucast_shared from ice_add_mac and ice_remove_mac. Remove ice_find_ucast_rule_entry function as it was only used when ucast_shared was set to false. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-18 08:23:44 -07:00
Anirudh Venkataramanan	39ed02a4ba	ice: Allow 100M speeds for some devices For certain devices, 100M speeds are supported. Do not mask off 100M speed for these devices. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Co-developed-by: Chinh T Cao <chinh.t.cao@intel.com> Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Signed-off-by: Mikael Barsehyan <mikael.barsehyan@intel.com> Tested-by: Kavya AV <kavyax.av@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-18 08:23:44 -07:00
Anatolii Gerasymenko	affa1029d6	ice: Implement FCS/CRC and VLAN stripping co-existence policy Make sure that only the valid combinations of FCS/CRC stripping and VLAN stripping offloads are allowed. You cannot have FCS/CRC stripping disabled while VLAN stripping is enabled - this breaks the correctness of the FCS/CRC. If administrator tries to enable VLAN stripping when FCS/CRC stripping is disabled, the request should be rejected. If administrator tries to disable FCS/CRC stripping when VLAN stripping is enabled, the request should be rejected if VLANs are configured. If there is no VLAN configured, then both FCS/CRC and VLAN stripping should be disabled. Testing Hints: The default settings after driver load are: - VLAN C-Tag offloads are enabled - VLAN S-Tag offloads are disabled - FCS/CRC stripping is enabled Restore the default settings before each test with the command: ethtool -K eth0 rx-fcs off rxvlan on txvlan on rx-vlan-stag-hw-parse off tx-vlan-stag-hw-insert off Test 1: Disable FCS/CRC and VLAN stripping: ethtool -K eth0 rx-fcs on rxvlan off Try to enable VLAN stripping: ethtool -K eth0 rxvlan on Expected: VLAN stripping request is rejected Test 2: Try to disable FCS/CRC stripping: ethtool -K eth0 rx-fcs on Expected: VLAN stripping is also disabled, as there are no VLAN configured Test 3: Add a VLAN: ip link add link eth0 eth0.42 type vlan id 42 ip link set eth0 up Try to disable FCS/CRC stripping: ethtool -K eth0 rx-fcs on Expected: FCS/CRC stripping request is rejected Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-18 08:08:23 -07:00
Jesse Brandeburg	dddd406d9d	ice: Implement control of FCS/CRC stripping The driver can allow the user to configure whether the CRC aka the FCS (Frame Check Sequence) is DMA'd to the host as part of the receive buffer. The driver usually wants this feature disabled so that the hardware checks the FCS and strips it in order to save PCI bandwidth. Control the reception of FCS to the host using the command: ethtool -K eth0 rx-fcs <on\|off> The default shown in ethtool -k eth0 \| grep fcs; should be "off", as the hardware will drop any frame with a bad checksum, and DMA of the checksum is useless overhead especially for small packets. Testing Hints: test the FCS/CRC arrives with received packets using tcpdump -nnpi eth0 -xxxx and it should show crc data as the last 4 bytes of the packet. Can also use wireshark to turn on CRC checking and check the data is correct. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-18 08:08:23 -07:00
Sylwester Dziedziuch	664d464618	ice: Fix VF not able to send tagged traffic with no VLAN filters VF was not able to send tagged traffic when it didn't have any VLAN interfaces and VLAN anti-spoofing was enabled. Fix this by allowing VFs with no VLAN filters to send tagged traffic. After VF adds a VLAN interface it will be able to send tagged traffic matching VLAN filters only. Testing hints: 1. Spawn VF 2. Send tagged packet from a VF 3. The packet should be sent out and not dropped 4. Add a VLAN interface on VF 5. Send tagged packet on that VLAN interface 6. Packet should be sent out and not dropped 7. Send tagged packet with id different than VLAN interface 8. Packet should be dropped Fixes: `daf4dd1643` ("ice: Refactor spoofcheck configuration functions") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-17 09:30:44 -07:00
Benjamin Mikailenko	79956b83ed	ice: Ignore error message when setting same promiscuous mode Commit `1273f89578` ("ice: Fix broken IFF_ALLMULTI handling") introduced new checks when setting/clearing promiscuous mode. But if the requested promiscuous mode setting already exists, an -EEXIST error message would be printed. This is incorrect because promiscuous mode is either on/off and shouldn't print an error when the requested configuration is already set. This can happen when removing a bridge with two bonded interfaces and promiscuous most isn't fully cleared from VLAN VSI in hardware. Fix this by ignoring cases where requested promiscuous mode exists. Fixes: `1273f89578` ("ice: Fix broken IFF_ALLMULTI handling") Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-17 09:30:44 -07:00
Grzegorz Siwik	abddafd458	ice: Fix clearing of promisc mode with bridge over bond When at least two interfaces are bonded and a bridge is enabled on the bond, an error can occur when the bridge is removed and re-added. The reason for the error is because promiscuous mode was not fully cleared from the VLAN VSI in the hardware. With this change, promiscuous mode is properly removed when the bridge disconnects from bonding. [ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it [ 1033.676366] bond1: making interface enp175s0f0 the new active one [ 1033.676369] device enp95s0f0 left promiscuous mode [ 1033.676522] device enp175s0f0 entered promiscuous mode [ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6 [ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it [ 1041.944874] device enp175s0f0 left promiscuous mode [ 1041.944918] bond1: now running without any active interface! Fixes: `c31af68a1b` ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations") Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-17 09:30:32 -07:00
Grzegorz Siwik	11e551a2ef	ice: Ignore EEXIST when setting promisc mode Ignore EEXIST error when setting promiscuous mode. This fix is needed because the driver could set promiscuous mode when it still has not cleared properly. Promiscuous mode could be set only once, so setting it second time will be rejected. Fixes: `5eda8afd6b` ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-17 09:30:23 -07:00
Grzegorz Siwik	ffa9ed8652	ice: Fix double VLAN error when entering promisc mode Avoid enabling or disabling VLAN 0 when trying to set promiscuous VLAN mode if double VLAN mode is enabled. This fix is needed because the driver tries to add the VLAN 0 filter twice (once for inner and once for outer) when double VLAN mode is enabled. The filter program is rejected by the firmware when double VLAN is enabled, because the promiscuous filter only needs to be set once. This issue was missed in the initial implementation of double VLAN mode. Fixes: `5eda8afd6b` ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/ Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Tested-by: Igor Raits <igor@gooddata.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-17 09:29:50 -07:00
Jacob Keller	b1a582e64b	ice: introduce ice_ptp_reset_cached_phctime function If the PTP hardware clock is adjusted, the ice driver must update the cached PHC timestamp. This is required in order to perform timestamp extension on the shorter timestamps captured by the PHY. Currently, we simply call ice_ptp_update_cached_phctime in the settime and adjtime callbacks. This has a few issues: 1) if ICE_CFG_BUSY is set because another thread is updating the Rx rings, we will exit with an error. This is not checked, and the functions do not re-schedule the update. This could leave the cached timestamp incorrect until the next scheduled work item execution. 2) even if we did handle an update, any currently outstanding Tx timestamp would be extended using the wrong cached PHC time. This would produce incorrect results. To fix these issues, introduce a new ice_ptp_reset_cached_phctime function. This function calls the ice_ptp_update_cached_phctime, and discards outstanding Tx timestamps. If the ice_ptp_update_cached_phctime function fails because ICE_CFG_BUSY is set, we log a warning and schedule the thread to execute soon. The update function is modified so that it always updates the cached copy in the PF regardless. This ensures we have the most up to date values possible and minimizes the risk of a packet timestamp being extended with the wrong value. It would be nice if we could skip reporting Rx timestamps until the cached values are up to date. However, we can't access the Rx rings while ICE_CFG_BUSY is set because they are actively being updated by another thread. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:38 -07:00
Jacob Keller	4b1251bdd1	ice: re-arrange some static functions in ice_ptp.c A following change is going to want to make use of ice_ptp_flush_tx_tracker earlier in the ice_ptp.c file. To make this work, move the Tx timestamp tracking functions higher up in the file, and pull the ice_ptp_update_cached_timestamp function below them. This should have no functional change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:38 -07:00
Jacob Keller	cd25507a31	ice: track and warn when PHC update is late The ice driver requires a cached copy of the PHC time in order to perform timestamp extension on Tx and Rx hardware timestamp values. This cached PHC time must always be updated at least once every 2 seconds. Otherwise, the math used to perform the extension would produce invalid results. The updates are supposed to occur periodically in the PTP kthread work item, which is scheduled to run every half second. Thus, we do not expect an update to be delayed for so long. However, there are error conditions which can cause the update to be delayed. Track this situation by using jiffies to determine approximately how long ago the last update occurred. Add a new statistic and a dev_warn when we have failed to update the cached PHC time. This makes the error case more obvious. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:38 -07:00
Jacob Keller	f020481be5	ice: track Tx timestamp stats similar to other Intel drivers Several Intel networking drivers which support PTP track when Tx timestamps are skipped or when they timeout without a timestamp from hardware. The conditions which could cause these events are rare, but it can be useful to know when and how often they occur. Implement similar statistics for the ice driver, tx_hwtstamp_skipped, tx_hwtstamp_timeouts, and tx_hwtstamp_flushed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:38 -07:00
Jacob Keller	cf6b82fd3f	ice: initialize cached_phctime when creating Rx rings When we create new Rx rings, the cached_phctime field is zero initialized. This could result in incorrect timestamp reporting due to the cached value not yet being updated. Although a background task will periodically update the cached value, ensure it matches the existing cached value in the PF structure at ring initialization. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:37 -07:00
Jacob Keller	b3b173745c	ice: set tx_tstamps when creating new Tx rings via ethtool When the user changes the number of queues via ethtool, the driver allocates new rings. This allocation did not initialize tx_tstamps. This results in the tx_tstamps field being zero (due to kcalloc allocation), and would result in a NULL pointer dereference when attempting a transmit timestamp on the new ring. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-16 09:04:37 -07:00
David S. Miller	27b8d4d7a0	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net -queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-11 (ice) This series contains updates to ice driver only. Benjamin corrects a misplaced parenthesis for a WARN_ON check. Michal removes WARN_ON from a check as its recoverable and not warranting of a call trace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:30:10 +01:00
Michal Jaron	cf90b74341	ice: Fix call trace with null VSI during VF reset During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a call trace in ice_reset_vf that VF's VSI is null. [145237.352797] WARNING: CPU: 46 PID: 840629 at drivers/net/ethernet/intel/ice/ice_vf_lib.c:508 ice_reset_vf+0x3d6/0x410 [ice] [145237.352851] Modules linked in: ice(E) vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio iavf dm_mod xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTC O_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ipmi_si intel_cstate ipmi_devintf joydev intel_uncore m ei_me ipmi_msghandler i2c_i801 pcspkr mei lpc_ich ioatdma i2c_smbus acpi_pad acpi_power_meter ip_tables xfs libcrc32c i2c_algo_bit drm_sh mem_helper drm_kms_helper sd_mod t10_pi crc64_rocksoft syscopyarea crc64 sysfillrect sg sysimgblt fb_sys_fops drm i40e ixgbe ahci libahci libata crc32c_intel mdio dca wmi fuse [last unloaded: ice] [145237.352917] CPU: 46 PID: 840629 Comm: kworker/46:2 Tainted: G S W I E 5.19.0-rc6+ #24 [145237.352921] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [145237.352923] Workqueue: ice ice_service_task [ice] [145237.352948] RIP: 0010:ice_reset_vf+0x3d6/0x410 [ice] [145237.352984] Code: 30 ec f3 cc e9 28 fd ff ff 0f b7 4b 50 48 c7 c2 48 19 9c c0 4c 89 ee 48 c7 c7 30 fe 9e c0 e8 d1 21 9d cc 31 c0 e9 a 9 fe ff ff <0f> 0b b8 ea ff ff ff e9 c1 fc ff ff 0f 0b b8 fb ff ff ff e9 91 fe [145237.352987] RSP: 0018:ffffb453e257fdb8 EFLAGS: 00010246 [145237.352990] RAX: ffff8bd0040181c0 RBX: ffff8be68db8f800 RCX: 0000000000000000 [145237.352991] RDX: 000000000000ffff RSI: 0000000000000000 RDI: ffff8be68db8f800 [145237.352993] RBP: ffff8bd0040181c0 R08: 0000000000001000 R09: ffff8bcfd520e000 [145237.352995] R10: 0000000000000000 R11: 00008417b5ab0bc0 R12: 0000000000000005 [145237.352996] R13: ffff8bcee061c0d0 R14: ffff8bd004019640 R15: 0000000000000000 [145237.352998] FS: 0000000000000000(0000) GS:ffff8be5dfb00000(0000) knlGS:0000000000000000 [145237.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [145237.353002] CR2: 00007fd81f651d68 CR3: 0000001a0fe10001 CR4: 00000000001726e0 [145237.353003] Call Trace: [145237.353008] <TASK> [145237.353011] ice_process_vflr_event+0x8d/0xb0 [ice] [145237.353049] ice_service_task+0x79f/0xef0 [ice] [145237.353074] process_one_work+0x1c8/0x390 [145237.353081] ? process_one_work+0x390/0x390 [145237.353084] worker_thread+0x30/0x360 [145237.353087] ? process_one_work+0x390/0x390 [145237.353090] kthread+0xe8/0x110 [145237.353094] ? kthread_complete_and_exit+0x20/0x20 [145237.353097] ret_from_fork+0x22/0x30 [145237.353103] </TASK> Remove WARN_ON() from check if VSI is null in ice_reset_vf. Add "VF is already removed\n" in dev_dbg(). This WARN_ON() is unnecessary and causes call trace, despite that call trace, driver still works. There is no need for this warn because this piece of code is responsible for disabling VF's Tx/Rx queues when VF is disabled, but when VF is already removed there is no need to do reset or disable queues. Fixes: `efe4186000` ("ice: Fix memory corruption in VF driver") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-11 08:53:33 -07:00
Benjamin Mikailenko	7fe05e125d	ice: Fix VSI rebuild WARN_ON check for VF In commit `b03d519d34` ("ice: store VF pointer instead of VF ID") WARN_ON checks were added to validate the vsi->vf pointer and catch programming errors. However, one check to vsi->vf was missed. This caused a call trace when resetting VFs. Fix ice_vsi_rebuild by encompassing VF pointer in WARN_ON check. Fixes: `b03d519d34` ("ice: store VF pointer instead of VF ID") Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-08-11 08:32:08 -07:00
Linus Torvalds	4e23eeebb2	Bitmap patches for v6.0-rc1 This branch consists of: Qu Wenruo: lib: bitmap: fix the duplicated comments on bitmap_to_arr64() https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/ Alexander Lobakin: bitops: let optimize out non-atomic bitops on compile-time constants https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/ Yury Norov: lib: cleanup bitmap-related headers https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961 Alexander Lobakin: x86/olpc: fix 'logical not is only applied to the left hand side' https://www.spinics.net/lists/kernel/msg4440064.html Yury Norov: lib/nodemask: inline wrappers around bitmap https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/ -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k 9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8 RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY 4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81 +gCdqrWd =4dTW -----END PGP SIGNATURE----- Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux Pull bitmap updates from Yury Norov: - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo) - optimize out non-atomic bitops on compile-time constants (Alexander Lobakin) - cleanup bitmap-related headers (Yury Norov) - x86/olpc: fix 'logical not is only applied to the left hand side' (Alexander Lobakin) - lib/nodemask: inline wrappers around bitmap (Yury Norov) * tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits) lib/nodemask: inline next_node_in() and node_random() powerpc: drop dependency on <asm/machdep.h> in archrandom.h x86/olpc: fix 'logical not is only applied to the left hand side' lib/cpumask: move some one-line wrappers to header file headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h> headers/deps: mm: Optimize <linux/gfp.h> header dependencies lib/cpumask: move trivial wrappers around find_bit to the header lib/cpumask: change return types to unsigned where appropriate cpumask: change return types to bool where appropriate lib/bitmap: change type of bitmap_weight to unsigned long lib/bitmap: change return types to bool where appropriate arm: align find_bit declarations with generic kernel iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) lib/test_bitmap: test the tail after bitmap_to_arr64() lib/bitmap: fix off-by-one in bitmap_to_arr64() lib: test_bitmap: add compile-time optimization/evaluations assertions bitmap: don't assume compiler evaluates small mem*() builtins calls net/ice: fix initializing the bitmap in the switch code bitops: let optimize out non-atomic bitops on compile-time constants ...	2022-08-07 17:52:35 -07:00
Paolo Abeni	7c6327c77d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/ax25/af_ax25.c `d7c4c9e075` ("ax25: fix incorrect dev_tracker usage") `d62607c3fe` ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c `180a6a3ee6` ("netdevsim: fib: Fix reference count leak on route deletion failure") `012ec02ae4` ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-03 09:04:55 +02:00
Jian Shen	7dc839fe47	net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() vsi->current_netdev_flags is used store the current net device flags, not the active netdevice features. So it should use vsi->netdev->featurs, rather than vsi->current_netdev_flags to check NETIF_F_HW_VLAN_CTAG_FILTER. Fixes: `1babaf77f4` ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:09:28 -07:00
Jakub Kicinski	9936e07eaf	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-07-28 Jacob Keller says: Convert all of the Intel drivers with PTP support to the newer .adjfine implementation which uses scaled parts per million. This improves the precision of the frequency adjustments by taking advantage of the full scaled parts per million input coming from user space. In addition, all implementations are converted to using the mul_u64_u64_div_u64 function which better handles the intermediate value. This function supports architecture specific instructions where possible to avoid loss of precision if the normal 64-bit multiplication would overflow. Of note, the i40e implementation is now able to avoid loss of precision on slower link speeds by taking advantage of this to multiply by the link speed factor first. This results in a significantly more precise adjustment by allowing the calculation to impact the lower bits. This also gets us a step closer to being able to remove the .adjfreq entirely by removing its use from many drivers. I plan to follow this up with a series to update the drivers from other vendors and drop the .adjfreq implementation entirely. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igb: convert .adjfreq to .adjfine ixgbe: convert .adjfreq to .adjfine i40e: convert .adjfreq to .adjfine i40e: use mul_u64_u64_div_u64 for PTP frequency calculation e1000e: convert .adjfreq to .adjfine e1000e: remove unnecessary range check in e1000e_phc_adjfreq ice: implement adjfine with mul_u64_u64_div_u64 ==================== Link: https://lore.kernel.org/r/20220728181836.3387862-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 11:15:40 -07:00
Jakub Kicinski	84a8d931ab	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-07-28 This series contains updates to ice driver only. Michal allows for VF true promiscuous mode to be set for multiple VFs and adds clearing of promiscuous filters when VF trust is removed. Maciej refactors ice_set_features() to track/check changed features instead of constantly checking against netdev features and adds support for NETIF_F_LOOPBACK. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: allow toggling loopback mode via ndo_set_features callback ice: compress branches in ice_set_features() ice: Fix promiscuous mode not turning off ice: Introduce enabling promiscuous mode on multiple VF's ==================== Link: https://lore.kernel.org/r/20220728195538.3391360-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-29 21:26:10 -07:00
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00
Maciej Fijalkowski	44ece4e1a3	ice: allow toggling loopback mode via ndo_set_features callback Add support for NETIF_F_LOOPBACK. This feature can be set via: $ ethtool -K eth0 loopback <on\|off> Feature can be useful for local data path tests. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Maciej Fijalkowski	c67672fa26	ice: compress branches in ice_set_features() Instead of rather verbose comparison of current netdev->features bits vs the incoming ones from user, let us compress them by a helper features set that will be the result of netdev->features XOR features. This way, current, extensive branches: if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT)) set_feature(true); else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT) set_feature(false); can become: netdev_features_t changed = netdev->features ^ features; if (changed & NETIF_F_BIT) set_feature(!!(features & NETIF_F_BIT)); This is nothing new as currently several other drivers use this approach, which I find much more convenient. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Michal Wilczynski	a419526de6	ice: Fix promiscuous mode not turning off When trust is turned off for the VF, the expectation is that promiscuous and allmulticast filters are removed. Currently default VSI filter is not getting cleared in this flow. Example: ip link set enp236s0f0 vf 0 trust on ip link set enp236s0f0v0 promisc on ip link set enp236s0f0 vf 0 trust off /* promiscuous mode is still enabled on VF0 */ Remove switch filters for both cases. This commit fixes above behavior by removing default VSI filters and allmulticast filters when vf-true-promisc-support is OFF. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:40 -07:00
Michal Wilczynski	d7393425e7	ice: Introduce enabling promiscuous mode on multiple VF's In current implementation default VSI switch filter is only able to forward traffic to a single VSI. This limits promiscuous mode with private flag 'vf-true-promisc-support' to a single VF. Enabling it on the second VF won't work. Also allmulticast support doesn't seem to be properly implemented when vf-true-promisc-support is true. Use standard ice_add_rule_internal() function that already implements forwarding to multiple VSI's instead of constructing AQ call manually. Add switch filter for allmulticast mode when vf-true-promisc-support is enabled. The same filter is added regardless of the flag - it doesn't matter for this case. Remove unnecessary fields in switch structure. From now on book keeping will be done by ice_add_rule_internal(). Refactor unnecessarily passed function arguments. To test: 1) Create 2 VM's, and two VF's. Attach VF's to VM's. 2) Enable promiscuous mode on both of them and check if traffic is seen on both of them. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 11:44:22 -07:00
Jacob Keller	4488df1401	ice: implement adjfine with mul_u64_u64_div_u64 The PTP frequency adjustment code needs to determine an appropriate adjustment given an input scaled_ppm adjustment. We calculate the adjustment to the register by multiplying the base (nominal) increment value by the scaled_ppm and then dividing by the scaled one million value. For very large adjustments, this might overflow. To avoid this, both the scaled_ppm and divisor values are downshifted. We can avoid that on X86 architectures by using mul_u64_u64_div_u64. This helper function will perform the multiplication and division with 128bit intermediate values. We know that scaled_ppm is never larger than the divisor so this operation will never result in an overflow. This improves the accuracy of the calculations for large adjustment values on X86. It is likely an improvement on other architectures as well because the default implementation of mul_u64_u64_div_u64 is smarter than the original approach taken in the ice code. Additionally, this implementation is easier to read, using fewer local variables and lines of code to implement. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-28 10:54:47 -07:00
Maciej Fijalkowski	cc019545a2	ice: do not setup vlan for loopback VSI Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: `0e674aeb0b` ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 13:15:36 -07:00
Maciej Fijalkowski	283d736ff7	ice: check (DD \| EOF) bits on Rx descriptor rather than (EOP \| RS) Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD \| EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. Fixes: `0e674aeb0b` ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 13:15:36 -07:00
Anirudh Venkataramanan	5c8e3c7ff3	ice: Fix VSIs unable to share unicast MAC The driver currently does not allow two VSIs in the same PF domain to have the same unicast MAC address. This is incorrect in the sense that a policy decision is being made in the driver when it must be left to the user. This approach was causing issues when rebooting the system with VFs spawned not being able to change their MAC addresses. Such errors were present in dmesg: [ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1 Fix that by removing this restriction. Doing this also allows us to remove some additional code that's checking if a unicast MAC filter already exists. Fixes: `47ebc7b024` ("ice: Check if unicast MAC exists before setting VF MAC") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 13:15:36 -07:00
Przemyslaw Patynowski	01658aeead	ice: Fix tunnel checksum offload with fragmented traffic Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Offload params: td_offset, cd_tunnel_params were corrupted, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: `69e66c04c6` ("ice: Add mpls+tso support") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 13:15:36 -07:00
Przemyslaw Patynowski	1e308c6fb7	ice: Fix max VLANs available for VF Legacy VLAN implementation allows for untrusted VF to have 8 VLAN filters, not counting VLAN 0 filters. Current VLAN_V2 implementation lowers available filters for VF, by counting in VLAN 0 filter for both TPIDs. Fix this by counting only non zero VLAN filters. Without this patch, untrusted VF would not be able to access 8 VLAN filters. Fixes: `cc71de8fa1` ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 13:15:18 -07:00
Marcin Szycik	cd8efeeed1	ice: Add support for PPPoE hardware offload Add support for creating PPPoE filters in switchdev mode. Add support for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto. Example filter: tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \ 1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to use the new fields. ICE COMMS DDP package is required to create a filter as it contains PPPoE profiles. Added a warning message when loaded DDP package does not contain required profiles. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 10:56:32 -07:00
Karol Kolacinski	d6b98c8d24	ice: add write functionality for GNSS TTY Add the possibility to write raw bytes to the GNSS module through the first TTY device. This allows user to configure the module. Create a second read-only TTY device. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-21 13:25:17 -07:00
Karol Kolacinski	fcf9b695a5	ice: add i2c write command Add the possibility to write to connected i2c devices using the AQ command. FW may reject the write if the device is not on allowlist. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-21 11:22:45 -07:00
Zhuo Chen	ca415ea1f0	ice: Remove pci_aer_clear_nonfatal_status() call After commit `62b36c3ea6` ("PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls"), calls to pci_cleanup_aer_uncorrect_error_status() have already been removed. But in commit `5995b6d0c6` ("ice: Implement pci_error_handler ops") pci_cleanup_aer_uncorrect_error_status was used again, so remove it in this patch. Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Sen Wang <wangsen.harry@bytedance.com> Cc: Wenliang Wang <wangwenliang.1995@bytedance.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-15 10:41:39 -07:00
Anirudh Venkataramanan	896a55aa52	ice: Add EXTTS feature to the feature bitmap External time stamp sources are supported only on certain devices. Enforce the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature bitmap set. Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-15 10:41:39 -07:00
Jakub Kicinski	816cd16883	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h `310731e2f1` ("net: Fix data-races around sysctl_mem.") `e70f3c7012` ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c `747c143072` ("ip: fix dflt addr selection for connected nexthop") `d62607c3fe` ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h `3d8c51b25a` ("net/tls: Check for errors in tls_device_init") `5879031423` ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 15:27:35 -07:00
Paul M Stillwell Jr	7b6f9462a3	ice: change devlink code to read NVM in blocks When creating a snapshot of the NVM the driver needs to read the entire contents from the NVM and store it. The NVM reads are protected by a lock that is shared between the driver and the firmware. If the driver takes too long to read the entire NVM (which can happen on some systems) then the firmware could reclaim the lock and cause subsequent reads from the driver to fail. We could fix this by increasing the timeout that we pass to the firmware, but we could end up in the same situation again if the system is slow. Instead have the driver break the reading of the NVM into blocks that are small enough that we have confidence that the read will complete within the timeout time, but large enough not to cause significant AQ overhead. Fixes: `dce730f178` ("ice: add a devlink region for dumping NVM contents") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-12 09:25:47 -07:00
Paul M Stillwell Jr	f52d166819	ice: handle E822 generic device ID in PLDM header The driver currently presumes that the record data in the PLDM header of the firmware image will match the device ID of the running device. This is true for E810 devices. It appears that for E822 devices that this is not guaranteed to be true. Fix this by adding a check for the generic E822 device. Fixes: `d69ea414c9` ("ice: implement device flash update via devlink") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-12 09:25:34 -07:00
David S. Miller	456bfd9dc8	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex t-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-06-30 This series contains updates to ice driver only. Martyna adds support for VLAN related TC switchdev filters and reworks dummy packet implementation of VLANs to enable dynamic header insertion to allow for more rule types. Lu Wei utilizes eth_broadcast_addr() helper over an open coded version. Ziyang Xuan removes unneeded NULL checks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-01 11:21:56 +01:00
Alexander Lobakin	2f7ee2a72c	net/ice: fix initializing the bitmap in the switch code Kbuild spotted the following bug during the testing of one of the optimizations: In file included from include/linux/cpumask.h:12, [...] from drivers/net/ethernet/intel/ice/ice_switch.c:4: drivers/net/ethernet/intel/ice/ice_switch.c: In function 'ice_find_free_recp_res_idx.constprop': include/linux/bitmap.h:447:22: warning: 'possible_idx[0]' is used uninitialized [-Wuninitialized] 447 \| map \|= GENMASK(start + nbits - 1, start); \| ^~ In file included from drivers/net/ethernet/intel/ice/ice.h:7, from drivers/net/ethernet/intel/ice/ice_lib.h:7, from drivers/net/ethernet/intel/ice/ice_switch.c:4: drivers/net/ethernet/intel/ice/ice_switch.c:4929:24: note: 'possible_idx[0]' was declared here 4929 \| DECLARE_BITMAP(possible_idx, ICE_MAX_FV_WORDS); \| ^~~~~~~~~~~~ include/linux/types.h:11:23: note: in definition of macro 'DECLARE_BITMAP' 11 \| unsigned long name[BITS_TO_LONGS(bits)] \| ^~~~ %ICE_MAX_FV_WORDS is 48, so bitmap_set() here was initializing only 48 bits, leaving a junk in the rest 16. It was previously hidden due to that filling 48 bits makes bitmap_set() call external __bitmap_set(), but after making it use plain bit arithmetics on small bitmaps, compilers started seeing the issue. It was still working because those 16 weren't used anywhere anyhow. bitmap_{clear,set}() are not really intended to initialize bitmaps, rather to modify already initialized ones, as they don't do anything past the passed number of bits. The correct function to do this in that particular case is bitmap_fill(), so use it here. It will do `possible_idx = ~0UL` instead of `*possible_idx \|= GENMASK(47, 0)`, not leaving anything in an undefined state. Fixes: `fd2a6b71e3` ("ice: create advanced switch recipe") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>	2022-06-30 19:52:42 -07:00
Jilin Yuan	173e468c71	intel/ice:fix repeated words in comments Delete the redundant word 'a'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 14:22:16 -07:00
Ziyang Xuan	afa646299a	ice: Remove unnecessary NULL check before dev_put Since commit `b37a466837` ("netdevice: add the case if dev is NULL"), dev_put(NULL) is safe, check NULL before dev_put() is not needed. Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 13:53:33 -07:00
Lu Wei	0ca8582990	ice: use eth_broadcast_addr() to set broadcast address Use eth_broadcast_addr() to set broadcast address instead of memset(). Signed-off-by: Lu Wei <luwei32@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 13:53:33 -07:00
Martyna Szapar-Mudlaw	263957263a	ice: switch: dynamically add VLAN headers to dummy packets Enable the support of creating all kinds of declared dummy packets with the VLAN tags by inserting VLAN headers (single VLAN and QinQ cases) if needed. Decrease the number of declared dummy packets and increase in the possible packet's combinations for adding switch rules. This change enables support of creating filters that match both on VLAN + tunnels properties in switchdev. Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 13:53:33 -07:00
Martyna Szapar-Mudlaw	ea71b967a5	ice: Add support for VLAN TPID filters in switchdev Enable support for adding TC rules that filter on the VLAN tag type in switchdev mode. Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 13:53:33 -07:00
Martyna Szapar-Mudlaw	06bca7c268	ice: Add support for double VLAN in switchdev Enable support for adding TC rules with both C-tag and S-tag that can filter on the inner and outer VLAN in QinQ for basic packets (not tunneled cases). Signed-off-by: Wiktor Pilarczyk <wiktor.pilarczyk@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-30 13:53:25 -07:00
Jakub Kicinski	93817be8b6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-23 12:33:24 -07:00
Anatolii Gerasymenko	a632b2a4c9	ice: ethtool: Prohibit improper channel config for DCB Do not allow setting less channels, than Traffic Classes there are via ethtool. There must be at least one channel per Traffic Class. If you set less channels, than Traffic Classes there are, then during ice_vsi_rebuild there would be allocated only the requested amount of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map there would be requested at least one channel per Traffic Class. This results in setting num_rxq > alloc_rxq and num_txq > alloc_txq. Later, there would be a NULL pointer dereference in ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or tx_rings arrays. Change ice_set_channels() to return error if you try to allocate less channels, than Traffic Classes there are. Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return status code instead of void. Add error handling for ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc(). [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi. [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [53763.992915] PGD 14b45f5067 P4D 0 [53763.996444] Oops: 0002 [#1] SMP NOPTI [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE --------- - - 4.18.0-240.el8.x86_64 #1 [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020 [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice] [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4 c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44 [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206 [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018 [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018 [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004 [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000 [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018 [53764.090976] FS: 000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000 [53764.099362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0 [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [53764.127747] PKRU: 55555554 [53764.130781] Call Trace: [53764.133564] ice_vsi_rebuild+0x611/0x870 [ice] [53764.138341] ice_vsi_recfg_qs+0x94/0x100 [ice] [53764.143116] ice_set_channels+0x1a8/0x3e0 [ice] [53764.147975] ethtool_set_channels+0x14e/0x240 [53764.152667] dev_ethtool+0xd74/0x2a10 [53764.156665] ? __mod_lruvec_state+0x44/0x110 [53764.161280] ? __mod_lruvec_state+0x44/0x110 [53764.165893] ? page_add_file_rmap+0x15/0x170 [53764.170518] ? inet_ioctl+0xd1/0x220 [53764.174445] ? netdev_run_todo+0x5e/0x290 [53764.178808] dev_ioctl+0xb5/0x550 [53764.182485] sock_do_ioctl+0xa0/0x140 [53764.186512] sock_ioctl+0x1a8/0x300 [53764.190367] ? selinux_file_ioctl+0x161/0x200 [53764.195090] do_vfs_ioctl+0xa4/0x640 [53764.199035] ksys_ioctl+0x60/0x90 [53764.202722] __x64_sys_ioctl+0x16/0x20 [53764.206845] do_syscall_64+0x5b/0x1a0 [53764.210887] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: `87324e747f` ("ice: Implement ethtool ops for channels") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-21 15:20:24 -07:00
Anatolii Gerasymenko	c3d184c83f	ice: ethtool: advertise 1000M speeds properly In current implementation ice_update_phy_type enables all link modes for selected speed. This approach doesn't work for 1000M speeds, because both copper (1000baseT) and optical (1000baseX) standards cannot be enabled at once. Fix this, by adding the function `ice_set_phy_type_from_speed()` for 1000M speeds. Fixes: `48cb27f2fd` ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-21 13:48:57 -07:00
Wojciech Drewek	3578dc9001	ice: Fix switchdev rules book keeping Adding two filters with same matching criteria ends up with one rule in hardware with act = ICE_FWD_TO_VSI_LIST. In order to remove them properly we have to keep the information about vsi handle which is used in VSI bitmap (ice_adv_fltr_mgmt_list_entry::vsi_list_info::vsi_map). Fixes: `0d08a441fb` ("ice: ndo_setup_tc implementation for PF") Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-21 13:09:03 -07:00
Marcin Szycik	d4ea6f6373	ice: ignore protocol field in GTP offload Commit `34a897758e` ("ice: Add support for inner etype in switchdev") added the ability to match on inner ethertype. A side effect of that change is that it is now impossible to add some filters for protocols which do not contain inner ethtype field. tc requires the protocol field to be specified when providing certain other options, e.g. src_ip. This is a problem in case of GTP - when user wants to specify e.g. src_ip, they also need to specify protocol in tc command (otherwise tc fails with: Illegal "src_ip"). Because GTP is a tunnel, the protocol field is treated as inner protocol. GTP does not contain inner ethtype field and the filter cannot be added. To fix this, ignore the ethertype field in case of GTP filters. Fixes: `9a225f81f5` ("ice: Support GTP-U and GTP-C offload in switchdev") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-21 11:14:36 -07:00
Jakub Kicinski	9cbc991126	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:13:52 -07:00
Przemyslaw Patynowski	efe4186000	ice: Fix memory corruption in VF driver Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled, when it requests a reset. If PF driver assumes that VF is disabled, while VF still has queues configured, VF may unmap DMA resources. In such scenario device still can map packets to memory, which ends up silently corrupting it. Previously, VF driver could experience memory corruption, which lead to crash: [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237 [ 5119.170166] PGD 0 P4D 0 [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G W I --------- - - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1 [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019 [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf] [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30 [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74 [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282 [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002 [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203 [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009 [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000 [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000 [ 5119.170269] FS: 0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000 [ 5119.170274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0 [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5119.170290] PKRU: 55555554 [ 5119.170292] Call Trace: [ 5119.170298] iavf_clean_rx_ring+0xad/0x110 [iavf] [ 5119.170324] iavf_free_rx_resources+0xe/0x50 [iavf] [ 5119.170342] iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf] [ 5119.170358] iavf_virtchnl_completion+0xd8a/0x15b0 [iavf] [ 5119.170377] ? iavf_clean_arq_element+0x210/0x280 [iavf] [ 5119.170397] iavf_adminq_task+0x126/0x2e0 [iavf] [ 5119.170416] process_one_work+0x18f/0x420 [ 5119.170429] worker_thread+0x30/0x370 [ 5119.170437] ? process_one_work+0x420/0x420 [ 5119.170445] kthread+0x151/0x170 [ 5119.170452] ? set_kthread_struct+0x40/0x40 [ 5119.170460] ret_from_fork+0x35/0x40 [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas [ 5119.170613] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf] [ 5119.170627] CR2: 00001b9780003237 Fixes: `ec4f5a436b` ("ice: Check if VF is disabled for Opcode and other operations") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-14 09:38:57 -07:00
Przemyslaw Patynowski	be2af71496	ice: Fix queue config fail handling Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail. Not disabling them might lead to scenario, where PF driver leaves VF queues enabled, when VF's VSI failed queue config. In this scenario VF should not have RX/TX queues enabled. If PF failed to set up VF's queues, VF will reset due to TX timeouts in VF driver. Initialize iterator 'i' to -1, so if error happens prior to configuring queues then error path code will not disable queue 0. Loop that configures queues will is using same iterator, so error path code will only disable queues that were configured. Fixes: `77ca27c417` ("ice: add support for virtchnl_queue_select.[tx\|rx]_queues bitmap") Suggested-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-14 09:38:57 -07:00
Roman Storozhenko	9542ef4fba	ice: Sync VLAN filtering features for DVM VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be both enabled or disabled. In case of turning off/on only one of the features, another feature must be turned off/on automatically with issuing an appropriate message to the kernel log. Fixes: `1babaf77f4` ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-14 09:38:57 -07:00
Michal Michalik	71a579f0d3	ice: Fix PTP TX timestamp offset calculation The offset was being incorrectly calculated for E822 - that led to collisions in choosing TX timestamp register location when more than one port was trying to use timestamping mechanism. In E822 one quad is being logically split between ports, so quad 0 is having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should have separate memory location for tracking timestamps. Due to error for example ports 1 and 2 had been assigned to quad 0 with same offset (0), while port 1 should have offset 0 and 1 offset 16. Fix it by correctly calculating quad offset. Fixes: `3a7496234d` ("ice: implement basic E822 PTP support") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-14 09:35:57 -07:00
Bin Chen	10e11aa241	ethernet: Remove vf rate limit check for drivers The commit `a14857c27a` ("rtnetlink: verify rate parameters for calls to ndo_set_vf_rate") has been merged to master, so we can to remove the now-duplicate checks in drivers. Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220609084717.155154-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-10 22:19:32 -07:00
Jakub Kicinski	68c51dd992	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-06-09 Maximilian Heyne adds reporting of VF statistics on ixgbe via iproute2 interface. Kai-Heng Feng removes duplicate defines from igb. Jiaqing Zhao fixes typos in e1000, ixgb, and ixgbe drivers. Julia Lawall fixes typos for fm10k, ixgbe, and ice drivers. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: drivers/net/ethernet/intel: fix typos in comments ixgbe: Fix typos in comments ixgb: Fix typos in comments e1000: Fix typos in comments igb: Remove duplicate defines drivers, ixgbe: export vf statistics ==================== Link: https://lore.kernel.org/r/20220609171257.2727150-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-10 22:07:32 -07:00
Julia Lawall	138f9f50eb	drivers/net/ethernet/intel: fix typos in comments Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-09 10:05:31 -07:00
Christophe JAILLET	a4da4913a0	ice: Use correct order for the parameters of devm_kcalloc() We should have 'n', then 'size', not the opposite. This is harmless because the 2 values are just multiplied, but having the correct order silence a (unpublished yet) smatch warning. While at it use 'tun_seg' instead 'seg'. The both variable have the same type, so the result is the same, but it lokks more logical. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-08 08:55:13 -07:00
Karol Kolacinski	0a3ca0867c	ice: remove u16 arithmetic in ice_gnss Change u16 to unsigned int where arithmetic occurs. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-08 08:55:13 -07:00
Michal Swiatkowski	b33de560f9	ice: remove VLAN representor specific ops In switchdev mode VF VLAN caps will not be set there is no need to have specific VLAN ops for representor that only returns not supported error. As VLAN configuration commands will be blocked, the VF driver can't disable VLAN stripping at initialization. It leads to the situation when VLAN stripping on VF VSI is on, but in kernel it is off. To prevent this, disable VLAN stripping in VSI initialization. It doesn't break other usecases, because it is set according to kernel settings. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-08 08:54:14 -07:00
Michal Swiatkowski	cede04b352	ice: don't set VF VLAN caps in switchdev In switchdev mode any VLAN manipulation from VF side isn't allowed. In order to prevent parsing VLAN commands don't set VF VLAN caps. This will result in removing VLAN specific opcodes from allowlist. If VF send any VLAN specific opcode PF driver will answer with not supported error. With this approach VF driver know that VLAN caps aren't supported so it shouldn't send VLAN specific opcodes. Thanks to that, some ugly errors will not show up in dmesg (ex. on creating VFs in switchdev mode there are errors about not supported VLAN insertion and stripping) Move setting VLAN caps to separate function, including switchdev mode specific code. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-06-08 08:47:50 -07:00
Alexander Lobakin	6e1ff61873	ice: fix access-beyond-end in the switch code Global `-Warray-bounds` enablement revealed some problems, one of which is the way we define and use AQC rules messages. In fact, they have a shared header, followed by the actual message, which can be of one of several different formats. So it is straightforward enough to define that header as a separate struct and then embed it into message structures as needed, but currently all the formats reside in one union coupled with the header. Then, the code allocates only the memory needed for a particular message format, leaving the union potentially incomplete. There are no actual reads or writes beyond the end of an allocated chunk, but at the same time, the whole implementation is fragile and backed by an equilibrium rather than strong type and memory checks. Define the structures the other way around: one for the common header and the rest for the actual formats with the header embedded. There are no places where several union members would be used at the same time anyway. This allows to use proper struct_size() and let the compiler know what is going to be done. Finally, unsilence `-Warray-bounds` back for ice_switch.c. Other little things worth mentioning: * &ice_sw_rule_vsi_list_query is not used anywhere, remove it. It's weird anyway to talk to hardware with purely kernel types (bitmaps); * expand the ICE_SW_RULE__SIZE() macros to pass a structure variable name to struct_size() to let it do strict typechecking; rename ice_sw_rule_lkup_rx_tx::hdr to ::hdr_data to keep ::hdr for the header structure to have the same name for it constistenly everywhere; * drop the duplicate of %ICE_SW_RULE_RX_TX_NO_HDR_SIZE residing in ice_switch.h. Fixes: `9daf8208dd` ("ice: Add support for switch filter programming") Fixes: `66486d8943` ("ice: replace single-element array used for C struct hack") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220601105924.2841410-1-alexandr.lobakin@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-02 12:41:10 +02:00
Jakub Kicinski	385bc51b41	eth: ice: silence the GCC 12 array-bounds warning GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-22 22:21:11 +01:00
Jakub Kicinski	d7e6f58360	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/mellanox/mlx5/core/main.c `b33886971d` ("net/mlx5: Initialize flow steering during driver probe") `40379a0084` ("net/mlx5_fpga: Drop INNOVA TLS support") `f2b41b32cd` ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ `16d42d3133` ("net/mlx5: Drain fw_reset when removing device") `8324a02c34` ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh `e274f71540` ("selftests: mptcp: add subflow limits test-cases") `b6e074e171` ("selftests: mptcp: add infinite map testcase") `5ac1d2d634` ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c `ba2c89e0ea` ("mptcp: fix checksum byte order") `1e39e5a32a` ("mptcp: infinite mapping sending") `ea66758c17` ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c `95d6865178` ("mptcp: fix subflow accounting on close") `4d25247d3a` ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c `ae66fb2ba6` ("mptcp: Do TCP fallback on early DSS checksum failure") `0348c690ed` ("mptcp: add the fallback check") `f8d4bcacff` ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-19 11:23:59 -07:00
Michal Wilczynski	bf13502ed5	ice: Fix interrupt moderation settings getting cleared Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: `61dc79ced7` ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-17 10:37:09 -07:00
Paul Greenwalt	31b6298fd8	ice: fix possible under reporting of ethtool Tx and Rx statistics The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: `28dc1b86f8` ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-17 10:37:09 -07:00
Arkadiusz Kubalewski	4503cc7fdf	ice: fix crash when writing timestamp on RX rings Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: `77a781155a` ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-17 10:36:50 -07:00
Sridhar Samudrala	d971308815	ice: Expose RSS indirection tables for queue groups via ethtool When ADQ queue groups (TCs) are created via tc mqprio command, RSS contexts and associated RSS indirection tables are configured automatically per TC based on the queue ranges specified for each traffic class. For ex: tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \ queues 2@0 8@2 4@10 hw 1 mode channel will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4 in 3 queue groups. Each queue group is associated with its own RSS context and RSS indirection table. Add support to expose RSS indirection tables for all ADQ queue groups using ethtool RSS contexts interface. ethtool -x enp175s0f0 context <tc-num> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220512213249.3747424-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-13 17:05:49 -07:00
Jakub Kicinski	9b19e57a3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c `54fccfdd7c` ("sfc: efx_default_channel_type APIs can be static") `49e6123c65` ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:15:30 -07:00
Jakub Kicinski	5bcfeb6efe	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-06 Marcin Szycik says: This patchset adds support for systemd defined naming scheme for port representors, as well as re-enables displaying PCI bus-info in ethtool. bus-info information has previously been removed from ethtool for port representors, as a workaround for a bug in lshw tool, where the tool would sometimes display wrong descriptions for port representors/PF. Now the bug has been fixed in lshw tool [1]. Removing the workaround can be considered a regression (user might be running an older, unpatched version of lshw) (see [2] for discussion). However, calling SET_NETDEV_DEV also produces the same effect as removing the workaround, i.e. lshw is able to access PCI bus-info (this time not via ethtool, but in some other way) and the bug can occur. Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming - - port representors are named based on PF name. Currently port representors are named "ethX", which might be confusing, especially when spawning VFs on multiple PFs. Furthermore, it's currently harder to determine to which PF does a particular port representor belong, as bus-info is not shown in ethtool. Consider the following three cases: Case 1: current code - driver workaround in place, no SET_NETDEV_DEV, lshw with or without fix. Port representors are not displayed because they don't have bus-info (the workaround), PFs are labelled correctly: $ sudo ./lshw -c net -businfo Bus info Device Class Description ======================================================== pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port representors have predictable names. lshw is able to get bus-info because of SET_NETDEV_DEV and netdevs CAN be mislabelled: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface ... $ sudo ethtool -i ens6f0npf0vf60 driver: ice ... bus-info: ... Output of lshw would be the same with workaround removed; it does not change the fact that lshw labels netdevs incorrectly, while at the same time it prevents ethtool from displaying potentially useful data (bus-info). Case 3: workaround removed, SET_NETDEV_DEV, lshw fix: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP ... $ sudo ethtool -i ens6f0npf0vf73 driver: ice ... bus-info: 0000:02:00.0 ... In this case poort representors have predictable names, netdevs are not mislabelled in lshw, and bus-info is shown in ethtool. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 [2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20220321144731.3935-1-marcin.szycik@linux.intel.com * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode" ice: link representors to PCI device ==================== Link: https://lore.kernel.org/r/20220506180052.5256-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-09 17:51:01 -07:00
Alaa Mohamed	ca4567f1e6	rtnetlink: add extack support in fdb del handlers Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-09 11:58:20 +01:00
Michal Michalik	a11b6c1a38	ice: fix PTP stale Tx timestamps cleanup Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-06 10:09:24 -07:00
Anatolii Gerasymenko	6096dae926	ice: clear stale Tx queue settings before configuring The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: `7ad15440ac` ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-06 10:09:24 -07:00
Ivan Vecera	486b9eee57	ice: Fix race during aux device (un)plugging Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: `f9f5301e7e` ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-06 10:09:24 -07:00
Marcin Szycik	e0c7402270	Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode" This reverts commit `bfaaba99e6`. Commit `bfaaba99e6` ("ice: Hide bus-info in ethtool for PRs in switchdev mode") was a workaround for lshw tool displaying incorrect descriptions for port representors and PF in switchdev mode. Now the issue has been fixed in the lshw tool itself [1]. Removing the workaround can be considered a regression, as the user might be running older, unpatched lshw version. However, another important change (ice: link representors to PCI device, which improves port representor netdev naming with SET_NETDEV_DEV) also causes the same "regression" as removing the workaround, i.e. unpatched lshw is able to access bus-info information (this time not via ethtool) and the bug can occur. Therefore, the workaround no longer prevents the bug and can be removed. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-06 10:04:17 -07:00
Michal Swiatkowski	6384b76959	ice: link representors to PCI device Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-06 09:43:01 -07:00
Jacob Keller	4eaf1797bc	ice: remove period on argument description in ice_for_each_vf The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-05 11:28:36 -07:00

... 2 3 4 5 6 ...

1632 Commits