2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

339 Commits

Author SHA1 Message Date
Jakub Kicinski
e016cf5f39 netdevsim: 'support' multi-buf XDP
Don't error out on large MTU if XDP is multi-buf.
The ping test now tests ping with XDP and high MTU.
netdevsim doesn't actually run the prog (yet?) so
it doesn't matter if the prog was multi-buf..

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250311092820.542148-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12 13:39:59 -07:00
Stanislav Fomichev
1d22d3060b net: drop rtnl_lock for queue_mgmt operations
All drivers that use queue API are already converted to use
netdev instance lock. Move netdev instance lock management to
the netlink layer and drop rtnl_lock.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry. <almasrymina@google.com>
Link: https://patch.msgid.link/20250311144026.4154277-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12 13:32:35 -07:00
Jakub Kicinski
8ef890df40 net: move misc netdev_lock flavors to a separate header
Move the more esoteric helpers for netdev instance lock to
a dedicated header. This avoids growing netdevice.h to infinity
and makes rebuilding the kernel much faster (after touching
the header with the helpers).

The main netdev_lock() / netdev_unlock() functions are used
in static inlines in netdevice.h and will probably be used
most commonly, so keep them in netdevice.h.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08 09:06:50 -08:00
Jakub Kicinski
2bcf4772e4 net: ethtool: try to protect all callback with netdev instance lock
Protect all ethtool callbacks and PHY related state with the netdev
instance lock, for drivers which want / need to have their ops
instance-locked. Basically take the lock everywhere we take rtnl_lock.
It was tempting to take the lock in ethnl_ops_begin(), but turns
out we actually nest those calls (when generating notifications).

Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-11-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 12:59:44 -08:00
Stanislav Fomichev
cae03e5bdd net: hold netdev instance lock during queue operations
For the drivers that use queue management API, switch to the mode where
core stack holds the netdev instance lock. This affects the following
drivers:
- bnxt
- gve
- netdevsim

Originally I locked only start/stop, but switched to holding the
lock over all iterations to make them look atomic to the device
(feels like it should be easier to reason about).

Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 12:59:43 -08:00
Stanislav Fomichev
d4c22ec680 net: hold netdev instance lock during ndo_open/ndo_stop
For the drivers that use shaper API, switch to the mode where
core stack holds the netdev lock. This affects two drivers:

* iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly
         remove these
* netdevsim - switch to _locked APIs to avoid deadlock

iavf_close diff is a bit confusing, the existing call looks like this:
  iavf_close() {
    netdev_lock()
    ..
    netdev_unlock()
    wait_event_timeout(down_waitqueue)
  }

I change it to the following:
  netdev_lock()
  iavf_close() {
    ..
    netdev_unlock()
    wait_event_timeout(down_waitqueue)
    netdev_lock() // reusing this lock call
  }
  netdev_unlock()

Since I'm reusing existing netdev_lock call, so it looks like I only
add netdev_unlock.

Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 12:59:43 -08:00
Jakub Kicinski
357660d759 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc5).

Conflicts:

drivers/net/ethernet/cadence/macb_main.c
  fa52f15c74 ("net: cadence: macb: Synchronize stats calculations")
  75696dd0fd ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_sriov.c
  79990cf5e7 ("ice: Fix deinitializing VF in error path")
  a203163274 ("ice: simplify VF MSI-X managing")

net/ipv4/tcp.c
  18912c5206 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
  297d389e9e ("net: prefix devmem specific helpers")

net/mptcp/subflow.c
  8668860b0a ("mptcp: reset when MPTCP opts are dropped after join")
  c3349a22c2 ("mptcp: consolidate subflow cleanup")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 10:20:58 -08:00
Jakub Kicinski
29b036be1b selftests: drv-net: test XDP, HDS auto and the ioctl path
Test XDP and HDS interaction. While at it add a test for using the IOCTL,
as that turned out to be the real culprit.

Testing bnxt:

  # NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py
  KTAP version 1
  1..12
  ok 1 hds.get_hds
  ok 2 hds.get_hds_thresh
  ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device
  ok 4 hds.set_hds_enable
  ok 5 hds.set_hds_thresh_zero
  ok 6 hds.set_hds_thresh_max
  ok 7 hds.set_hds_thresh_gt
  ok 8 hds.set_xdp
  ok 9 hds.enabled_set_xdp
  ok 10 hds.ioctl
  ok 11 hds.ioctl_set_xdp
  ok 12 hds.ioctl_enabled_set_xdp
  # Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0

and netdevsim:

  # ./ksft-net-drv/drivers/net/hds.py
  KTAP version 1
  1..12
  ok 1 hds.get_hds
  ok 2 hds.get_hds_thresh
  ok 3 hds.set_hds_disable
  ok 4 hds.set_hds_enable
  ok 5 hds.set_hds_thresh_zero
  ok 6 hds.set_hds_thresh_max
  ok 7 hds.set_hds_thresh_gt
  ok 8 hds.set_xdp
  ok 9 hds.enabled_set_xdp
  ok 10 hds.ioctl
  ok 11 hds.ioctl_set_xdp
  ok 12 hds.ioctl_enabled_set_xdp
  # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0

Netdevsim needs a sane default for tx/rx ring size.

ethtool 6.11 is needed for the --disable-netlink option.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250221025141.1132944-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24 14:16:37 -08:00
Breno Leitao
bf3624cf1c netdevsim: call napi_schedule from a timer context
The netdevsim driver was experiencing NOHZ tick-stop errors during packet
transmission due to pending softirq work when calling napi_schedule().
This issue was observed when running the netconsole selftest, which
triggered the following error message:

  NOHZ tick-stop error: local softirq work is pending, handler #08!!!

To fix this issue, introduce a timer that schedules napi_schedule()
from a timer context instead of calling it directly from the TX path.

Create an hrtimer for each queue and kick it from the TX path,
which then schedules napi_schedule() from the timer context.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250219-netdevsim-v3-1-811e2b8abc4c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 13:18:31 -08:00
Jakub Kicinski
285b3f78ea netdevsim: allow normal queue reset while down
Resetting queues while the device is down should be legal.
Allow it, test it. Ideally we'd test this with a real device
supporting devmem but I don't have access to such devices.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250206225638.1387810-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07 17:21:03 -08:00
Linus Torvalds
c2933b2bef First batch of fixes for 6.14. Nothing really stands out,
but as usual there's a slight concentration of fixes for issues
 added in the last two weeks before the MW, and driver bugs
 from 6.13 which tend to get discovered upon wider distribution.
 
 Including fixes from IPSec, netfilter and Bluetooth.
 
 Current release - regressions:
 
  - net: revert RTNL changes in unregister_netdevice_many_notify()
 
  - Bluetooth: fix possible infinite recursion of btusb_reset
 
  - eth: adjust locking in some old drivers which protect their state
 	with spinlocks to avoid sleeping in atomic; core protects
 	netdev state with a mutex now
 
 Previous releases - regressions:
 
  - eth: mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
 
  - eth: bgmac: reduce max frame size to support just 1500 bytes;
 	the jumbo frame support would previously cause OOB writes,
 	but now fails outright
 
  - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted,
 	avoid false detection of MPTCP blackholing
 
 Previous releases - always broken:
 
  - mptcp: handle fastopen disconnect correctly
 
  - xfrm: make sure skb->sk is a full sock before accessing its fields
 
  - xfrm: fix taking a lock with preempt disabled for RT kernels
 
  - usb: ipheth: improve safety of packet metadata parsing; prevent
 	potential OOB accesses
 
  - eth: renesas: fix missing rtnl lock in suspend/resume path
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmebzXsACgkQMUZtbf5S
 IrvGBQ//auOF2yY1sg40fBvc6Hr1jpZBcr+uqTL6Qka1uVOvTFY51hAN54lBt32+
 ixmcHsD0xdcHrr7VrqSXqurQLiGsdwUpnxZFCj/FymQuMunVysEqudvPeKDVHpsw
 JW5c4nJOexEA2viByK9iB23Qq0P3uBoPEnKrbSTVSDvYaXUj6y8Cvt3/vXc+H/tc
 T7GaxHH55NNNPkRz34YU3OWcaZsgkQEcdVpZf4tODPmg7J5VQj8SQeMhk/HI0sdO
 WKjWB0woZkiQECtamqAOXnv47PXd6igv8NALRPlJcKjs0EszUvuYhD/9MEOeghjI
 sjcQn9JnPpG+ca/qFVCSpEEOo2zGVn5dkJT5x26udH+5XHf7Pq+zpJwB6LHo98yF
 bGMpIrF6gi2EnBtS/tRjMyBU9Ut9KiUtjXMvn9EsD1U1FGIbz6wyQLlT0pG0hwJb
 rEdKfegrcyhWKHOD4vH9ciEg/7lgGfsGyfJDktIMdyailZc6tBcxwbdlc+5jRDA9
 0RqGASIXaiX7AC3WOSeQzMgbV+WXdhaX/yrMJL5KBfBTzxG2audnJt1tPN3mbh3z
 NM6M2cnMsoX4QLSiaukJaCL7LWHSTlVttZVg8FGXHj1PejMQQBVjGVvzk2UF55UR
 gV7X9/VkXhmIDAZgThWtOdPLz+ItksfSKiruhUsXust6JgqRuqc=
 =GjBE
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPSec, netfilter and Bluetooth.

  Nothing really stands out, but as usual there's a slight concentration
  of fixes for issues added in the last two weeks before the merge
  window, and driver bugs from 6.13 which tend to get discovered upon
  wider distribution.

  Current release - regressions:

   - net: revert RTNL changes in unregister_netdevice_many_notify()

   - Bluetooth: fix possible infinite recursion of btusb_reset

   - eth: adjust locking in some old drivers which protect their state
     with spinlocks to avoid sleeping in atomic; core protects netdev
     state with a mutex now

  Previous releases - regressions:

   - eth:
      - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
      - bgmac: reduce max frame size to support just 1500 bytes; the
        jumbo frame support would previously cause OOB writes, but now
        fails outright

   - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
     false detection of MPTCP blackholing

  Previous releases - always broken:

   - mptcp: handle fastopen disconnect correctly

   - xfrm:
      - make sure skb->sk is a full sock before accessing its fields
      - fix taking a lock with preempt disabled for RT kernels

   - usb: ipheth: improve safety of packet metadata parsing; prevent
     potential OOB accesses

   - eth: renesas: fix missing rtnl lock in suspend/resume path"

* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  MAINTAINERS: add Neal to TCP maintainers
  net: revert RTNL changes in unregister_netdevice_many_notify()
  net: hsr: fix fill_frame_info() regression vs VLAN packets
  doc: mptcp: sysctl: blackhole_timeout is per-netns
  mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
  netfilter: nf_tables: reject mismatching sum of field_len with set key length
  net: sh_eth: Fix missing rtnl lock in suspend/resume path
  net: ravb: Fix missing rtnl lock in suspend/resume path
  selftests/net: Add test for loading devbound XDP program in generic mode
  net: xdp: Disallow attaching device-bound programs in generic mode
  tcp: correct handling of extreme memory squeeze
  bgmac: reduce max frame size to support just MTU 1500
  vsock/test: Add test for connect() retries
  vsock/test: Add test for UAF due to socket unbinding
  vsock/test: Introduce vsock_connect_fd()
  vsock/test: Introduce vsock_bind()
  vsock: Allow retrying on connect() failure
  vsock: Keep the binding until socket destruction
  Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
  Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
  ...
2025-01-30 12:24:20 -08:00
Linus Torvalds
2ab002c755 Driver core and debugfs updates
Here is the big set of driver core and debugfs updates for 6.14-rc1.
 It's coming late in the merge cycle as there are a number of merge
 conflicts with your tree now, and I wanted to make sure they were
 working properly.  To resolve them, look in linux-next, and I will send
 the "fixup" patch as a response to the pull request.
 
 Included in here is a bunch of driver core, PCI, OF, and platform rust
 bindings (all acked by the different subsystem maintainers), hence the
 merge conflict with the rust tree, and some driver core api updates to
 mark things as const, which will also require some fixups due to new
 stuff coming in through other trees in this merge window.
 
 There are also a bunch of debugfs updates from Al, and there is at least
 one user that does have a regression with these, but Al is working on
 tracking down the fix for it.  In my use (and everyone else's linux-next
 use), it does not seem like a big issue at the moment.
 
 Here's a short list of the things in here:
   - driver core bindings for PCI, platform, OF, and some i/o functions.
     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.
   - misc device rust bindings and a sample driver to show how to use
     them
   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing things
     in complex ways.
   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.
   - other small fixes and updates
 
 All of these have been in linux-next with all of the aforementioned
 merge conflicts, and the one debugfs issue, which looks to be resolved
 "soon".
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5koPA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymFHACfT5acDKf2Bov2Lc/5u3vBW/R6ChsAnj+LmgVI
 hcDSPodj4szR40RRnzBd
 =u5Ey
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and debugfs updates from Greg KH:
 "Here is the big set of driver core and debugfs updates for 6.14-rc1.

  Included in here is a bunch of driver core, PCI, OF, and platform rust
  bindings (all acked by the different subsystem maintainers), hence the
  merge conflict with the rust tree, and some driver core api updates to
  mark things as const, which will also require some fixups due to new
  stuff coming in through other trees in this merge window.

  There are also a bunch of debugfs updates from Al, and there is at
  least one user that does have a regression with these, but Al is
  working on tracking down the fix for it. In my use (and everyone
  else's linux-next use), it does not seem like a big issue at the
  moment.

  Here's a short list of the things in here:

   - driver core rust bindings for PCI, platform, OF, and some i/o
     functions.

     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.

   - misc device rust bindings and a sample driver to show how to use
     them

   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing
     things in complex ways.

   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.

   - other small fixes and updates

  All of these have been in linux-next with all of the aforementioned
  merge conflicts, and the one debugfs issue, which looks to be resolved
  "soon""

* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
  rust: device: Use as_char_ptr() to avoid explicit cast
  rust: device: Replace CString with CStr in property_present()
  devcoredump: Constify 'struct bin_attribute'
  devcoredump: Define 'struct bin_attribute' through macro
  rust: device: Add property_present()
  saner replacement for debugfs_rename()
  orangefs-debugfs: don't mess with ->d_name
  octeontx2: don't mess with ->d_parent or ->d_parent->d_name
  arm_scmi: don't mess with ->d_parent->d_name
  slub: don't mess with ->d_name
  sof-client-ipc-flood-test: don't mess with ->d_name
  qat: don't mess with ->d_name
  xhci: don't mess with ->d_iname
  mtu3: don't mess wiht ->d_iname
  greybus/camera - stop messing with ->d_iname
  mediatek: stop messing with ->d_iname
  netdevsim: don't embed file_operations into your structs
  b43legacy: make use of debugfs_get_aux()
  b43: stop embedding struct file_operations into their objects
  carl9170: stop embedding file_operations into their objects
  ...
2025-01-28 12:25:12 -08:00
Jakub Kicinski
6db9d3a536 netdevsim: don't assume core pre-populates HDS params on GET
Syzbot reports:

  BUG: KMSAN: uninit-value in nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77
   nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77
   ethtool_set_ringparam+0x268/0x570 net/ethtool/ioctl.c:2072
   __dev_ethtool net/ethtool/ioctl.c:3209 [inline]
   dev_ethtool+0x126d/0x2a40 net/ethtool/ioctl.c:3398
   dev_ioctl+0xb0e/0x1280 net/core/dev_ioctl.c:759

This is the SET path, where we call GET to either check user request
against max values, or check if any of the settings will change.

The logic in netdevsim is trying to report the default (ENABLED)
if user has not requested any specific setting. The user setting
is recorded in dev->cfg, don't depend on kernel_ringparam being
pre-populated with it.

Fixes: 928459bbda ("net: ethtool: populate the default HDS params in the core")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com
Tested-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20250123221410.1067678-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:37:11 -08:00
Jakub Kicinski
50bf398e1c net: netdevsim: try to close UDP port harness races
syzbot discovered that we remove the debugfs files after we free
the netdev. Try to clean up the relevant dir while the device
is still around.

Reported-by: syzbot+2e5de9e3ab986b71d2bf@syzkaller.appspotmail.com
Fixes: 424be63ad8 ("netdevsim: add UDP tunnel port offload support")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250122224503.762705-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:24:34 -08:00
Jakub Kicinski
928459bbda net: ethtool: populate the default HDS params in the core
The core has the current HDS config, it can pre-populate the values
for the drivers. While at it, remove the zero-setting in netdevsim.
Zero are the default values since the config is zalloc'ed.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:58 -08:00
Jakub Kicinski
3c836451ca net: move HDS config from ethtool state
Separate the HDS config from the ethtool state struct.
The HDS config contains just simple parameters, not state.
Having it as a separate struct will make it easier to clone / copy
and also long term potentially make it per-queue.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:57 -08:00
Jakub Kicinski
ebda2f0bbd net: add netdev_lock() / netdev_unlock() helpers
Add helpers for locking the netdev instance, use it in drivers
and the shaper code. This will make grepping for the lock usage
much easier, as we extend the lock to cover more fields.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250115035319.559603-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 19:13:33 -08:00
Taehee Yoo
f394d07b19 netdevsim: add HDS feature
HDS options(tcp-data-split, hds-thresh) have dependencies between other
features like XDP. Basic dependencies are checked in the core API.
netdevsim is very useful to check basic dependencies.

The default tcp-data-split mode is UNKNOWN but netdevsim driver
returns ENABLED when ethtool dumps tcp-data-split mode.
The default value of HDS threshold is 0 and the maximum value is 1024.

ethtool shows like this.

ethtool -g eni1np1
Ring parameters for eni1np1:
Pre-set maximums:
...
HDS thresh:             1024
Current hardware settings:
...
TCP data split:         on
HDS thresh:             0

ethtool -G eni1np1 tcp-data-split on hds-thresh 1024
ethtool -g eni1np1
Ring parameters for eni1np1:
Pre-set maximums:
...
HDS thresh:             1024
Current hardware settings:
...
TCP data split:         on
HDS thresh:             1024

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250114142852.3364986-10-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 14:42:12 -08:00
Al Viro
1eb87004d2 netdevsim: don't embed file_operations into your structs
Just use debugfs_get_aux() instead of debugfs_real_fops().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20250112080705.141166-10-viro@zeniv.linux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-15 13:14:36 +01:00
Jakub Kicinski
6917d207b4 netdevsim: add debugfs-triggered queue reset
Support triggering queue reset via debugfs for an upcoming test.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Jakub Kicinski
5bc8e8dbef netdevsim: add queue management API support
Add queue management API support. We need a way to reset queues
to test NAPI reordering, the queue management API provides a
handy scaffolding for that.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Jakub Kicinski
a565dd04a1 netdevsim: add queue alloc/free helpers
We'll need the code to allocate and free queues in the queue management
API, factor it out.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Jakub Kicinski
915c82f842 netdevsim: allocate rqs individually
Make nsim->rqs an array of pointers and allocate them individually
so that we can swap them out one by one.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Jakub Kicinski
00adf88b18 netdevsim: support NAPI config
Link the NAPI instances to their configs. This will be needed to test
that NAPI config doesn't break list ordering.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Eric Dumazet
b9b8301d36 net: netdevsim: fix nsim_pp_hold_write()
nsim_pp_hold_write() has two problems:

1) It may return with rtnl held, as found by syzbot.

2) Its return value does not propagate an error if any.

Fixes: 1580cbcbfe ("net: netdevsim: add some fake page pool use")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216083703.1859921-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17 17:46:17 -08:00
Eric Dumazet
ee76746387 netdevsim: prevent bad user input in nsim_dev_health_break_write()
If either a zero count or a large one is provided, kernel can crash.

Fixes: 82c93a87bf ("netdevsim: implement couple of testing devlink health reporters")
Reported-by: syzbot+ea40e4294e58b0292f74@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/675c6862.050a0220.37aaf.00b1.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241213172518.2415666-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15 13:26:47 -08:00
Sabrina Dubroca
494bd83bb5 netdevsim: add more hw_features
netdevsim currently only set HW_TC in its hw_features, but other
features should also be present to better reflect the behavior of real
HW.

In my macsec offload testing, this ends up as HW_CSUM being missing
from hw_features, so it doesn't stick in wanted_features when offload
is turned off. Then HW_CSUM (and thus TSO, thanks to
netdev_fix_features) is not automatically turned back on when offload
is re-enabled.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b918dc4dd76410a57f7516a855f66b0a2bd58326.1730929545.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 14:12:20 -08:00
Jakub Kicinski
5b1c965956 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc6).

Conflicts:

drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
  cbe84e9ad5 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
  188a1bf894 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/

net/mac80211/cfg.c
  c4382d5ca1 ("wifi: mac80211: update the right link for tx power")
  8dd0498983 ("wifi: mac80211: Fix setting txpower with emulate_chanctx")

drivers/net/ethernet/intel/ice/ice_ptp_hw.h
  6e58c33106 ("ice: fix crash on probe for DPLL enabled E810 LOM")
  e4291b64e1 ("ice: Align E810T GPIO to other products")
  ebb2693f8f ("ice: Read SDP section from NVM for pin definitions")
  ac532f4f42 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 18:10:07 -07:00
Zichen Xie
4ce1f56a1e netdevsim: Add trailing zero to terminate the string in nsim_nexthop_bucket_activity_write()
This was found by a static analyzer.
We should not forget the trailing zero after copy_from_user()
if we will further do some string operations, sscanf() in this
case. Adding a trailing zero will ensure that the function
performs properly.

Fixes: c6385c0b67 ("netdevsim: Allow reporting activity on nexthop buckets")
Signed-off-by: Zichen Xie <zichenxie0106@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241022171907.8606-1-zichenxie0106@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:43:01 -07:00
Ales Nezbeda
1a629afd59 netdevsim: macsec: pad u64 to correct length in logs
Commit 02b34d03a2 ("netdevsim: add dummy macsec offload") pads u64
number to 8 characters using "%08llx" format specifier.

Changing format specifier to "%016llx" ensures that no matter the value
the representation of number in log is always the same length.

Before this patch, entry in log for value '1' would say:
    removing SecY with SCI 00000001 at index 2
After this patch is applied, entry in log will say:
    removing SecY with SCI 0000000000000001 at index 2

Signed-off-by: Ales Nezbeda <anezbeda@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20241017131933.136971-1-anezbeda@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 09:51:06 +02:00
Paolo Abeni
91afa49a3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc4).

Conflicts:

107a034d5c ("net/mlx5: qos: Store rate groups in a qos domain")
1da9cfd6c4 ("net/mlx5: Unregister notifier on eswitch init failure")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21 09:14:18 +02:00
Eric Dumazet
a1494d532e netdevsim: use cond_resched() in nsim_dev_trap_report_work()
I am still seeing many syzbot reports hinting that syzbot
might fool nsim_dev_trap_report_work() with hundreds of ports [1]

Lets use cond_resched(), and system_unbound_wq
instead of implicit system_wq.

[1]
INFO: task syz-executor:20633 blocked for more than 143 seconds.
      Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor    state:D stack:25856 pid:20633 tgid:20633 ppid:1      flags:0x00004006
...
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 16760 Comm: kworker/1:0 Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events nsim_dev_trap_report_work
 RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:210
Code: 89 fb e8 23 00 00 00 48 8b 3d 04 fb 9c 0c 48 89 de 5b e9 c3 c7 5d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0c 25 c0 d7 03 00 65 8b 15 60 f0
RSP: 0018:ffffc90000a187e8 EFLAGS: 00000246
RAX: 0000000000000100 RBX: ffffc90000a188e0 RCX: ffff888027d3bc00
RDX: ffff888027d3bc00 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88804a2e6000 R08: ffffffff8a4bc495 R09: ffffffff89da3577
R10: 0000000000000004 R11: ffffffff8a4bc2b0 R12: dffffc0000000000
R13: ffff88806573b503 R14: dffffc0000000000 R15: ffff8880663cca00
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc90a747f98 CR3: 000000000e734000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 000000000000002b DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
  __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
  spin_unlock_bh include/linux/spinlock.h:396 [inline]
  nsim_dev_trap_report drivers/net/netdevsim/dev.c:820 [inline]
  nsim_dev_trap_report_work+0x75d/0xaa0 drivers/net/netdevsim/dev.c:850
  process_one_work kernel/workqueue.c:3229 [inline]
  process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
  worker_thread+0x870/0xd30 kernel/workqueue.c:3391
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Fixes: ba5e127214 ("netdevsim: avoid potential loop in nsim_dev_trap_report_work()")
Reported-by: syzbot+d383dc9579a76f56c251@syzkaller.appspotmail.com
Reported-by: syzbot+c596faae21a68bf7afd0@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241012094230.3893510-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 09:58:43 -07:00
Hangbin Liu
2cf567f421 netdevsim: copy addresses for both in and out paths
The current code only copies the address for the in path, leaving the out
path address set to 0. This patch corrects the issue by copying the addresses
for both the in and out paths. Before this patch:

  # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] tx ipaddr=0.0.0.0
  sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] rx ipaddr=192.168.0.1
  sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

After this patch:

  = cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] tx ipaddr=192.168.0.2
  sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] rx ipaddr=192.168.0.1
  sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

Fixes: 7699353da8 ("netdevsim: add ipsec offload testing")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11 15:44:27 -07:00
Hangbin Liu
c71bc6da61 netdevsim: print human readable IP address
Currently, IPSec addresses are printed in hexadecimal format, which is
not user-friendly. e.g.

  # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] rx ipaddr=0x00000000 00000000 00000000 0100a8c0
  sa[0]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] tx ipaddr=0x00000000 00000000 00000000 00000000
  sa[1]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

This patch updates the code to print the IPSec address in a human-readable
format for easier debug. e.g.

 # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
 SA count=4 tx=40
 sa[0] tx ipaddr=0.0.0.0
 sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
 sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[1] rx ipaddr=192.168.0.1
 sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
 sa[1]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[2] tx ipaddr=::
 sa[2]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
 sa[2]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[3] rx ipaddr=2000::1
 sa[3]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
 sa[3]    key=0x3167608a ca4f1397 43565909 941fa627

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11 15:44:27 -07:00
Paolo Abeni
b3ea416419 testing: net-drv: add basic shaper test
Leverage a basic/dummy netdevsim implementation to do functional
coverage for NL interface.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:23 -07:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Kory Maincent
2111375b85 net: Add struct kernel_ethtool_ts_info
In prevision to add new UAPI for hwtstamp we will be limited to the struct
ethtool_ts_info that is currently passed in fixed binary format through the
ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code
already started operating on an extensible kernel variant of that
structure, similar in concept to struct kernel_hwtstamp_config vs struct
hwtstamp_config.

Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here
we introduce the kernel-only structure in include/linux/ethtool.h.
The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO.

Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15 08:02:26 -07:00
David Wei
5add2f7288 netdevsim: fix backwards compatibility in nsim_get_iflink()
The default ndo_get_iflink() implementation returns the current ifindex
of the netdev. But the overridden nsim_get_iflink() returns 0 if the
current nsim is not linked, breaking backwards compatibility for
userspace that depend on this behaviour.

Fix the problem by returning the current ifindex if not linked to a
peer.

Fixes: 8debcf5832 ("netdevsim: add ndo_get_iflink() implementation")
Reported-by: Yu Watanabe <watanabe.yu@gmail.com>
Suggested-by: Yu Watanabe <watanabe.yu@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 11:51:04 +01:00
David Wei
3762ec05a9 netdevsim: add NAPI support
Add NAPI support to netdevim, similar to veth.

* Add a nsim_rq rx queue structure to hold a NAPI instance and a skb
  queue.
* During xmit, store the skb in the peer skb queue and schedule NAPI.
* During napi_poll(), drain the skb queue and pass up the stack.
* Add assoc between rxq and NAPI instance using netif_queue_set_napi().

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240507163228.2066817-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08 18:59:47 -07:00
Eric Dumazet
1eb2cded45 net: annotate writes on dev->mtu from ndo_change_mtu()
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c945 ("inet: protect against too small
mtu values.")

We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.

It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:19:14 -07:00
Jakub Kicinski
1580cbcbfe net: netdevsim: add some fake page pool use
Add very basic page pool use so that we can exercise
the netlink uAPI in a selftest.

Page pool gets created on open, destroyed on close.
But we control allocating of a single page thru debugfs.
This page may survive past the page pool itself so that
we can test orphaned page pools.

Link: https://lore.kernel.org/r/20240412141436.828666-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-15 11:21:11 -07:00
Jakub Kicinski
f216306bfb netdevsim: report stats by default, like a real device
Real devices should implement qstats. Devices which support
pause or FEC configuration should also report the relevant stats.

nsim was missing FEC stats completely, some of the qstats
and pause stats required toggling a debugfs knob.

Note that the tests which used pause always initialize the setting
so they shouldn't be affected by the different starting value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 11:40:41 +01:00
David Wei
8debcf5832 netdevsim: add ndo_get_iflink() implementation
Add an implementation for ndo_get_iflink() in netdevsim that shows the
ifindex of the linked peer, if any.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
9eb95228a7 netdevsim: forward skbs from one connected port to another
Forward skbs sent from one netdevsim port to its connected netdevsim
port using dev_forward_skb, in a spirit similar to veth.

Add a tx_dropped variable to struct netdevsim, tracking the number of
skbs that could not be forwarded using dev_forward_skb().

The xmit() function accessing the peer ptr is protected by an RCU read
critical section. The rcu_read_lock() is functionally redundant as since
v5.0 all softirqs are implicitly RCU read critical sections; but it is
useful for human readers.

If another CPU is concurrently in nsim_destroy(), then it will first set
the peer ptr to NULL. This does not affect any existing readers that
dereferenced a non-NULL peer. Then, in unregister_netdevice(), there is
a synchronize_rcu() before the netdev is actually unregistered and
freed. This ensures that any readers i.e. xmit() that got a non-NULL
peer will complete before the netdev is freed.

Any readers after the RCU_INIT_POINTER() but before synchronize_rcu()
will dereference NULL, making it safe.

The codepath to nsim_destroy() and nsim_create() takes both the newly
added nsim_dev_list_lock and rtnl_lock. This makes it safe with
concurrent calls to linking two netdevsims together.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
f532957d76 netdevsim: allow two netdevsim ports to be connected
Add two netdevsim bus attribute to sysfs:
/sys/bus/netdevsim/link_device
/sys/bus/netdevsim/unlink_device

Writing "A M B N" to link_device will link netdevsim M in netnsid A with
netdevsim N in netnsid B.

Writing "A M" to unlink_device will unlink netdevsim M in netnsid A from
its peer, if any.

rtnl_lock is taken to ensure nothing changes during the linking.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
Ricardo B. Marliere
52042e092b net: netdevsim: constify the struct device_type usage
Since commit aed65af1cc ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the
nsim_bus_dev_type variable to be a constant structure as well, placing it
into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 09:45:23 +00:00
Jakub Kicinski
3be042cf46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/common.h
  38cc3c6dcc ("net: stmmac: protect updates of 64-bit statistics counters")
  fd5a6a7131 ("net: stmmac: est: Per Tx-queue error count for HLBF")
  c5c3e1bfc9 ("net: stmmac: Offload queueMaxSDU from tc-taprio")

drivers/net/wireless/microchip/wilc1000/netdev.c
  c901388028 ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000")
  328efda22a ("wifi: wilc1000: do not realloc workqueue everytime an interface is added")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  1279f9d9de ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08 15:30:33 -08:00
Ricardo B. Marliere
56b93cd358 netdevsim: make nsim_bus const
Now that the driver core can properly handle constant struct bus_type,
move the nsim_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07 13:40:27 +00:00
Eric Dumazet
ba5e127214 netdevsim: avoid potential loop in nsim_dev_trap_report_work()
Many syzbot reports include the following trace [1]

If nsim_dev_trap_report_work() can not grab the mutex,
it should rearm itself at least one jiffie later.

[1]
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: events nsim_dev_trap_report_work
 RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
 RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
 RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
 RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
 RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
 RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
  instrument_atomic_read include/linux/instrumented.h:68 [inline]
  atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
  queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
  debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
  do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
  _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
  debug_object_activate+0x349/0x540 lib/debugobjects.c:726
  debug_work_activate kernel/workqueue.c:578 [inline]
  insert_work+0x30/0x230 kernel/workqueue.c:1650
  __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
  __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
  queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
  queue_delayed_work include/linux/workqueue.h:563 [inline]
  schedule_delayed_work include/linux/workqueue.h:677 [inline]
  nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
  process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
  process_scheduled_works kernel/workqueue.c:2706 [inline]
  worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
  kthread+0x2c6/0x3a0 kernel/kthread.c:388
  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
 </TASK>

Fixes: 012ec02ae4 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201175324.3752746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02 11:00:38 -08:00
Jakub Kicinski
ea937f7720 net: netdevsim: don't try to destroy PHC on VFs
PHC gets initialized in nsim_init_netdevsim(), which
is only called if (nsim_dev_port_is_pf()).

Create a counterpart of nsim_init_netdevsim() and
move the mock_phc_destroy() there.

This fixes a crash trying to destroy netdevsim with
VFs instantiated, as caught by running the devlink.sh test:

    BUG: kernel NULL pointer dereference, address: 00000000000000b8
    RIP: 0010:mock_phc_destroy+0xd/0x30
    Call Trace:
     <TASK>
     nsim_destroy+0x4a/0x70 [netdevsim]
     __nsim_dev_port_del+0x47/0x70 [netdevsim]
     nsim_dev_reload_destroy+0x105/0x120 [netdevsim]
     nsim_drv_remove+0x2f/0xb0 [netdevsim]
     device_release_driver_internal+0x1a1/0x210
     bus_remove_device+0xd5/0x120
     device_del+0x159/0x490
     device_unregister+0x12/0x30
     del_device_store+0x11a/0x1a0 [netdevsim]
     kernfs_fop_write_iter+0x130/0x1d0
     vfs_write+0x30b/0x4b0
     ksys_write+0x69/0xf0
     do_syscall_64+0xcc/0x1e0
     entry_SYSCALL_64_after_hwframe+0x6f/0x77

Fixes: b63e78fca8 ("net: netdevsim: use mock PHC driver")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-17 10:56:44 +00:00